Our team is burning out retraining models every time a new jailbreak drops. We went from monthly retrains to weekly, now it's almost daily with all the creative bypasses hitting production. The eval pipeline alone takes 6 hours, then there's data labeling, hyperparameter tuning, and deployment testing. Anyone found a better approach? We've tried ensemble methods and rule-based fallbacks but coverage gaps keep appearing. Thinking about switching to more dynamic detection but worried about latency.