Retraining prompt injection classifiers for every new jailbreak is impossible | discoverkit

discoverkit · 2025-11-08T01:12:54.000Z

Our team is burning out retraining models every time a new jailbreak drops. We went from monthly retrains to weekly, now it's almost daily with all the creative bypasses hitting production. The eval pipeline alone takes 6 hours, then there's data labeling, hyperparameter tuning, and deployment testing. Anyone found a better approach? We've tried ensemble methods and rule-based fallbacks but coverage gaps keep appearing. Thinking about switching to more dynamic detection but worried about latency.

Retraining prompt injection classifiers for every new jailbreak is impossible | discoverkit | discoverkit