Login

Sentinel: SOTA model to protect against prompt injections

See all blogs
Dror Ivry
9/6/2025
Table of content

Introduction

Prompt injection remains one of the most stubborn attack vectors against Large Language Models (LLM’s) and LLM powered agents. Sentinel - released as open weights model, qualifire/prompt-injection-sentinel, is a novel detection model based on the answerdotai/ModernBERT-large architecture with a rigorously curated and synthesized, diverse training set. On a comprehensive internal test set, Sentinel attains 0.980 F1-score, outperforming previous strong baselines by wide margins while maintaining ~20ms inference latency on a single NVIDIA L4 GPU.

Why Prompt Injection Persists

Prompt injection attacks, malicious prompts that hijack a large language model’s intended behaviour, range from simple directives like”ignore previous instructions” to sophisticated, jailbreak-style exploits using role playing, obfuscation, character escaping, or context withholding.  They constantly evolve, using template-driven patterns, generative techniques, and automated optimization to keep producing new variants. 

Effective LLM security hinges on prompt-injection detection: binary classifiers fine tuned on balanced datasets of benign and hostile inputs (such as protectai/deberta-v3-base-prompt-injection), further hardened through adversarial training and smart input preprocessing. Techniques such as LLM-as-a-judge are unsuitable in this context, because the “judge” model is just as vulnerable to prompt-injection as the system it is meant to evaluate. 

As adversarial tactics keep evolving, sustaining jailbreak resilience and safety across LLM vendors demands two pillars: vendor agnostic, dynamic benchmarks that track new attack patterns and diverse, continuously refreshed training sets that inoculate models against them.

The Sentinel Detection Model

Model Architecture - leveraging ModenBERT

Sentinel is built on the backbone of answerdotai/ModernBERT-large. Rotary positional embeddings (RoPE) maintain long-range fidelity, while Local-Global Attention and Flash Attention keep inference fast and memory-efficient. That lets Sentinel scan full, multi-page prompts for injection signals without truncation—perfect for real-time LLM security workflows.

Results

Due to its 395M parameters, Sentinel’s architectural optimizations yield ~20 ms latency on modest L4 hardware, enabling inline evaluation for interactive LLM applications without noticeable overhead. 

In the chart below you can see that Sentinel posts 0.987 average accuracy—13.9 pp above the baseline—and an F1 of 0.980 versus 0.728, these results attributed to both ModernBERT architecture and the extensive and diverse training dataset.

Performance on the Internal Held-Out Test Set

Across four rigorous public prompt-injection benchmarks, Sentinel outshines the DeBERTa-v3 detector on every dataset, posting an average F1 of 0.938—nearly 23 points ahead of the baseline’s 0.709—clear proof of its superior robustness and generalization.

Benchmark Performance (Binary F1 Score)

Conclusion

Sentinel combines the power of ModernBERT with a rigorously curated, ever-evolving dataset to deliver real-time prompt injection protection at sub-20 ms latency on cost-effective hardware, ensuring your AI systems behave exactly as you intended. By continuously updating its benchmarks and remaining vendor-agnostic, Sentinel adapts to emerging jailbreak techniques without sacrificing performance or adding complexity to your deployment. Ready to secure your LLMs with state-of-the-art evaluations, safeguards, and controls.

Read the full paper here

Back to the blog

Subscribe for fresh news straight to your inbox

Stay across the latest developments and news in this
rapidly-changing sector.
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Read Next

Guides
LLM Evaluation Frameworks, Metrics & Methods Explained
Discover top LLM evaluation frameworks, metrics, and benchmarks like G-Eval and LLM-as-a-Judge to build reliable generative AI systems.
Dror Ivry
4/6/2025
Research
When AI Gets It Wrong
Welcome to “The Hallucination Horror Files,” a collection of real-world AI catastrophes illustrating how these failures occurred and crucially, how proactive real-time oversight could have prevented them.
Dror Ivry
4/6/2025
Research
3 min read
Unleashing the Beast: The Hidden Risks of Releasing Your AI Application
AI isn't without risks – discover the dangers of deployment and how to mitigate them in this crucial read.
Dror Ivry
4/6/2025