OWASP GENAI SELF-BENCHMARK — V57 · MARCH 2026 · 16.15M PROMPTS TESTED

3 Frameworks.
29 Risks & Test Areas. Full Coverage.

The DESTILL NI-Stack self-benchmarked against every OWASP GenAI security framework: the LLM Top 10, the Agentic Top 10, and the AI Testing Guide. Every claim is backed by live, verifiable evidence via our open Red Team API.

10/10 · LLM Top 10 · 5 Exceeds · 4 Covered · 1 Partial
10/10 · Agentic Top 10 · 6 Exceeds · 4 Covered
9/9 · AI Testing Guide · 9 Areas · All with Live Evidence
OWASP Top 10 for LLM Applications — 2025

The Foundation: LLM Security Risks

The 10 most critical security vulnerabilities in LLM applications. We cover all 10 — and exceed in 5.

LLM01:2025
Prompt Injection
User prompts alter LLM behavior by overriding system instructions.
✅ AEGIS 42-layer cascade · 86.16% TPR on 16.15M prompts (19 datasets incl. Pliny, WildJailbreak) · TWAIN Shield V57 · 0.46ms avg latency · 928 learned immune patterns
⚡ Exceeds
LLM02:2025
Sensitive Information Disclosure
LLMs revealing PII, API keys, or proprietary data in responses.
✅ POAW cryptographic receipts · Output sanitization layers · Self-hosted — no data leaves your infrastructure
✓ Covered
LLM03:2025
Supply Chain
Vulnerabilities in third-party models, plugins, and training pipelines.
✅ 100% self-hosted · No third-party model dependencies · Sovereign infrastructure · EU data residency
⚡ Exceeds
LLM04:2025
Data and Model Poisoning
Tampering with training or fine-tuning data to alter model behavior.
✅ QFAI-C Quantum-Merkle integrity hashes · Labeled corpus audit (March 2026) · Data lineage validation
✓ Covered
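The idea behind Merkle-based corpus integrity can be illustrated in a few lines. The sketch below uses plain SHA-256 rather than the product's QFAI-C Quantum-Merkle scheme; function names and the duplication rule for odd levels are assumptions for illustration only.

```python
import hashlib

def _h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(records: list[bytes]) -> bytes:
    """Fold a list of records into a single integrity hash."""
    level = [_h(r) for r in records]          # leaf hashes
    while len(level) > 1:
        if len(level) % 2:                    # duplicate last node on odd levels
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

corpus = [b"sample-1", b"sample-2", b"sample-3", b"sample-4"]
sealed = merkle_root(corpus)

# Tampering with any single record changes the root, flagging the corpus.
corpus[2] = b"poisoned!"
assert merkle_root(corpus) != sealed
```

Sealing the root once (e.g. at audit time) lets a later re-hash detect poisoning of any training record without storing the full corpus twice.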
LLM05:2025
Improper Output Handling
Insufficient validation and sanitization of LLM outputs.
✅ SIREN alignment monitor · Output validation layers in cascade · Thermal coherence bounds
✓ Covered
LLM06:2025
Excessive Agency
LLMs granted too much autonomy or access to tools and systems.
✅ AEGIS Intention Gate (FEAT-192) · POAW-Sealed Tool Manifests · Agent Identity Isolation · Claims 721-744
⚡ Exceeds
LLM07:2025
System Prompt Leakage
Attackers extracting confidential system prompts from LLMs.
✅ Anti-Extraction Shield (4-layer) · Entropy budget enforcement · Red Team API verified
⚡ Exceeds
LLM08:2025
Vector and Embedding Weaknesses
Exploiting vulnerabilities in RAG embeddings and vector stores.
✅ 12D Heim-dimensional projection (not standard embeddings) · Quantum-seeded entropy · Novel architecture
✓ Covered
LLM09:2025
Misinformation
LLMs generating false or misleading information (hallucinations).
⚠️ SIREN thermal coherence monitoring detects drift · AEGIS filters harmful content · Output-side validation
⚠ Partial — Input-side focus
LLM10:2025
Unbounded Consumption
Resource exhaustion through denial-of-service or excessive usage.
✅ CPU-only architecture · 0.46ms avg · 2,162 prompts/sec · No GPU = no GPU DoS · Rate limiting + Cost Amplification
⚡ Exceeds
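Rate limiting against unbounded consumption is commonly implemented as a token bucket: requests drain tokens, time refills them, and expensive prompts can be charged a higher cost (the "Cost Amplification" idea above). The class below is an illustrative sketch, not the NI-Stack's implementation; the throughput figure in the example reuses the benchmark's 2,162 prompts/sec.

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter (illustrative only)."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate              # tokens refilled per second
        self.capacity = capacity      # burst ceiling
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=2162.0, capacity=100.0)
```

Charging a larger `cost` for long or computationally heavy prompts turns the same mechanism into a defense against cost-amplification attacks, not just raw request floods.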
OWASP Top 10 for Agentic Applications — 2026 (NEW)

The Frontier: Agentic AI Risks

Released December 2025 by 100+ industry experts. The benchmark for autonomous AI agent security. We cover all 10 — and exceed in 6.

ASI01
Agent Goal Hijack
Attackers manipulating an agent's objectives through malicious input.
✅ AEGIS Intent Classification Scanner · FEAT-192 Intention Gate · 42-layer semantic analysis prevents goal drift
⚡ Exceeds
ASI02
Tool Misuse and Exploitation
Agents using tools in unsafe ways or being tricked into exploiting them.
✅ POAW-Sealed Tool Manifests · MCP Security Gateway · 24 patent claims on tool isolation
⚡ Exceeds
ASI03
Identity and Privilege Abuse
Agents with excessive permissions leading to privilege escalation.
✅ Self-Sovereign Identity (SSI) · KERI/EUDI integration roadmap · Post-quantum signed credentials
✓ Covered
ASI04
Agentic Supply Chain Vulnerabilities
Compromised external models, plugins, or datasets in the ecosystem.
✅ Self-hosted sovereign architecture · No external model dependencies · AIBOM-compatible inventory
✓ Covered
ASI05
Unexpected Code Execution (RCE)
Agents executing unintended code through vulnerabilities.
✅ Self-Sanitizing MCP Response Proxy (FEAT-192) · No eval() paths · Sandboxed execution
⚡ Exceeds
ASI06
Memory & Context Poisoning
Altering an agent's memory to cause incorrect actions.
✅ Quantum-Merkle Accumulator · Context integrity sealing · Immutable decision chain
⚡ Exceeds
ASI07
Insecure Inter-Agent Communication
Message tampering or role spoofing between agents.
✅ Agent Identity Isolation (FEAT-192) · PQC encryption (ML-KEM/ML-DSA) · POAW-signed messages
⚡ Exceeds
ASI08
Cascading Failures
A flaw triggering chain reactions across interconnected agents.
✅ 42-layer cascade with phase isolation · SIREN feedback loop prevents propagation · Fail-safe defaults
⚡ Exceeds
ASI09
Human-Agent Trust Exploitation
Manipulating trust between humans and agents for malicious outcomes.
✅ POAW cryptographic receipts · Full Nachvollziehbarkeit (audit trail) · Transparent per-layer decisions
✓ Covered
ASI10
Rogue Agents
Agents operating outside intended parameters, performing unauthorized actions.
✅ AEGIS behavioral monitoring · SIREN alignment bounds (Golden Ratio asymptote) · Automatic containment
✓ Covered
OWASP AI Testing Guide — Released November 2025

The Methodology: Trustworthiness Testing

Standardized, repeatable test cases across 4 layers. Every test area has live evidence from our V57 Mega-Benchmark (16.15M lifetime prompts).

| Layer | Test Area | NI-Stack Evidence | Status |
| --- | --- | --- | --- |
| AI Application | Prompt Injection Testing | V57: 16.15M prompts, 19 datasets, 86.16% TPR (harder corpus) · TWAIN Shield | ✅ Proven |
| AI Application | Output Handling Validation | SIREN alignment + output sanitization layers | ✅ Proven |
| AI Model | Adversarial Input Testing | Pliny 100% PERFECT · Chaos Mode V5 mutations | ✅ Proven |
| AI Model | Model Stealing Prevention | Anti-Extraction Shield · 4-layer obfuscation | ✅ Proven |
| AI Model | Backdoor Testing | 42-layer cascade detects hidden triggers | ✅ Proven |
| AI Infrastructure | Privacy Validation | Self-hosted · PQC encryption · EU data residency | ✅ Proven |
| AI Infrastructure | Rate Limiting / DoS | CPU-only · 2,162 prompts/sec · Built-in throttling + Cost Amplification | ✅ Proven |
| AI Data | Data Poisoning Checks | QFAI-C Merkle integrity · Labeled corpus audit | ✅ Proven |
| AI Data | Data Lineage Validation | Full Nachvollziehbarkeit (traceability) chain · POAW receipts | ✅ Proven |
Framework Coverage Comparison

Who Covers All Three?

Only one vendor demonstrates compliance against all three OWASP GenAI frameworks simultaneously.

| OWASP Framework | DESTILL NI-Stack | Lakera Guard | OpenAI Moderation | NeMo Guardrails |
| --- | --- | --- | --- | --- |
| LLM Top 10 (2025) | 10/10 ✅ | ~6/10 | ~4/10 | ~5/10 |
| Agentic Top 10 (2026) | 10/10 ✅ | ~2/10 | ~1/10 | ~3/10 |
| AI Testing Guide (2025) | 9/9 ✅ | ~3/9 | ~2/9 | ~4/9 |
| LLMSVS Level | Level 3 (Highest) | Level 1 | N/A | Level 1 |
| Live Verification API | ✅ Open Red Team API | ✗ Closed | ✗ Closed | ✗ N/A |
| MCP Security Coverage | ✅ 24 claims filed | | | |
| Cryptographic Audit | ✅ POAW + PQC | | | |
Additional OWASP Alignment

Beyond the Top 10s

📋

LLMSVS Level 3

LLM Security Verification Standard — targeting the highest assurance level across all 8 control domains.

🔌

MCP Security Guide

Our FEAT-192 MCP Security Gateway anticipates all 4 blind spots identified in OWASP's March 2026 guide.

🔬

Red Team Evaluation Criteria

Our open Red Team API meets or exceeds the OWASP Vendor Evaluation Criteria v1.0 for AI Red Teaming.

📦

AIBOM Compatible

AI Bill of Materials generation for the NI-Stack's 42-layer cascade — full supply chain transparency.
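An AI Bill of Materials is, at its simplest, a machine-readable inventory of a model stack's components. The fragment below sketches what a per-layer entry for a 42-layer cascade could look like; the field names and schema are illustrative assumptions, not a published AIBOM standard or the NI-Stack's actual output.

```python
import json

# Hypothetical AIBOM document; field names are assumptions for illustration.
aibom = {
    "bomFormat": "AIBOM",
    "specVersion": "1.0",
    "component": {"name": "DESTILL NI-Stack", "version": "V57"},
    "layers": [
        {"index": i, "role": "detection-cascade-layer"} for i in range(1, 43)
    ],
}

document = json.dumps(aibom, indent=2)  # serialized inventory for auditors
```

Emitting one entry per cascade layer is what makes the "full supply chain transparency" claim auditable: a reviewer can diff two AIBOM documents to see exactly which layers changed between versions.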

Don't trust our scores. Verify.

Every OWASP risk mapping is backed by live evidence. Test with your own prompts against the live cascade.
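Testing your own prompts against a verification endpoint typically amounts to a JSON POST. The snippet below is a generic client sketch: the URL, payload schema, and response fields are placeholders, not the documented Red Team API.

```python
import json
import urllib.request

# Placeholder endpoint; substitute the real Red Team API URL.
API_URL = "https://example.invalid/redteam/v1/scan"

def build_scan_request(prompt: str, url: str = API_URL) -> urllib.request.Request:
    """Package a prompt as a JSON POST for the verification endpoint."""
    body = json.dumps({"prompt": prompt}).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def scan(prompt: str) -> dict:
    # A response might look like {"verdict": "blocked", "layer": 17}.
    with urllib.request.urlopen(build_scan_request(prompt), timeout=10) as resp:
        return json.load(resp)
```

Separating request construction from transport makes the client easy to test offline and to point at a staging endpoint.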


Sources: OWASP GenAI Security Project · LLM Top 10 · Agentic Security Initiative