OWASP GENAI SELF-BENCHMARK — V57 · MARCH 2026 · 16.15M PROMPTS TESTED

3 Frameworks.
29 Risks & Test Areas. Full Coverage.

The DESTILL NI-Stack self-benchmarked against every OWASP GenAI security framework: the LLM Top 10, the Agentic Top 10, and the AI Testing Guide. Every claim is backed by live, verifiable evidence via our open Red Team API.

10/10 · LLM Top 10 · 5 Exceeds · 4 Covered · 1 Partial
10/10 · Agentic Top 10 · 6 Exceeds · 4 Covered
9/9 · AI Testing Guide · 9 Areas · All with Live Evidence
OWASP Top 10 for LLM Applications — 2025

The Foundation: LLM Security Risks

The 10 most critical security vulnerabilities in LLM applications. We cover all 10 — and exceed in 5.

LLM01:2025
Prompt Injection
User prompts alter LLM behavior by overriding system instructions.
✅ AEGIS 42-layer cascade · 86.16% TPR on 16.15M prompts (19 datasets incl. Pliny, WildJailbreak) · TWAIN Shield V57 · 0.46ms avg latency · 928 learned immune patterns
⚡ Exceeds
LLM02:2025
Sensitive Information Disclosure
LLMs revealing PII, API keys, or proprietary data in responses.
✅ POAW cryptographic receipts · Output sanitization layers · Self-hosted — no data leaves your infrastructure
✓ Covered
LLM03:2025
Supply Chain
Vulnerabilities in third-party models, plugins, and training pipelines.
✅ 100% self-hosted · No third-party model dependencies · Sovereign infrastructure · EU data residency
⚡ Exceeds
LLM04:2025
Data and Model Poisoning
Tampering with training or fine-tuning data to alter model behavior.
✅ QFAI-C Quantum-Merkle integrity hashes · Labeled corpus audit (March 2026) · Data lineage validation
✓ Covered
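The idea behind Merkle-based corpus integrity can be illustrated in a few lines. The sketch below uses plain SHA-256 rather than the product's QFAI-C Quantum-Merkle scheme; function names and the duplication rule for odd levels are assumptions for illustration only.

```python
import hashlib

def _h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(records: list[bytes]) -> bytes:
    """Fold a list of records into a single integrity hash."""
    level = [_h(r) for r in records]          # leaf hashes
    while len(level) > 1:
        if len(level) % 2:                    # duplicate last node on odd levels
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

corpus = [b"sample-1", b"sample-2", b"sample-3", b"sample-4"]
sealed = merkle_root(corpus)

# Tampering with any single record changes the root, flagging the corpus.
corpus[2] = b"poisoned!"
assert merkle_root(corpus) != sealed
```

Sealing the root once (e.g. at audit time) lets a later re-hash detect poisoning of any training record without storing the full corpus twice.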
LLM05:2025
Improper Output Handling
Insufficient validation and sanitization of LLM outputs.
✅ SIREN alignment monitor · Output validation layers in cascade · Thermal coherence bounds
✓ Covered
LLM06:2025
Excessive Agency
LLMs granted too much autonomy or access to tools and systems.
✅ AEGIS Intention Gate (FEAT-192) · POAW-Sealed Tool Manifests · Agent Identity Isolation · Claims 721-744
⚡ Exceeds
LLM07:2025
System Prompt Leakage
Attackers extracting confidential system prompts from LLMs.
✅ Anti-Extraction Shield (4-layer) · Entropy budget enforcement · Red Team API verified
⚡ Exceeds
LLM08:2025
Vector and Embedding Weaknesses
Exploiting vulnerabilities in RAG embeddings and vector stores.
✅ 12D Heim-dimensional projection (not standard embeddings) · Quantum-seeded entropy · Novel architecture
✓ Covered
LLM09:2025
Misinformation
LLMs generating false or misleading information (hallucinations).
⚠️ SIREN thermal coherence monitoring detects drift · AEGIS filters harmful content · Output-side validation
⚠ Partial — Input-side focus
LLM10:2025
Unbounded Consumption
Resource exhaustion through denial-of-service or excessive usage.
✅ CPU-only architecture · 0.46ms avg · 2,162 prompts/sec · No GPU = no GPU DoS · Rate limiting + Cost Amplification
⚡ Exceeds
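Rate limiting against unbounded consumption is commonly implemented as a token bucket: requests drain tokens, time refills them, and expensive prompts can be charged a higher cost (the "Cost Amplification" idea above). The class below is an illustrative sketch, not the NI-Stack's implementation; the throughput figure in the example reuses the benchmark's 2,162 prompts/sec.

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter (illustrative only)."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate              # tokens refilled per second
        self.capacity = capacity      # burst ceiling
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=2162.0, capacity=100.0)
```

Charging a larger `cost` for long or computationally heavy prompts turns the same mechanism into a defense against cost-amplification attacks, not just raw request floods.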
OWASP Top 10 for Agentic Applications — 2026 (NEW)

The Frontier: Agentic AI Risks

Released December 2025 by 100+ industry experts. The benchmark for autonomous AI agent security. We cover all 10 — and exceed in 6.

ASI01
Agent Goal Hijack
Attackers manipulating an agent's objectives through malicious input.
✅ AEGIS Intent Classification Scanner · FEAT-192 Intention Gate · 42-layer semantic analysis prevents goal drift
⚡ Exceeds
ASI02
Tool Misuse and Exploitation
Agents using tools in unsafe ways or being tricked into exploiting them.
✅ POAW-Sealed Tool Manifests · MCP Security Gateway · 24 patent claims on tool isolation
⚡ Exceeds
ASI03
Identity and Privilege Abuse
Agents with excessive permissions leading to privilege escalation.
✅ Self-Sovereign Identity (SSI) · KERI/EUDI integration roadmap · Post-quantum signed credentials
✓ Covered
ASI04
Agentic Supply Chain Vulnerabilities
Compromised external models, plugins, or datasets in the ecosystem.
✅ Self-hosted sovereign architecture · No external model dependencies · AIBOM-compatible inventory
✓ Covered
ASI05
Unexpected Code Execution (RCE)
Agents executing unintended code through vulnerabilities.
✅ Self-Sanitizing MCP Response Proxy (FEAT-192) · No eval() paths · Sandboxed execution
⚡ Exceeds
ASI06
Memory & Context Poisoning
Altering an agent's memory to cause incorrect actions.
✅ Quantum-Merkle Accumulator · Context integrity sealing · Immutable decision chain
⚡ Exceeds
ASI07
Insecure Inter-Agent Communication
Message tampering or role spoofing between agents.
✅ Agent Identity Isolation (FEAT-192) · PQC encryption (ML-KEM/ML-DSA) · POAW-signed messages
⚡ Exceeds
ASI08
Cascading Failures
A flaw triggering chain reactions across interconnected agents.
✅ 42-layer cascade with phase isolation · SIREN feedback loop prevents propagation · Fail-safe defaults
⚡ Exceeds
ASI09
Human-Agent Trust Exploitation
Manipulating trust between humans and agents for malicious outcomes.
✅ POAW cryptographic receipts · Full Nachvollziehbarkeit (audit trail) · Transparent per-layer decisions
✓ Covered
ASI10
Rogue Agents
Agents operating outside intended parameters, performing unauthorized actions.
✅ AEGIS behavioral monitoring · SIREN alignment bounds (Golden Ratio asymptote) · Automatic containment
✓ Covered
OWASP AI Testing Guide — Released November 2025

The Methodology: Trustworthiness Testing

Standardized, repeatable test cases across 4 layers. Every test area has live evidence from our V57 Mega-Benchmark (16.15M lifetime prompts).

| Layer | Test Area | NI-Stack Evidence | Status |
| --- | --- | --- | --- |
| AI Application | Prompt Injection Testing | V57: 16.15M prompts, 19 datasets, 86.16% TPR (harder corpus) · TWAIN Shield | ✅ Proven |
| AI Application | Output Handling Validation | SIREN alignment + output sanitization layers | ✅ Proven |
| AI Model | Adversarial Input Testing | Pliny 100% PERFECT · Chaos Mode V5 mutations | ✅ Proven |
| AI Model | Model Stealing Prevention | Anti-Extraction Shield · 4-layer obfuscation | ✅ Proven |
| AI Model | Backdoor Testing | 42-layer cascade detects hidden triggers | ✅ Proven |
| AI Infrastructure | Privacy Validation | Self-hosted · PQC encryption · EU data residency | ✅ Proven |
| AI Infrastructure | Rate Limiting / DoS | CPU-only · 2,162 prompts/sec · Built-in throttling + Cost Amplification | ✅ Proven |
| AI Data | Data Poisoning Checks | QFAI-C Merkle integrity · Labeled corpus audit | ✅ Proven |
| AI Data | Data Lineage Validation | Full Nachvollziehbarkeit (traceability) chain · POAW receipts | ✅ Proven |
Framework Coverage Comparison

Who Covers All Three?

Only one vendor demonstrates compliance against all three OWASP GenAI frameworks simultaneously.

| OWASP Framework | DESTILL NI-Stack | Lakera Guard | OpenAI Moderation | NeMo Guardrails |
| --- | --- | --- | --- | --- |
| LLM Top 10 (2025) | 10/10 ✅ | ~6/10 | ~4/10 | ~5/10 |
| Agentic Top 10 (2026) | 10/10 ✅ | ~2/10 | ~1/10 | ~3/10 |
| AI Testing Guide (2025) | 9/9 ✅ | ~3/9 | ~2/9 | ~4/9 |
| LLMSVS Level | Level 3 (Highest) | Level 1 | N/A | Level 1 |
| Live Verification API | ✅ Open Red Team API | ✗ Closed | ✗ Closed | ✗ N/A |
| MCP Security Coverage | ✅ 24 claims filed | | | |
| Cryptographic Audit | ✅ POAW + PQC | | | |
Additional OWASP Alignment

Beyond the Top 10s

📋

LLMSVS Level 3

LLM Security Verification Standard — targeting the highest assurance level across all 8 control domains.

🔌

MCP Security Guide

Our FEAT-192 MCP Security Gateway anticipates all 4 blind spots identified in OWASP's March 2026 guide.

🔬

Red Team Evaluation Criteria

Our open Red Team API meets or exceeds the OWASP Vendor Evaluation Criteria v1.0 for AI Red Teaming.

📦

AIBOM Compatible

AI Bill of Materials generation for the NI-Stack's 42-layer cascade — full supply chain transparency.
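An AI Bill of Materials is, at its simplest, a machine-readable inventory of a model stack's components. The fragment below sketches what a per-layer entry for a 42-layer cascade could look like; the field names and schema are illustrative assumptions, not a published AIBOM standard or the NI-Stack's actual output.

```python
import json

# Hypothetical AIBOM document; field names are assumptions for illustration.
aibom = {
    "bomFormat": "AIBOM",
    "specVersion": "1.0",
    "component": {"name": "DESTILL NI-Stack", "version": "V57"},
    "layers": [
        {"index": i, "role": "detection-cascade-layer"} for i in range(1, 43)
    ],
}

document = json.dumps(aibom, indent=2)  # serialized inventory for auditors
```

Emitting one entry per cascade layer is what makes the "full supply chain transparency" claim auditable: a reviewer can diff two AIBOM documents to see exactly which layers changed between versions.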

Don't trust our scores. Verify.

Every OWASP risk mapping is backed by live evidence. Test with your own prompts against the live cascade.
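Testing your own prompts against a verification endpoint typically amounts to a JSON POST. The snippet below is a generic client sketch: the URL, payload schema, and response fields are placeholders, not the documented Red Team API.

```python
import json
import urllib.request

# Placeholder endpoint; substitute the real Red Team API URL.
API_URL = "https://example.invalid/redteam/v1/scan"

def build_scan_request(prompt: str, url: str = API_URL) -> urllib.request.Request:
    """Package a prompt as a JSON POST for the verification endpoint."""
    body = json.dumps({"prompt": prompt}).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def scan(prompt: str) -> dict:
    # A response might look like {"verdict": "blocked", "layer": 17}.
    with urllib.request.urlopen(build_scan_request(prompt), timeout=10) as resp:
        return json.load(resp)
```

Separating request construction from transport makes the client easy to test offline and to point at a staging endpoint.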


Sources: OWASP GenAI Security Project · LLM Top 10 · Agentic Security Initiative