Lossless Output & Input Compression
Compresses output without losing a single word.
An RL agent that learns each LLM's verbosity patterns and creates a lossless shorthand dictionary. 30–60% output compression. Zero quality loss. Bi-directional. OpenAI has prompt caching, speculative decoding, and KV quantization — but nothing that compresses the output itself.
50-90% savings on cached input tokens
2-3× latency reduction via draft models
INT8/FP8 memory compression inside model
Activate fewer parameters per token
Virtual memory for KV cache
Lossless output compression via learned verbosity dictionaries
STENO observes each LLM's output patterns and builds a contraction dictionary — like shorthand for AI. The dictionary grows smarter with every request.
The RL agent monitors LLM output across millions of requests, identifying repeated phrases, boilerplate, and verbosity patterns unique to each model.
A contraction dictionary is built per model using Fibonacci-positioned token anchors. "In conclusion, it is important to note that" → single token.
Output tokens are replaced with dictionary contractions in real-time. 30-60% fewer tokens billed. Zero quality degradation — fully lossless.
Client-side expansion restores the full output. Bi-directional: input prompts are also compressed before sending to the LLM.
"In conclusion, it is important to note that the implementation of quantum-safe cryptographic algorithms requires careful consideration of several key factors. First and foremost, organizations should evaluate their current cryptographic infrastructure..."
"⌠quantum-safe crypto requires: 1) eval current infra..."
Client expands to full text. User sees original quality.Every compression/expansion cycle produces a BLAKE3 hash verification. If even one character changes, the integrity check fails. Mathematically proven lossless.
Compression ratios, dictionary versions, and expansion receipts are logged via POAW hash-chain. Full EU AI Act Art. 14 transparency.
Dictionaries are built and stored on sovereign EU infrastructure (Hetzner). No customer prompt data leaves the EU. Zero US subprocessors.
More customers → better dictionaries → better compression → more customers. Competitors cannot replicate without our customer base and patent portfolio.
| Metric | Without STENO | With STENO | Impact |
|---|---|---|---|
| Output tokens billed | 100% (full verbosity) | 40-70% (compressed) | 30-60% reduction |
| Input tokens billed | 100% (full prompts) | 70-85% (bi-directional) | 15-30% reduction |
| Quality degradation | N/A | 0% (lossless) | Zero trade-off |
| Integration effort | N/A | 1 line (base_url change) | Zero engineering |
| At 1M req/month (500 tok avg) | $22,500/mo | $15,750/mo | $6,750/mo saved |
STENO activates automatically when you route through api.destill.ai/v1.