Subquadratic, a Miami startup of 13 people that left stealth in early May 2026, is asking developers to take a set of numbers on faith, because so far it is the only party that has measured them. Its model, SubQ, pairs a new attention design called SSA with a context window of 12 million tokens, which the company equates to about 50 average novels in a single prompt. Subquadratic says that combination needs roughly 1,000 times less attention compute than a standard transformer would at that length. The figures are striking. They are also almost entirely self-reported.
This article separates what is well-corroborated across multiple outlets, what is a single-source company claim, and what remains an open question no outside lab has answered yet.
What SSA actually changes
A standard transformer uses "dense" attention: every token compares itself against every other token. That is the famous quadratic cost — double the context, quadruple the work. It is the reason most frontier models cap context around 128K tokens, with 1M as the current high-water mark.
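To make the quadratic cost concrete, here is a minimal sketch that counts token-to-token comparisons for one dense attention pass. It is a back-of-envelope model only: the function name is illustrative, and the count ignores causal masking, attention heads, layers, and everything outside attention.

```python
def dense_attention_pairs(context_tokens: int) -> int:
    """Token-to-token comparisons for one dense attention pass (n * n)."""
    return context_tokens * context_tokens


# Compare each context length against a 128K-token baseline.
base = dense_attention_pairs(128_000)
for n in (128_000, 256_000, 1_000_000, 12_000_000):
    pairs = dense_attention_pairs(n)
    print(f"{n:>12,} tokens: {pairs:.3e} pairs, {pairs / base:,.1f}x the 128K cost")
```

Under this simplified count, going from 128K to 12M tokens multiplies the context by about 94 but the attention work by roughly 8,800.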
SSA — which Subquadratic and The New Stack expand as "Subquadratic Selective Attention," though some coverage (36Kr, DataCamp) calls it "Sparse Attention" — instead picks the positions in the sequence that matter for the current token and ignores the rest. Subquadratic describes it as skipping roughly 99% of attention interactions through content-based selection, so cost grows close to linearly with context length rather than quadratically.
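The general idea of content-based selection can be illustrated with a generic top-k sparse attention toy. This is not Subquadratic's mechanism, which the coverage summarized here does not specify; the function names, the top-k rule, and the tiny vectors are all assumptions chosen for illustration.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def topk_sparse_attention(query, keys, values, k):
    """Attend to only the k keys that score highest against the query."""
    scores = [dot(query, key) / math.sqrt(len(query)) for key in keys]
    # Content-based selection: keep the indices of the k largest scores.
    selected = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected scores only; every other position gets zero weight.
    peak = max(scores[i] for i in selected)
    exps = {i: math.exp(scores[i] - peak) for i in selected}
    total = sum(exps.values())
    dim = len(values[0])
    output = [sum(exps[i] / total * values[i][d] for i in selected) for d in range(dim)]
    return output, sorted(selected)


# Hypothetical toy data: four positions, two-dimensional keys, one-dimensional values.
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
query = [1.0, 0.0]

output, kept = topk_sparse_attention(query, keys, values, k=2)
print("positions kept:", kept)
print("output:", output)
```

Note that this toy still scores every key before discarding most of them, so it saves nothing on the selection step. A design that is subquadratic end to end has to choose its positions without comparing each token against all the others, and how SSA does that is exactly the part outside labs have not yet reproduced.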
The efficiency figures the company reports are internally consistent and have been repeated across several writeups:
Note the distinction the headline number hides: the ~1,000x figure is attention FLOP reduction at 12M tokens, while the ~52x figure is wall-clock speedup at 1M tokens. They describe different metrics at different scales, and conflating them is the easiest way to overstate the result.
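One way to see why the scale matters is a simple cost model in which each token attends to a fixed budget of selected positions. This is a sketch under stated assumptions, not the company's accounting: the fixed-budget model, the function name, and the budget of 12,000 positions (picked only because it reproduces the reported ~1,000x at 12M tokens) are all hypothetical, and selection overhead is ignored.

```python
def attention_flop_reduction(context_tokens: int, keys_per_token: int) -> float:
    """Dense cost (n * n) divided by sparse cost (n * k), ignoring selection overhead."""
    dense = context_tokens * context_tokens
    sparse = context_tokens * keys_per_token
    return dense / sparse


# Hypothetical budget chosen so the model matches the reported ~1,000x at 12M tokens.
ASSUMED_KEYS_PER_TOKEN = 12_000

for n in (128_000, 1_000_000, 12_000_000):
    reduction = attention_flop_reduction(n, ASSUMED_KEYS_PER_TOKEN)
    print(f"{n:>12,} tokens: ~{reduction:,.0f}x fewer attention FLOPs")
```

In this model the reduction is simply the context length divided by the budget, so it shrinks as the context shrinks: about 83x at 1M tokens and about 11x at 128K. A FLOP reduction quoted at 12M tokens therefore says little about shorter prompts, and a wall-clock figure additionally includes all the non-attention work a model does.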
The benchmarks — and the gap nobody has explained
Subquadratic shipped three products at launch: a 12M-token API, SubQ Code (a command-line agent that loads an entire repository in one pass for what the company pitches as "whole-artifact reasoning" — planning across every file at once instead of over a few retrieved snippets), and, per some reports, a deep-research tool called SubQ Search. A third-party testing service reportedly confirmed several benchmark runs, but no independent research group has reproduced the architecture from scratch.
On the published numbers, the picture is mixed rather than dominant:
At the full 12M-token length, Subquadratic claims "over 90%" on needle-in-a-haystack retrieval — a scale at which no competing frontier model has even been benchmarked, which means there is currently nothing to compare it against.
Cost, funding, and the things sources disagree on
The cost claim is the loudest: on the RULER 128K test, Subquadratic says a full run costs about $8 on SubQ versus roughly $2,600 on Opus, a gap of roughly 300x. The company also describes the cost as "about 5% of Opus," which does not match those dollar amounts: 5% would be a 20x gap, not 300x. Both are company-reported figures with no published per-token pricing behind them, and SubQ remains in private beta behind a waitlist, so independent buyers cannot yet check either one.
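The arithmetic behind the cost claim is easy to check. The sketch below uses only the two company-reported dollar figures quoted above; the variable names are illustrative.

```python
subq_cost_usd = 8.0      # company-reported cost of a full RULER 128K run on SubQ
opus_cost_usd = 2600.0   # company-reported cost of the same run on Opus

gap = opus_cost_usd / subq_cost_usd      # how many times more expensive Opus is
share = subq_cost_usd / opus_cost_usd    # SubQ cost as a fraction of Opus cost

print(f"Opus / SubQ: {gap:.0f}x")
print(f"SubQ as a share of Opus: {share:.2%}")
```

The dollar figures work out to 325x, in line with the rounded "~300x," but to about 0.3% of the Opus cost rather than 5%. Whether the 5% refers to a different workload or pricing basis is not something the published material settles.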
Even the basic corporate facts vary by outlet, which is itself a useful signal about how early this is:
Why developers should hold the applause for one cycle
There is a direct precedent worth remembering. In 2024, Magic.dev raised around $500M on claims of a 100-million-token context window aimed at coding, and despite the funding it saw limited real-world adoption. Big context numbers have outrun shipped, verified utility before.
If SSA holds up under independent reproduction, the second-order effect is the interesting one: a model that can ingest a whole codebase or document corpus in one pass weakens the case for some retrieval pipelines that exist mainly to work around short context windows. But "weakens the case for some" is not "RAG is dead." Retrieval still does jobs long context does not — cross-session memory, access control, auditability, and live index updates among them — and that argument deserves its own treatment rather than a victory lap.
For now, the honest summary is narrow and specific: a small team has published an attention design that, on its own three benchmarks, trades a little coding accuracy for a large efficiency and context-length win, with one unexplained lab-versus-production gap and zero outside reproduction. That is genuinely worth watching. It is not yet worth rewriting your architecture around.
Written by Hiram Clark, Editor — vybecoding.ai
Published on June 20, 2026