AI Model Releases Tracker — 2026-05-03
DeepInfra is now the cheapest inference provider tracked by LLM Stats, with input pricing as low as $0.01 per million tokens and a 10 million token context window. That floor undercuts every other provider in the platform's catalog of 2,961+ models across 43+ organizations. The May 3, 2026 snapshot also documents a measurable quality regression in GPT-5.4 and surfaces several open-weight models that arrived quietly over the past two weeks.
The Claim
DeepInfra enters the tracked rankings with 56 hosted models and input prices ranging from $0.01 to $1.79 per million tokens. Its catalog includes DeepSeek-V3.2, listed at $26 per million, and DeepSeek-V4-Pro-Max, listed at $174 per million; neither figure falls within the quoted input range, so those two prices should not be read as per-million input rates. The 10 million token context window is an upper limit far above what most production pipelines use, but it signals that the provider is positioning itself for long-document and multi-turn workloads, where other budget options typically cap out much earlier.
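The quoted input range translates into workload costs with simple arithmetic. The Python sketch below assumes flat, linear per-token billing on input only; it ignores output-token prices, caching discounts, and minimum charges, none of which the snapshot covers. The `input_cost_usd` helper and the 250 million token volume are hypothetical.

```python
from decimal import Decimal


def input_cost_usd(input_tokens: int, price_per_million: Decimal) -> Decimal:
    """Cost of `input_tokens` at a flat per-million-token input price.

    Assumption: linear billing with no minimums, caching discounts,
    or output-token charges (the snapshot lists input prices only).
    """
    return Decimal(input_tokens) / Decimal(1_000_000) * price_per_million


# DeepInfra's tracked input range per the May 3, 2026 snapshot.
FLOOR = Decimal("0.01")
CEILING = Decimal("1.79")

if __name__ == "__main__":
    tokens = 250_000_000  # hypothetical monthly input volume
    print(f"floor:   ${input_cost_usd(tokens, FLOOR):.2f}")
    print(f"ceiling: ${input_cost_usd(tokens, CEILING):.2f}")
```

Under those assumptions, 250 million input tokens cost $2.50 at the floor and $447.50 at the ceiling. `Decimal` is used instead of `float` so currency values do not pick up binary rounding error.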
The April 23 release cluster documented in the snapshot is notable for its density: DeepSeek-V4-Flash-Max, DeepSeek-V4-Pro-Max, GPT-5.5, and GPT-5.5 Pro all shipped on the same calendar day. Whether the timing was coordinated or coincidental, developers benchmarking in late April had to evaluate four major releases at once. That compressed evaluation window makes clean A/B comparisons harder to run.
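Release clusters like this one can be detected mechanically from tracker data. The sketch below groups the release dates cited in this snapshot by calendar day and keeps any day with at least two releases. The `release_clusters` helper and its `min_size` threshold are hypothetical; a production tracker might prefer a sliding multi-day window over exact date matches.

```python
from collections import defaultdict
from datetime import date

# Release dates as reported in the May 3, 2026 snapshot.
RELEASES = [
    ("Claude Opus 4.7", date(2026, 4, 16)),
    ("Kimi K2.6", date(2026, 4, 20)),
    ("DeepSeek-V4-Flash-Max", date(2026, 4, 23)),
    ("DeepSeek-V4-Pro-Max", date(2026, 4, 23)),
    ("GPT-5.5", date(2026, 4, 23)),
    ("GPT-5.5 Pro", date(2026, 4, 23)),
]


def release_clusters(releases, min_size=2):
    """Group model names by release date; keep days with >= min_size releases."""
    by_day = defaultdict(list)
    for name, day in releases:
        by_day[day].append(name)
    # Sort chronologically and drop days below the threshold.
    return {day: names for day, names in sorted(by_day.items()) if len(names) >= min_size}


if __name__ == "__main__":
    for day, names in release_clusters(RELEASES).items():
        print(day.isoformat(), len(names), ", ".join(names))
```

Run against these six entries, only April 23 qualifies, with four releases.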
On the Anthropic side, the catalog lists Claude Opus 4.7 with an April 16 release date, alongside Sonnet 4.6. The platform also lists a "Mythos Preview" tier priced at $2,500 per million tokens, a price point that appears aimed at enterprise evaluation rather than production throughput.
Why This Matters
DeepInfra's pricing could put pressure on inference prices more broadly. At the $0.01 floor, 100 million input tokens cost $1; at DeepInfra's own $1.79 ceiling, the same volume costs $179. A floor that low lowers the barrier for smaller companies and independent developers who have found hosted inference cost-prohibitive, and if it holds, competing providers may need to revisit their own price lists.
The 10 million token context window also matters. Most applications never approach that length, but a window that large lets a single request hold an entire document set or a long conversation history without chunking, which is relevant to legal document analysis, long-running customer service sessions, and similar workloads.
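Whether a workload actually needs that headroom is a budgeting question. The sketch below estimates whether a set of documents fits a context window, assuming roughly four characters per token (a common rule of thumb for English text, not a property of any specific tokenizer) and a hypothetical reserve of tokens for the model's output. The function names, the reserve size, and the example corpus are illustrative.

```python
CHARS_PER_TOKEN = 4  # rough heuristic for English text; real tokenizers vary


def estimated_tokens(text: str) -> int:
    """Approximate token count from character length, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(documents: list[str], context_window: int,
                    output_reserve: int = 8_192) -> bool:
    """True if the estimated prompt plus a reserve for output fits the window."""
    prompt_tokens = sum(estimated_tokens(doc) for doc in documents)
    return prompt_tokens + output_reserve <= context_window


if __name__ == "__main__":
    # Hypothetical corpus: 400 contracts of ~60,000 characters each.
    corpus = ["x" * 60_000] * 400
    for window in (2_100_000, 10_000_000):
        print(window, fits_in_context(corpus, window))
```

With these assumptions the hypothetical corpus estimates to about 6 million tokens, which exceeds a 2.1 million token window but fits within 10 million. Swapping in a real tokenizer would replace the heuristic without changing the budgeting logic.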
Developer and Practitioner Implications
Comparison to Similar Industry Developments
The AI model landscape is evolving rapidly, with several providers shipping changes that affect both pricing and performance.
Practical Takeaways
Where It Falls Short
Open-weight releases from two organizations in the snapshot have received almost no public benchmarking coverage, despite having shipped weeks ago. Moonshot AI released Kimi K2.6 as open source on April 20, and Zhipu's GLM-5.1 and GLM-5V-Turbo, the latter adding multimodal vision capabilities, round out a run of five models in six months from that organization. Neither organization's new models have been put through the major English-language arenas in any systematic way, so quality data for both is thin. Developers evaluating them are largely working from vendor claims and community spot-checks rather than controlled head-to-head results.
The broader snapshot also credits Google with the largest tracked context window at 2.1 million tokens (a figure that does not reconcile with the 10 million token window listed for DeepInfra) and lists MiniMax M2.7 on Fireworks at $30 per million. Neither entry comes with enough supporting benchmark data in the current snapshot to draw firm conclusions. The tracker captures release events and pricing with high fidelity, but it is less useful as a quality oracle for newly listed models, and the gap between "tracked" and "evaluated" is widening as release velocity increases.
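One way to make the tracked-versus-evaluated gap visible is to record benchmark results alongside each catalog entry and report what fraction of entries have any. The sketch below is a hypothetical data model, not LLM Stats' schema: the models the snapshot describes as lacking benchmark data get an empty `benchmarks` mapping, and the one evaluated entry is an illustrative placeholder with a made-up score, not real data.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedModel:
    name: str
    organization: str
    # Benchmark name -> score; empty when no controlled results are available.
    benchmarks: dict[str, float] = field(default_factory=dict)


def unevaluated(models: list[TrackedModel]) -> list[str]:
    """Names of tracked models with no benchmark results attached."""
    return [m.name for m in models if not m.benchmarks]


def evaluation_coverage(models: list[TrackedModel]) -> float:
    """Fraction of tracked models with at least one benchmark result."""
    if not models:
        return 0.0
    return sum(1 for m in models if m.benchmarks) / len(models)


if __name__ == "__main__":
    catalog = [
        TrackedModel("Kimi K2.6", "Moonshot AI"),
        TrackedModel("GLM-5.1", "Zhipu"),
        TrackedModel("GLM-5V-Turbo", "Zhipu"),
        TrackedModel("MiniMax M2.7", "MiniMax"),
        # Hypothetical placeholder entry; the score is illustrative only.
        TrackedModel("example-evaluated-model", "example-org", {"example_arena": 1.0}),
    ]
    print(unevaluated(catalog))
    print(f"coverage: {evaluation_coverage(catalog):.0%}")
```

With one placeholder out of five entries, the reported coverage is 20%; tracking that ratio over successive snapshots would show whether evaluation is keeping pace with releases.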
Grok-4.20 Beta at $200 per million sits toward the opposite end of the cost curve from DeepInfra, and its beta status means the pricing is provisional. High-cost beta access has historically preceded either a significant price drop at general availability or a quiet discontinuation, and the current data does not yet indicate which outcome is more likely here.
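The width of the cost curve is easier to see when the quoted price points are sorted side by side. The sketch below orders three figures from the snapshot, computes the ratio between the most and least expensive, and flags beta pricing as provisional. The `PricePoint` type and `cost_spread` helper are hypothetical, and the comparison assumes all three figures share the same per-million-token basis, which the snapshot does not state explicitly.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PricePoint:
    label: str
    usd_per_million: Decimal
    provisional: bool = False  # True for beta pricing that may change at GA


def cost_spread(points: list[PricePoint]) -> tuple[list[PricePoint], Decimal]:
    """Sort price points cheapest-first and return them with the max/min ratio."""
    ordered = sorted(points, key=lambda p: p.usd_per_million)
    return ordered, ordered[-1].usd_per_million / ordered[0].usd_per_million


if __name__ == "__main__":
    # Figures as quoted in the snapshot; only the Grok entry is described as beta.
    points = [
        PricePoint("Grok-4.20 Beta", Decimal("200"), provisional=True),
        PricePoint("DeepInfra floor", Decimal("0.01")),
        PricePoint("MiniMax M2.7 on Fireworks", Decimal("30")),
    ]
    ordered, ratio = cost_spread(points)
    for p in ordered:
        flag = " (provisional)" if p.provisional else ""
        print(f"{p.label}: ${p.usd_per_million}/M{flag}")
    print(f"spread: {ratio:.0f}x")
```

Under that assumption, Grok-4.20 Beta's $200 figure is 20,000 times DeepInfra's $0.01 floor. Keeping the provisional flag on the record lets a comparison exclude or annotate beta prices instead of treating them as settled.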
Source
llm-stats.com
Written by Hiram Clark, Editor — vybecoding.ai
Published on May 3, 2026