Kimi Work — 2026-06-12

You can now drag a bug-reproduction video into a coding agent and have it reason over the footage and the code together. That, not the 300-agent local swarm, is the Kimi Work feature with no direct equivalent in other shipping coding tools, and the ACP bet underneath it may matter more than either.

Drag a screen recording of a bug into a coding task and let the agent watch it. That feature, which has no direct equivalent in any shipping coding tool today, is the part of Moonshot AI's Kimi Work that should make competitors uncomfortable. The headline spec is louder: up to 300 sub-agents running in parallel, all locally on your own machine, all coordinated by the company's Kimi K2.6 model. Both shipped on June 12, 2026, in a native desktop app, which makes Kimi Work one of the first products to run large-scale parallel agent execution on consumer hardware rather than through a cloud API. But the number that will date fastest is the 300. The video input is the one that points somewhere new.

What Changed

For most of the past year, running multi-agent workflows meant either paying for cloud infrastructure or accepting that your "agent swarm" was really two or three workers stitched together by a thin orchestrator. Kimi Work raises that ceiling: the app supports up to 300 sub-agents running in parallel, all coordinated locally within a single session. That's not a theoretical maximum tucked into a configuration file; Moonshot is presenting it as the product's headline number.

The model running the show is Kimi K2.6, Moonshot's current flagship. The Artificial Analysis Intelligence Index rates K2.6 at 54, making it the highest-scoring Chinese open model on that benchmark at launch. Moonshot hasn't published a separate model card or independent evaluation alongside the Kimi Work announcement, so K2.6's parameter count and training details aren't part of this launch. The Intelligence Index score does, however, place it in a tier that competes with Western frontier models rather than serving only as a regional alternative.

Kimi Work ships as a native desktop application. By default, data stays on the local machine; there's no telemetry pipeline routing your codebase or files to a remote server unless you explicitly enable the web bridge, which is an optional add-on for tasks that require live data from the internet. That local-first default is a deliberate architectural choice, and it addresses one of the most consistent concerns developers raise about cloud-based agents: you can't route a client's proprietary code through a third-party API without a conversation about data governance.
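The local-first default amounts to an egress rule: nothing leaves the machine unless the web bridge is switched on. A minimal sketch of that rule, assuming a single on/off setting; `RuntimePolicy`, `check_egress` and `EgressBlocked` are hypothetical names, not Kimi Work's actual configuration or API.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class RuntimePolicy:
    # Hypothetical setting; defaults to off, mirroring the described default.
    web_bridge_enabled: bool = False


class EgressBlocked(Exception):
    """Raised when a request would leave the machine while the bridge is off."""


def check_egress(policy: RuntimePolicy, url: str) -> str:
    """Return the target host if the request is allowed, otherwise raise."""
    host = urlparse(url).hostname or ""
    if host in ("localhost", "127.0.0.1", "::1"):
        return host  # loopback traffic never leaves the machine
    if not policy.web_bridge_enabled:
        raise EgressBlocked(f"web bridge disabled; refusing request to {host}")
    return host


# Usage: with the default policy, a documentation fetch is refused.
try:
    check_egress(RuntimePolicy(), "https://docs.python.org/3/")
except EgressBlocked as err:
    print(err)
```

Putting the check in one function means a compliance review only has to audit a single gate rather than every sub-agent's network code.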

Persistent memory rounds out the core feature set. Agents retain context across sessions, so a project you handed off to Kimi Work on Monday is still in scope on Thursday without manually re-explaining the repository structure or the task history. Persistent memory isn't new in isolation — several agent frameworks offer it — but combined with a 300-agent execution ceiling and a local runtime, it changes the kind of work that becomes tractable in a single app.
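Moonshot doesn't describe how its memory is stored. The simplest design that still survives between sessions is a per-project file that is reloaded whenever a session starts. The sketch below assumes exactly that; `ProjectMemory` and its methods are hypothetical.

```python
import json
import tempfile
from pathlib import Path


class ProjectMemory:
    """Hypothetical on-disk memory: one JSON file per project, reloaded each session."""

    def __init__(self, root: Path, project: str) -> None:
        self.path = root / f"{project}.json"
        # Load notes written by earlier sessions; start empty if none exist.
        if self.path.exists():
            self.notes: dict[str, str] = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.notes = {}

    def remember(self, key: str, value: str) -> None:
        # Write through to disk immediately so a crash doesn't lose context.
        self.notes[key] = value
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.notes, indent=2), encoding="utf-8")

    def recall(self, key: str, default: str = "") -> str:
        return self.notes.get(key, default)


# Usage: a note saved on "Monday" is visible to a fresh "Thursday" session.
with tempfile.TemporaryDirectory() as tmp:
    monday = ProjectMemory(Path(tmp), "billing-service")
    monday.remember("repo_layout", "src/ holds the API; tests/ mirrors it")
    thursday = ProjectMemory(Path(tmp), "billing-service")
    print(thursday.recall("repo_layout"))
```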

How It Works

The 300-agent ceiling is only meaningful if the agents can coordinate without turning into a traffic jam. Kimi Work implements ACP — the Agent Communication Protocol — as its inter-agent messaging layer. ACP is an emerging open standard for how agents exchange messages with one another, playing roughly the same role for agent-to-agent communication that the Model Context Protocol plays for agent-to-tool communication. Moonshot adopting ACP positions Kimi Work inside a broader ecosystem: if other agent runtimes converge on the same protocol, a Kimi Work swarm could theoretically exchange messages with agents running in entirely different environments.
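Cross-runtime messaging rests on one idea: every agent serializes to, and parses from, the same agreed wire format. The envelope below is a hypothetical illustration of that point. Its field names (`sender`, `recipient`, `task_id`, `body`, `message_id`) were invented for this sketch and are not taken from the ACP specification.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class AgentMessage:
    """Hypothetical envelope; field names are illustrative, NOT from the ACP spec."""

    sender: str
    recipient: str
    task_id: str
    body: str
    message_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_wire(self) -> str:
        # Plain JSON, so any runtime that agrees on the fields can parse it.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_wire(cls, raw: str) -> "AgentMessage":
        return cls(**json.loads(raw))


# Usage: a message survives a round trip through the wire format unchanged.
msg = AgentMessage(sender="hub", recipient="worker-7", task_id="t1", body="run tests")
print(AgentMessage.from_wire(msg.to_wire()) == msg)
```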

The orchestration model follows a hub-and-spoke pattern with K2.6 at the center. The primary agent receives a task, breaks it into subtasks, and dispatches sub-agents to handle them in parallel. Because each sub-agent is a separate process rather than a thread inside a shared model call, they can operate on different files, different APIs, or different parts of a codebase simultaneously without blocking each other. The web bridge, when enabled, lets sub-agents pull live data — documentation pages, API responses, current prices — without requiring the developer to pre-load that content as context.
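The hub-and-spoke pattern reduces to three steps: split the task, fan out under a concurrency cap, and collect the results. A minimal sketch, assuming the 300 ceiling acts as that cap. Kimi Work runs each sub-agent as a separate process, but here asyncio coroutines stand in for them, and `split_task` and `run_subagent` are hypothetical placeholders for the model-driven planner and the real workers.

```python
import asyncio

MAX_SUBAGENTS = 300  # the announced ceiling; the real scheduler isn't public


def split_task(task: str, files: list[str]) -> list[str]:
    # Hypothetical planner: one subtask per file. In Kimi Work, K2.6 does this step.
    return [f"{task}: {path}" for path in files]


async def run_subagent(subtask: str, limit: asyncio.Semaphore) -> str:
    async with limit:  # never more than MAX_SUBAGENTS running at once
        await asyncio.sleep(0.01)  # stand-in for a sub-agent process doing real work
        return f"done {subtask}"


async def orchestrate(task: str, files: list[str]) -> list[str]:
    limit = asyncio.Semaphore(MAX_SUBAGENTS)
    subtasks = split_task(task, files)
    # gather() runs the subtasks concurrently and returns results in input order.
    return list(await asyncio.gather(*(run_subagent(s, limit) for s in subtasks)))


results = asyncio.run(orchestrate("add type hints", ["a.py", "b.py", "c.py"]))
print(results)
```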

The video-as-coding-context feature works at the intersection of Kimi K2.6's multimodal capabilities and the local runtime. A developer can drag a screen recording, a UI walkthrough, or a bug-reproduction video directly into a coding task. The agent processes individual frames alongside the code, meaning it can correlate a visual bug with the function that produced it, or follow a recorded workflow and generate corresponding test cases. Most current coding agents accept text, structured data, or image screenshots — video input at this level of integration is different in kind, not just degree.
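The announcement doesn't say how many frames K2.6 examines or how it links them to code. A minimal sketch under two assumptions: the agent works from a fixed frame budget, and the app under test writes timestamped log lines that name the function that ran. Both `sample_timestamps` and `nearest_log_event` are hypothetical helpers.

```python
import bisect
from typing import Optional


def sample_timestamps(duration_s: float, fps: float, max_frames: int) -> list[float]:
    """Pick up to max_frames evenly spaced timestamps (in seconds) from a recording."""
    if duration_s <= 0 or fps <= 0 or max_frames <= 0:
        return []
    total = int(duration_s * fps)  # frames actually present in the video
    count = min(total, max_frames)
    if count == 0:
        return []
    step = total / count  # spacing between samples, in frames
    # Sample the middle of each bucket so the clip's edges aren't over-weighted.
    return [round((i + 0.5) * step / fps, 3) for i in range(count)]


def nearest_log_event(frame_t: float, log: list[tuple[float, str]]) -> Optional[str]:
    """Return the function named in the latest log entry at or before frame_t.

    Assumes `log` is sorted by timestamp.
    """
    times = [t for t, _ in log]
    i = bisect.bisect_right(times, frame_t)
    return log[i - 1][1] if i > 0 else None


log = [(0.5, "load_cart"), (4.2, "apply_discount"), (8.0, "render_total")]
for t in sample_timestamps(duration_s=10, fps=30, max_frames=5):
    print(t, nearest_log_event(t, log))
```

For a 10-second, 30 fps recording and a five-frame budget, the samples land at 1, 3, 5, 7 and 9 seconds. The frame at 5.0 s pairs with `apply_discount`, so a wrong total on screen at that moment points to that function first.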

What It Means for Developers

The practical implication of a 300-agent local swarm is that tasks which previously required either a long sequential chain or an expensive cloud orchestration layer now fit inside a desktop app. A large-scale refactor, a multi-repository audit, or a test suite generation job that touches hundreds of files can run in parallel on your own hardware. For developers on projects with strict data residency requirements — healthcare, legal, fintech — the local-first default isn't just a nice-to-have; it's the difference between being able to use an AI agent at all and being prohibited from it by compliance requirements.
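A refactor that touches hundreds of files only parallelizes once the files are divided among the available sub-agents. A minimal sketch, assuming one sub-agent per batch, with the 300 ceiling as the cap. The announcement doesn't document Kimi Work's actual scheduling, so `shard` is a hypothetical helper.

```python
def shard(paths: list[str], max_agents: int = 300) -> list[list[str]]:
    """Split paths into at most max_agents contiguous, near-equal batches."""
    if max_agents <= 0:
        raise ValueError("max_agents must be positive")
    n_batches = min(len(paths), max_agents)
    if n_batches == 0:
        return []
    base, extra = divmod(len(paths), n_batches)
    batches: list[list[str]] = []
    start = 0
    for i in range(n_batches):
        size = base + (1 if i < extra else 0)  # the first `extra` batches take one more file
        batches.append(paths[start:start + size])
        start += size
    return batches


batches = shard([f"src/module_{i}.py" for i in range(1000)])
print(len(batches), len(batches[0]), len(batches[-1]))
```

With 1,000 files this produces 300 batches: 100 with four files and 200 with three. No sub-agent ends up with more than one extra file, so the slowest worker can't fall far behind the others.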

The ACP adoption is the detail I'd watch most closely in the months after this launch. MCP became the de facto standard for tool connectivity in AI agents faster than most observers expected — within six months of its introduction, nearly every major agent framework had wired it in. If ACP achieves even a fraction of that adoption velocity, developers who build agent pipelines now face a choice about which inter-agent message format to standardize on. Kimi Work's launch gives ACP a high-profile production implementation to point to, which meaningfully increases the probability that other vendors follow. My read: this protocol bet is the most strategically significant decision in the Kimi Work architecture, more so than the 300-agent number. The agent count is impressive. It is also just a product spec, and product specs get matched.

The video-as-context feature is narrower in its immediate application but genuinely novel. Developers who do UI work, build mobile apps, or debug visual regressions have a new class of input they can hand to an agent. Rather than describing a bug in text and hoping the agent reconstructs it accurately, you record the screen, drop the file in, and let the agent correlate the frames with the relevant code paths. It's early. Frame-to-code accuracy will vary by task. But the workflow it opens simply didn't exist in any comparable tool before today.

Source

marktechpost.com

Written by the vybecoding.ai editorial team

Published on June 12, 2026
