Monday, 18 May 2026
AI Daily
Open Source · Sunday, 03 May 2026 · 3 min read

Mistral Medium 3.5 Released as Open Weights: 128B Dense Model Scores 77.6% on SWE-Bench Verified

Mistral AI released Mistral Medium 3.5 on May 2 as open weights under a modified MIT license — a 128B dense model with a 256k context window scoring 77.6% on SWE-Bench Verified, self-hostable on four GPUs, alongside remote agent support in its Vibe coding platform.

Mistral AI released Mistral Medium 3.5 on 2 May 2026 as open weights under a modified MIT license, marking the company's most capable openly available model and one of the strongest coding-focused open-weight releases yet from any European lab. The release arrived alongside a substantive update to Vibe, Mistral's coding platform, that adds asynchronous remote agent sessions — letting developers run long coding tasks in parallel cloud environments without blocking local workflows.

What Medium 3.5 Delivers

The model is a 128-billion-parameter dense architecture, not a mixture-of-experts design, with a 256,000-token context window. Weights are available on Hugging Face and can be self-hosted on as few as four consumer-grade GPUs, a practical threshold that opens the model to research labs, enterprise IT teams, and developers who cannot or will not route traffic through a third-party API.
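The four-GPU figure is easier to judge with a rough weight-memory estimate. The sketch below multiplies the 128B parameter count by the bytes each parameter occupies at common storage precisions. It covers weights only: KV cache, activations and runtime overhead are excluded. The article does not say which precision or GPU model the four-GPU claim assumes, so the precisions listed are illustrative assumptions rather than Mistral's stated configuration.

```python
# Back-of-the-envelope weight memory for a dense model (weights only).
PARAMS = 128e9  # parameter count quoted in the article
NUM_GPUS = 4    # GPU count quoted in the article

# Bytes per parameter at common storage precisions.
# These are general facts about the number formats, not Mistral-specific settings.
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def per_gpu_gb(params: float, bytes_per_param: float, num_gpus: int) -> float:
    """Return weight memory per GPU, assuming an even split across GPUs."""
    return weight_memory_gb(params, bytes_per_param) / num_gpus


for name, width in BYTES_PER_PARAM.items():
    total = weight_memory_gb(PARAMS, width)
    share = per_gpu_gb(PARAMS, width, NUM_GPUS)
    print(f"{name}: {total:.0f} GB total, {share:.0f} GB per GPU")
```

Under these assumptions the weights alone need 256 GB at 16-bit, 128 GB at 8-bit and 64 GB at 4-bit, or 64, 32 and 16 GB per GPU on four cards. Real deployments need extra headroom for the context window.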

On SWE-Bench Verified, the canonical evaluation of whether coding agents can resolve real GitHub issues from open-source repositories, Medium 3.5 scores 77.6%, ahead of Devstral 2 and the much larger Qwen3.5 397B A17B. On τ³-Telecom, an agentic benchmark testing long-horizon task completion in telecom-domain settings, the model scores 91.4. These numbers position it directly against proprietary models charging significantly more per token, while the open-weight release allows operators to eliminate per-query API costs entirely once they have the hardware.

The model supports configurable reasoning depth — effort level can be adjusted per request to balance latency and thoroughness — and includes a custom-trained vision encoder capable of processing images at variable sizes and aspect ratios, enabling use cases that mix code and visual context.

API pricing for those who prefer hosted access is set at $1.50 per million input tokens and $7.50 per million output tokens, 30% of the $5/$25 that Anthropic charges for Opus 4.7.
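To see what the per-token prices mean for a bill, the sketch below applies both price lists to the same workload. The prices come from the article. The workload size (2 million input tokens and 400,000 output tokens) is a made-up figure for illustration, and the `Pricing` class is a hypothetical helper, not part of any vendor SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Return the USD cost of a workload at this price list."""
        total = (input_tokens * self.input_per_million
                 + output_tokens * self.output_per_million)
        return total / 1_000_000


# Prices as quoted in the article.
MEDIUM_3_5 = Pricing(input_per_million=1.50, output_per_million=7.50)
OPUS_4_7 = Pricing(input_per_million=5.00, output_per_million=25.00)

# Hypothetical workload, chosen only to make the arithmetic concrete.
INPUT_TOKENS = 2_000_000
OUTPUT_TOKENS = 400_000

medium = MEDIUM_3_5.cost(INPUT_TOKENS, OUTPUT_TOKENS)
opus = OPUS_4_7.cost(INPUT_TOKENS, OUTPUT_TOKENS)

print(f"Medium 3.5: ${medium:.2f}")
print(f"Opus 4.7:   ${opus:.2f}")
print(f"Ratio:      {medium / opus:.0%}")
```

For this workload the sketch gives $6.00 against $20.00. Because both input and output prices differ by the same factor, the ratio stays at 30% whatever the mix of input and output tokens.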

Vibe Remote Agents

The Vibe platform update is as significant commercially as the model release itself. Remote agents allow users to initiate a coding session from the Vibe CLI or from Mistral's Le Chat interface, "teleport" it to a cloud-hosted isolated sandbox, and let it continue running asynchronously while the user does something else. Multiple sessions can run in parallel. When the agent completes its task — which may involve writing code, running tests, and opening a pull request — the developer is notified and can review the result.

Tool integrations at launch include GitHub for pull request and branch management, Linear and Jira for issue tracking, Sentry for error context, and Slack and Microsoft Teams for notification. The isolated-sandbox model means each session has its own environment, preventing conflicts between parallel workstreams.

The architecture addresses a concrete problem in current AI-assisted development: frontier models are capable enough to complete non-trivial engineering tasks end-to-end, but the interaction pattern of existing tools requires humans to stay engaged as supervisors throughout. Remote agents shift the paradigm toward outcome-based delegation: assign a task, do other work, review the result.
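The delegate-then-review pattern described above can be sketched with Python's standard `asyncio` library. This is a local simulation, not the Vibe API: `remote_agent` and `delegate` are hypothetical names, the task list is invented, and each "session" just sleeps where a real agent would write code and run tests.

```python
import asyncio


async def remote_agent(task: str, seconds: float) -> dict:
    """Stand-in for a sandboxed remote session; sleeps instead of doing real work."""
    await asyncio.sleep(seconds)
    return {"task": task, "status": "ready for review"}


async def delegate(tasks: dict[str, float]) -> list[dict]:
    """Start all sessions at once and collect each result as it finishes."""
    # Launch every session up front so they run in parallel.
    sessions = [
        asyncio.create_task(remote_agent(name, seconds))
        for name, seconds in tasks.items()
    ]
    # The caller is not blocked on any single session; results arrive
    # in completion order, mimicking a notification per finished task.
    results = []
    for finished in asyncio.as_completed(sessions):
        results.append(await finished)
    return results


# Hypothetical tasks with simulated durations in seconds.
results = asyncio.run(delegate({"fix flaky test": 0.02, "bump dependency": 0.01}))
for result in results:
    print(f"{result['task']}: {result['status']}")
```

The design point is that the caller waits on outcomes rather than supervising steps: sessions are independent, so a slow task does not hold up review of a fast one.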

Context in the Open-Source Model Landscape

Medium 3.5's release lands in a crowded week for open-weight coding models. DeepSeek V4 Pro, released on 24 April, posts an 81% SWE-Bench score with a 1.6-trillion-parameter mixture-of-experts architecture at roughly $1.74 per million input tokens via API. The two models occupy different positions: DeepSeek V4 Pro is the most capable open model for raw benchmark performance; Medium 3.5 is optimised for practical self-hosting, with its four-GPU requirement putting it within reach of teams that cannot afford the multi-node infrastructure needed for DeepSeek V4 Pro at full scale.

For Mistral specifically, the release reinforces the company's stated strategy of building open-weight models that European enterprises can deploy on their own infrastructure — satisfying data-sovereignty requirements without routing traffic through US cloud services. Medium 3.5 joins a lineup that includes Mistral 7B and Mixtral at the smaller end, and serves as the middle tier below the company's proprietary frontier offerings.

The modified MIT license attached to the weights deserves attention. The word "modified" typically indicates restrictions on commercial use or redistribution that go beyond the standard MIT terms, and developers intending to integrate Medium 3.5 into commercial products should review the full license text before deployment.

#mistral #open-weights #coding #swe-bench #mit-license #dense-model
