THE AI RANKINGS

Mistral Large 3

Provider: Mistral AI
Status: Available
Context: 256,000 tokens
Price: $0.50 input / $1.50 output per MTok

Mistral Large 3 is Mistral AI’s open-weight flagship, announced on 2 December 2025 as the headline of the “Mistral 3” family. It marks the lab’s return to a sparse mixture-of-experts architecture at the frontier — 41 billion active and 675 billion total parameters — released under the permissive Apache 2.0 licence, so the weights are free to download, self-host, fine-tune and use commercially (TechCrunch).

Mistral positions it as “one of the best permissive open-weight models in the world”: at launch it debuted around 1418 Elo on LMArena, #2 among open-source non-reasoning models and #6 among open-source models overall (Mistral). It sits in the open-weight cluster with DeepSeek V4, MiniMax M3 and Alibaba’s Qwen — strong, cheap and self-hostable — while the closed frontier leaders (Claude Opus 4.8, GPT-5.5, Gemini 3.x) still lead on the very hardest reasoning and agentic tasks.

Quick specs

Provider: Mistral AI
Released: 2 December 2025
Status: Available (open weights)
Architecture: Sparse mixture-of-experts, 41B active / 675B total
Context window: 256,000 tokens
Modalities: Text + image in, text out; 40+ languages
Licence: Apache 2.0 (open weights, self-hostable)
Input price: $0.50 / MTok
Output price: $1.50 / MTok
LMArena: ~1418 Elo (#2 OSS non-reasoning)
Best for: Low-cost / self-hosted open-weight generalist, agentic and multilingual work
Limitations: Not a dedicated reasoning model; below the closed frontier on the hardest tasks

What Mistral Large 3 is

Mistral Large 3 is a sparse mixture-of-experts model — a return to the MoE design Mistral pioneered with Mixtral, after the dense Mistral Large 2 generation. Mistral’s headline figures are 41B active and 675B total parameters (Mistral); the Hugging Face model card breaks this down as a 673B-parameter language model with 39B active, plus a 2.5B vision encoder. Mistral says it was trained from scratch on roughly 3,000 Nvidia H200 GPUs and optimised for Nvidia’s stack (Nvidia).
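The two sets of parameter figures above can be reconciled with simple arithmetic. The sketch below is illustrative only: it assumes the vision encoder counts toward both the total and the active parameters, which the sources quoted here do not state explicitly.

```python
# Figures quoted in the text, in billions of parameters.
HEADLINE_TOTAL_B = 675.0   # Mistral headline: total
HEADLINE_ACTIVE_B = 41.0   # Mistral headline: active per token
CARD_LM_TOTAL_B = 673.0    # Hugging Face card: language model total
CARD_LM_ACTIVE_B = 39.0    # Hugging Face card: language model active
CARD_VISION_B = 2.5        # Hugging Face card: vision encoder


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of parameters used per token in a sparse MoE model."""
    return active_b / total_b


# Assumption: the vision encoder is added to both totals.
card_total_b = CARD_LM_TOTAL_B + CARD_VISION_B
card_active_b = CARD_LM_ACTIVE_B + CARD_VISION_B

print(f"Headline active share: {active_fraction(HEADLINE_ACTIVE_B, HEADLINE_TOTAL_B):.1%}")
print(f"Card total:  {card_total_b}B vs headline {HEADLINE_TOTAL_B}B")
print(f"Card active: {card_active_b}B vs headline {HEADLINE_ACTIVE_B}B")
```

On these numbers only about 6% of the weights are exercised per token, and the model-card breakdown lands within half a billion parameters of each headline figure, which is consistent with rounding.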

It is multimodal — text and image understanding in, text out — and multilingual across 40+ languages, with a 256,000-token context window (Artificial Analysis). Mistral released base, instruction-tuned and NVFP4-quantized builds, and notes that a dedicated reasoning version is “coming soon” — Large 3 itself is a strong generalist rather than a reasoning model.

The point of the release is open-weight capability at the frontier: a permissively licensed model enterprises and governments can run on their own infrastructure, which is the heart of Mistral’s sovereignty pitch.

Benchmark performance

Mistral led its launch with arena placement and presented most results as charts rather than a numeric table, so independent leaderboards carry much of the weight here.

Benchmark | Mistral Large 3 | Notes
LMArena (Elo) | ~1418 | #2 OSS non-reasoning, #6 OSS overall; top OSS coding model on the arena at launch (Mistral)
GPQA Diamond | 67.17 (vendor) | Contested: independent write-ups cite ~44%; treat as vendor-reported (HF card)
Artificial Analysis Intelligence Index | 16 | Independent; below-average for its size (Artificial Analysis)
SWE-bench Verified | Not published | Mistral described coding as “comparable to other high-capacity dense and MoE systems” without a headline figure (DataCamp)

The pattern is a capable open-weight generalist that trades blows with other open models (DeepSeek, Qwen) but trails the proprietary leaders on the hardest reasoning and agentic work. Note the GPQA Diamond conflict — Mistral’s own model card reports 67.17 while independent testers report roughly 44%, a gap likely down to differing prompting and evaluation setups; we present both, attributed. See best AI models for cross-model standings.
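To give the Elo figure some intuition, the sketch below applies the standard Elo expected-score formula to the ~1418 rating. It is an illustration of the Elo scale only: the arena's own fitting procedure may differ, and the opponent rating used here is a hypothetical value, not a published score for any model.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


MISTRAL_LARGE_3_ELO = 1418.0   # approximate launch rating quoted in the text
HYPOTHETICAL_RIVAL_ELO = 1468.0  # assumption: a rival rated 50 points higher

win_share = expected_score(MISTRAL_LARGE_3_ELO, HYPOTHETICAL_RIVAL_ELO)
print(f"Expected head-to-head share vs a +50 Elo rival: {win_share:.1%}")
```

Under this formula a 50-point gap still leaves the lower-rated model preferred in well over 40% of head-to-head votes, so small rating differences between neighbouring models should not be over-read.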

Pricing and access

Mistral Large 3 is priced at $0.50 input / $1.50 output per million tokens on Mistral’s La Plateforme, with the same rates on OpenRouter (Artificial Analysis) — roughly 80% cheaper than the GPT-4o generation it was compared against (DataCamp). The API model ID is mistral-large-2512 (version 25.12).
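As a rough guide to what those rates mean per request, the sketch below multiplies token counts by the listed prices. The function name and the example token counts are our own illustrative choices; actual billing may include details not captured here, so check the provider's current price list before budgeting.

```python
# Listed prices in USD per million tokens, keyed by the API model ID from the text.
PRICE_PER_MTOK_USD = {
    "mistral-large-2512": {"input": 0.50, "output": 1.50},
}


def estimate_cost_usd(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    price = PRICE_PER_MTOK_USD[model_id]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: a 200,000-token prompt with a 4,000-token reply.
cost = estimate_cost_usd("mistral-large-2512", 200_000, 4_000)
print(f"Estimated cost: ${cost:.3f}")
```

Because output tokens cost three times as much as input tokens, long generations move the bill faster than long prompts do.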

Because it is Apache 2.0 open weights, it is also free to download and self-host from Hugging Face — via vLLM or Nvidia NIM, with an NVFP4-quantized build for lower-memory deployment. Beyond Mistral’s own API it shipped at launch on Amazon Bedrock, Azure AI Foundry and IBM watsonx.
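For anyone sizing hardware, a back-of-envelope estimate of weight memory shows why the quantized build matters. This is a sketch under stated assumptions: it uses nominal bits per parameter, ignores quantization scale factors, KV cache and activation memory, and the 16-bit and 8-bit rows are comparison points rather than a statement of which precisions Mistral ships. In a mixture-of-experts model all experts normally need to be resident, so the estimate uses the 675B total rather than the 41B active figure.

```python
TOTAL_PARAMS = 675e9  # headline total parameter count from the text

# Nominal bits per parameter (assumption: no overhead for scales or metadata).
NOMINAL_BITS = {"16-bit": 16, "8-bit": 8, "NVFP4 (4-bit)": 4}


def weight_memory_gb(total_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9


for name, bits in NOMINAL_BITS.items():
    print(f"{name}: ~{weight_memory_gb(TOTAL_PARAMS, bits):,.1f} GB for weights alone")
```

Even at a nominal 4 bits the weights alone run to hundreds of gigabytes, so self-hosting remains a multi-GPU deployment; real requirements will be higher once cache and runtime overhead are included.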

Known limitations

- Not a dedicated reasoning model: Mistral’s own card states purpose-built reasoning models outperform it on strict reasoning, and a reasoning variant was still forthcoming at launch.
- Below the closed frontier on the hardest tasks.
- Independent intelligence ratings are mid-pack: Artificial Analysis rates it below-average for its size and somewhat expensive against open-weight peers.
- Some benchmark numbers are vendor-reported or charted rather than tabulated, notably the disputed GPQA Diamond figure; prefer standardized leaderboards where a decision rides on the number.

FAQ

What is Mistral Large 3?

Mistral Large 3 is Mistral AI’s open-weight flagship, announced 2 December 2025 — a sparse mixture-of-experts model with 41B active and 675B total parameters, a 256K-token context window and multimodal (text + image) input, released under the permissive Apache 2.0 licence.

Is Mistral Large 3 open source?

Yes — it is released as open weights under Apache 2.0, free to download, self-host, fine-tune and use commercially. The weights are on Hugging Face, including an NVFP4-quantized build.

How much does Mistral Large 3 cost?

$0.50 per million input tokens and $1.50 per million output tokens on Mistral’s API, with the same rates on OpenRouter. The open weights are free to self-host.

Is Mistral Large 3 a reasoning model?

No. Mistral describes it as a strong generalist rather than a dedicated reasoning model, and said a reasoning version was “coming soon” at launch. For the hardest reasoning tasks, purpose-built reasoning models outperform it.

How good is Mistral Large 3?

It is one of the strongest permissive open-weight models — debuting around 1418 Elo on LMArena (#2 OSS non-reasoning) — but independent indices place it mid-pack for its size, and the closed frontier leaders still lead on the hardest reasoning and agentic tasks.


Last verified 19 June 2026. Architecture, licence, pricing and the LMArena placement are confirmed by Mistral and corroborated by Hugging Face, Artificial Analysis and OpenRouter. GPQA Diamond is shown with both the vendor (67.17) and independent (~44%) figures because they conflict; benchmark numbers should be confirmed against current leaderboards before relying on them.