Llama 4

Provider: Meta
Status: available
Context: 10,000,000 tokens (Scout)

Llama 4 is Meta’s open-weight model family, released on 5 April 2025. Meta describes the models as its first open-weight, natively multimodal models and its first built on a mixture-of-experts (MoE) architecture; the release also marks the high-water mark of Meta’s open-weight era. The “herd” came in three sizes: Scout (17B active / 109B total parameters, with a 10-million-token context window, the longest of any widely available model at release), Maverick (17B active / 400B total, the assistant workhorse), and Behemoth (288B active / ~2T total, a teacher model that was still in training at release) (Meta, Hugging Face).

As of 2026, Llama 4 is a legacy model for Meta’s own assistant, since Muse Spark replaced it as the Meta AI engine, but it remains freely available for developers to download, run and fine-tune. Its licence is not Apache 2.0: the Llama 4 Community License carries usage obligations and requires a separate licence from Meta above a threshold of 700 million monthly active users (Royfactory).

Quick specs


Provider	Meta
Released	5 April 2025
Status	Available (open weights; legacy as the Meta AI engine)
Architecture	Mixture-of-experts, natively multimodal
Sizes	Scout (109B), Maverick (400B), Behemoth (~2T, in training)
Context window	Up to 10,000,000 tokens (Scout)
Licence	Llama 4 Community License (700M-MAU clause)
Modalities	Text + image input; text output
Best for	Self-hosting, fine-tuning, very long context (Scout)
Limitations	Superseded by Muse Spark; non-Apache licence; open frontier has moved ahead

The Llama 4 herd

Model	Active / total params	Context	Notes
Scout	17B / 109B (16 experts)	10M tokens	Fits a single server GPU with 4/8-bit quantisation
Maverick	17B / 400B (128 experts)	1M tokens	The assistant workhorse; BF16 and FP8
Behemoth	288B / ~2T (16 experts)	—	Teacher model; remained in training

All three use a mixture-of-experts design: one shared expert is always active for general knowledge, and further experts are selected per token. Because only 17B of Scout’s 109B parameters are active for any one token, per-token compute stays low, which helps Scout serve its 10M-token context on accessible hardware (Meta). Scout was pretrained on ~40 trillion tokens of multimodal data and Maverick on ~22 trillion.
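
The routing idea described above can be sketched in a few lines of plain Python. This is a generic illustration of "shared expert plus one routed expert per token", not Meta’s implementation: the experts are stand-in functions that only scale their input, top-1 routing and softmax weighting are assumptions of the sketch, and every name here (`make_expert`, `moe_layer`, `router_weights`) is hypothetical.

```python
import math
import random

def make_expert(scale):
    """Stand-in for a feed-forward expert: it only scales the input vector."""
    return lambda x: [scale * v for v in x]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(token, shared_expert, routed_experts, router_weights):
    """Process one token: the shared expert always runs, plus the top-1 routed expert."""
    # Router score per expert = dot product of the token with that expert's router row.
    scores = [sum(w * v for w, v in zip(row, token)) for row in router_weights]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    shared_out = shared_expert(token)
    routed_out = routed_experts[best](token)
    # Shared output plus the chosen expert's output, weighted by its router probability.
    out = [s + probs[best] * r for s, r in zip(shared_out, routed_out)]
    return out, best

# Toy usage: 16 routed experts, as in the Scout row of the table (values are random).
random.seed(0)
dim, n_experts = 4, 16
router_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
routed = [make_expert(0.1 * (i + 1)) for i in range(n_experts)]
shared = make_expert(1.0)
out, chosen = moe_layer([0.5, -1.0, 0.25, 2.0], shared, routed, router_weights)
print(f"routed to expert {chosen}; the other {n_experts - 1} routed experts did no work")
```

The point of the design is visible in the last line: however many routed experts exist, each token pays for only the shared expert and the one it is routed to, which is why active parameters (17B) can be far below total parameters (109B or 400B).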

Benchmark performance

At the April 2025 release, Meta reported that Llama 4 Maverick beat GPT-4o and Gemini 2.0 Flash across a range of widely reported benchmarks and matched DeepSeek V3 on reasoning and coding with less than half the active parameters. Meta also reported that Scout outperformed Gemma 3, Gemini 2.0 Flash-Lite and Mistral 3.1 (Meta, TechTalks).
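
The "less than half the active parameters" comparison is simple arithmetic, sketched below. The Llama 4 figures come from the table above; the DeepSeek V3 figures (37B active, 671B total) are an assumption taken from DeepSeek’s published model card rather than from this page, and the function names are illustrative.

```python
# (active, total) parameters in billions.
MODELS = {
    "Llama 4 Scout": (17, 109),
    "Llama 4 Maverick": (17, 400),
    # Assumption: figures from DeepSeek's model card, not from this article.
    "DeepSeek V3": (37, 671),
}

def active_fraction(name):
    """Share of a model's total parameters that are active for one token."""
    active, total = MODELS[name]
    return active / total

def active_ratio(a, b):
    """Active parameters of model a relative to model b."""
    return MODELS[a][0] / MODELS[b][0]

ratio = active_ratio("Llama 4 Maverick", "DeepSeek V3")
print(f"Maverick uses {ratio:.0%} of DeepSeek V3's active parameters")  # prints 46%
```

Under those assumed figures the ratio is 17/37, or about 46%, which is consistent with Meta’s "less than half" wording.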

A credibility caveat: Meta’s then-Chief AI Scientist Yann LeCun later alleged that the GenAI team had “fudged” some Llama 4 benchmark results. The claim is unverified, but it contributed to Zuckerberg’s loss of confidence in the team and the subsequent reorganisation (see the Meta provider page). Either way, the open frontier has since moved well ahead, with models like DeepSeek V4 now leading on open-weight capability, so Llama 4’s draw today is its licensable weights and Scout’s enormous context, not raw benchmark standing. See best AI models.

Licence and access

Llama 4 is open weights, but not Apache 2.0. The Llama 4 Community License allows free download, self-hosting, fine-tuning and commercial use, but with practical obligations and a 700-million-monthly-active-user threshold above which a separate licence from Meta is required (Royfactory) — a meaningful distinction from the fully-permissive licences on gpt-oss, Gemma 4 or DeepSeek.

The weights are on Hugging Face and llama.com, with hosting across the major clouds and inference providers. Scout fits on a single server-grade GPU when quantised to 4 or 8 bits; Maverick ships in BF16 and FP8.
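
To see why quantisation matters for single-GPU hosting, here is a rough back-of-envelope sketch. It counts the weights only, so the KV cache, activations and runtime overhead all come on top, and the GPU capacities passed in are illustrative assumptions rather than figures from Meta. The helper names are hypothetical.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(total_params_b, dtype):
    """Approximate weight memory in decimal GB for a model with total_params_b billion parameters."""
    # 1 billion parameters at 1 byte each is 1 GB, so the factors of 1e9 cancel.
    return total_params_b * BYTES_PER_PARAM[dtype]

def fits(total_params_b, dtype, gpu_gb):
    """True if the weights alone fit in gpu_gb of memory (ignores KV cache and activations)."""
    return weight_memory_gb(total_params_b, dtype) < gpu_gb

# Scout has 109B total parameters; 80 GB and 141 GB are assumed example capacities.
for dtype in ("bf16", "int8", "int4"):
    need = weight_memory_gb(109, dtype)
    print(f"Scout at {dtype}: ~{need:.1f} GB of weights; "
          f"fits 80 GB: {fits(109, dtype, 80)}, fits 141 GB: {fits(109, dtype, 141)}")
```

On these assumptions, 4-bit Scout weights need roughly 54.5 GB, while 8-bit weights need roughly 109 GB, so whether the 8-bit variant fits on one card depends on how much memory that card has.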

How Llama 4 compares

vs Muse Spark — Muse Spark replaced Llama 4 as the Meta AI engine and matches Maverick at far less compute, but it is proprietary; Llama 4 is the open option.
vs other open models — Against DeepSeek, Alibaba’s Qwen, Mistral and Gemma 4, Llama 4’s standout is Scout’s 10M-token context; on raw capability the newer open models now lead, and Llama 4’s licence is more restrictive than Apache/MIT alternatives.

Known limitations

Llama 4 has been superseded as Meta’s assistant by Muse Spark. Its licence is restrictive: the 700M-MAU clause makes it less “open” than Apache/MIT rivals. The LeCun allegation leaves a cloud over the credibility of its benchmarks. And the open frontier has moved ahead since April 2025, so Llama 4 trails the current best open-weight models on capability.

FAQ

What is Llama 4?

Llama 4 is Meta’s open-weight model family, released in April 2025. It is Meta’s first open-weight, natively multimodal family and its first built on a mixture-of-experts architecture. It comes in Scout (10M context), Maverick (the workhorse) and Behemoth, which was still in training at release.

Is Llama 4 free and open source?

It is open weights and free to download, but under the Llama 4 Community License rather than a fully-permissive Apache/MIT licence — there are usage obligations and a 700-million-monthly-active-user threshold that requires a separate licence from Meta.

What is Llama 4 Scout’s context window?

Up to 10 million tokens — the largest of any widely-available model at its release — while fitting on a single server-grade GPU with quantisation.

Is Llama 4 still Meta’s main model?

No. Muse Spark replaced Llama 4 as the engine behind Meta AI in April 2026. Llama 4 remains available as open weights for developers.

Last verified 18 June 2026. Llama 4 figures are Meta-reported from the April 2025 release; the “fudged benchmarks” allegation is attributed to Yann LeCun and is unverified. Licence terms and benchmarks change — confirm against Meta’s official pages and the model licence before relying on them.