Gemma 4

Provider: Google
Status: available
Context: 256,000 tokens

Gemma 4 is Google’s open-weight model family, released on 2 April 2026 under the permissive Apache 2.0 licence. It is the open counterpart to the proprietary Gemini line: where Gemini 3.1 Pro is API-only, Gemma 4 weights can be downloaded, run locally, fine-tuned and used commercially with no royalties or usage caps. Google bills it as “byte for byte, the most capable open models” (Google).

The family spans five sizes — from the tiny E2B (~2.3B effective parameters) for edge devices up to a 31B dense model that bridges local and server-grade use — including a 26B mixture-of-experts variant that activates only ~4B parameters per token. All share a 256K-token context window, are multimodal (text and image input, with audio on the smaller models), and support 140+ languages (Google AI for Developers).

Quick specs


Provider	Google (DeepMind)
Released	2 April 2026
Licence	Apache 2.0 (open weights)
Sizes	E2B, E4B, 12B, 26B MoE (A4B), 31B dense
Context window	256,000 tokens
Modalities	Text + image input (audio on smaller sizes); text output
Languages	140+
Price	Free to self-host; low-cost via third-party hosts
Best for	Private/on-prem and edge deployment, fine-tuning, multilingual apps
Limitations	Below the proprietary frontier; not a reasoning model


What Gemma 4 is

Gemma 4 is Google DeepMind’s open-weight family — the open sibling to Gemini, built from related research but released for anyone to run. The Apache 2.0 licence allows unlimited commercial use, modification, fine-tuning and redistribution with no royalties, no monthly-active-user limits and no restrictive use policy (Google AI for Developers), which makes it a natural choice for private, on-prem or air-gapped deployment and for building products without sending data to a hosted API.

The defining feature is the size ladder, so you can match the model to your hardware:

Size	Type	Notes
E2B	~2.3B effective	Smallest; on-device / edge
E4B	~4.5B effective	Edge and mobile
12B	Dense	Mid-size
26B (A4B)	Mixture-of-experts	26B total, ~4B active per token
31B	Dense	Flagship; local-to-server bridge
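As a rough aid for matching a size to hardware, the sketch below estimates weight memory as parameter count times bytes per parameter. It is a back-of-envelope calculation, not a published requirement: the byte widths, the 80% headroom factor and the helper names `weight_gb` and `fits` are assumptions for illustration, and the "effective" counts for E2B and E4B are used as-is even though the stored weights may be larger. It ignores the KV cache and activations, which grow with context length.

```python
# Parameter counts in billions, taken from the size table above.
# Assumption: the "effective" figures for E2B/E4B stand in for stored weights.
SIZES_B = {
    "E2B": 2.3,
    "E4B": 4.5,
    "12B": 12.0,
    "26B (A4B)": 26.0,  # MoE: all 26B are typically resident, ~4B active per token
    "31B": 31.0,
}

# Bytes per parameter for common weight formats (weights only).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params_billion: float, fmt: str) -> float:
    """Approximate weight memory in GB (10^9 bytes), excluding KV cache."""
    return params_billion * BYTES_PER_PARAM[fmt]


def fits(params_billion: float, fmt: str, device_gb: float,
         headroom: float = 0.8) -> bool:
    """True if the weights fit in `headroom` of the device memory.

    The headroom factor is a hypothetical safety margin for cache and runtime.
    """
    return weight_gb(params_billion, fmt) <= device_gb * headroom


# Print a small table of estimated weight sizes per format.
for name, params in SIZES_B.items():
    row = ", ".join(f"{fmt}: {weight_gb(params, fmt):.1f} GB"
                    for fmt in BYTES_PER_PARAM)
    print(f"{name:10s} {row}")
```

Note that for the 26B mixture-of-experts model the memory estimate uses the total parameter count: the ~4B active parameters reduce compute per token, not the amount of weight data that has to be loaded.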

Capabilities

Gemma 4 models are multimodal — text and image input with text output, and audio input on the E2B, E4B and 12B sizes — with a 256K-token context window and multilingual coverage across 140+ languages (Google AI for Developers). They support function calling and fine-tuning, and run across the open ecosystem (Hugging Face, Ollama, LM Studio) as well as Google AI Studio and Vertex AI.
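To illustrate what function calling involves on the application side, here is a minimal sketch of dispatching a tool call. It assumes the model emits a JSON object with `name` and `arguments` fields; that shape, the `get_weather` tool and the `dispatch_tool_call` helper are hypothetical and are not taken from Gemma 4's documentation, so check the official format before relying on it.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would query a weather service."""
    return f"Weather for {city}: (stub)"


# Registry mapping tool names the model may request to Python callables.
TOOLS = {"get_weather": get_weather}


def dispatch_tool_call(raw: str) -> str:
    """Parse a JSON tool call and run the matching registered function.

    Assumed input shape: {"name": "<tool>", "arguments": {...}}.
    """
    call = json.loads(raw)
    name = call["name"]
    args = call.get("arguments", {})
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**args)


# Example: the string a model might return when asked about the weather.
result = dispatch_tool_call(
    '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
)
print(result)
```

In a full loop the returned string would be sent back to the model as the tool result so it can compose the final answer.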

Performance

Google positions Gemma 4 as the leading open-weight family at comparable sizes, in its words “byte for byte, the most capable open models” (Google). In practice that means strong capability per parameter rather than frontier capability: Gemma 4 is built to punch above its size class on accessible hardware, not to match the proprietary frontier (Gemini 3.1 Pro, Claude Opus 4.8, GPT-5.5) or the largest open models such as DeepSeek V4. Exact benchmark figures vary by model size, host and quantisation, so treat published numbers as indicative rather than exact. See best AI models for where the open-weight field stands.

Pricing and access

Gemma 4 is free to download and self-host under Apache 2.0; there is no first-party per-token price. The weights are published on Google AI for Developers and Hugging Face. The models run locally (the smallest on phones and laptops, the 31B on a single server GPU), through third-party hosts, or through Google AI Studio and Vertex AI.
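For local use through Ollama, a request to its HTTP API might look like the sketch below. The endpoint and the `model`, `prompt` and `stream` fields follow Ollama's `/api/generate` API on its default port; the model tag `gemma4` is a hypothetical placeholder, so substitute whatever tag your installation actually lists. The helper names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "gemma4"  # hypothetical tag; check the tags available locally


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request without sending it."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_TAG, timeout: float = 120.0) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Summarise the Apache 2.0 licence in one sentence.")` requires an Ollama server running locally with the model already pulled; nothing leaves the machine, which is the point of self-hosting.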

How Gemma 4 compares

vs Google’s Gemini line: Gemma 4 trades frontier capability for openness and control. For the best quality use Gemini 3.1 Pro or 3.5 Flash; for self-hosting use Gemma 4.
vs other open models: Against Meta’s Llama, Alibaba’s Qwen, Mistral and DeepSeek, Gemma 4 competes on capability-per-parameter, multimodality and its range of accessible sizes; the largest open models lead on raw capability.

Known limitations

Below the proprietary frontier. Gemma 4 is built for efficiency at accessible sizes, not to match Gemini’s or rivals’ flagships.
Not a dedicated reasoning model. It lacks the extended “thinking” modes of the Gemini Pro/Deep Think line.
Smaller context than Gemini. 256K tokens is large but well below Gemini’s 1M–2M windows.

FAQ

What is Gemma 4?

Gemma 4 is Google DeepMind’s open-weight model family, released April 2026 under the Apache 2.0 licence. It comes in five sizes from edge-friendly E2B to a 31B dense model, all with a 256K context window and multimodal input, and can be downloaded and used commercially for free.

Is Gemma 4 free and open source?

Yes. It is released under Apache 2.0 — free to download, run, fine-tune and use commercially, with no royalties or usage limits. The weights are on Google AI for Developers and Hugging Face.

How big are the Gemma 4 models?

Five sizes: E2B (~2.3B effective), E4B (~4.5B effective), 12B dense, a 26B mixture-of-experts (~4B active per token), and a 31B dense flagship. Smaller sizes run on phones and laptops; the 31B runs on a single server GPU.

Is Gemma 4 as good as Gemini?

No — Gemma 4 is built for strong performance per parameter on accessible hardware, not to match the proprietary Gemini flagships. For top capability use Gemini 3.1 Pro; for self-hosting and fine-tuning, Gemma 4 is the open choice.

Last verified 18 June 2026. Specifications are from Google’s Gemma 4 announcement and developer documentation; exact benchmark figures vary by model size, host and quantisation. Confirm against Google’s official pages before relying on specific numbers.