THE AI RANKINGS

Google

Gemma 4

Provider
Google
Status
available
Context
256,000 tok

Gemma 4 is Google’s open-weight model family, released on 2 April 2026 under the permissive Apache 2.0 licence. It is the open counterpart to the proprietary Gemini line: where Gemini 3.1 Pro is API-only, Gemma 4 weights can be downloaded, run locally, fine-tuned and used commercially with no royalties or usage caps. Google bills it as “byte for byte, the most capable open models” (Google).

The family spans five sizes — from the tiny E2B (~2.3B effective parameters) for edge devices up to a 31B dense model that bridges local and server-grade use — including a 26B mixture-of-experts variant that activates only ~4B parameters per token. All share a 256K-token context window, are multimodal (text and image input, with audio on the smaller models), and support 140+ languages (Google AI for Developers).

Quick specs

ProviderGoogle (DeepMind)
Released2 April 2026
LicenceApache 2.0 (open weights)
SizesE2B, E4B, 12B, 26B MoE (A4B), 31B dense
Context window256,000 tokens
ModalitiesText + image input (audio on smaller sizes); text output
Languages140+
PriceFree to self-host; low-cost via third-party hosts
Best forPrivate/on-prem and edge deployment, fine-tuning, multilingual apps
LimitationsBelow the proprietary frontier; not a reasoning model

GET GEMMA 4 →

What Gemma 4 is

Gemma 4 is Google DeepMind’s open-weight family — the open sibling to Gemini, built from related research but released for anyone to run. The Apache 2.0 licence allows unlimited commercial use, modification, fine-tuning and redistribution with no royalties, no monthly-active-user limits and no restrictive use policy (Google AI for Developers), which makes it a natural choice for private, on-prem or air-gapped deployment and for building products without sending data to a hosted API.

The defining feature is the size ladder, so you can match the model to your hardware:

SizeTypeNotes
E2B~2.3B effectiveSmallest; on-device / edge
E4B~4.5B effectiveEdge and mobile
12BDenseMid-size
26B (A4B)Mixture-of-experts26B total, ~4B active per token
31BDenseFlagship; local-to-server bridge

Capabilities

Gemma 4 models are multimodal — text and image input with text output, and audio input on the E2B, E4B and 12B sizes — with a 256K-token context window and multilingual coverage across 140+ languages (Google AI for Developers). They support function calling and fine-tuning, and run across the open ecosystem (Hugging Face, Ollama, LM Studio) as well as Google AI Studio and Vertex AI.

Performance

Google positions Gemma 4 as leading open-weight models at comparable sizes — “byte for byte, the most capable open models” (Google). In practice that means strong capability per parameter rather than frontier capability: Gemma 4 is built to punch above its size class on accessible hardware, not to match the proprietary frontier (Gemini 3.5 Pro, Claude Opus 4.8, GPT-5.5) or the largest open models like DeepSeek V4. Exact benchmark figures vary by size and quantisation, so treat published numbers as host-dependent. See best AI models for where the open-weight field stands.

Pricing and access

Gemma 4 is free to download and self-host under Apache 2.0; there is no first-party per-token price. The weights are published on Google AI for Developers and Hugging Face, and the models run locally (the smallest on phones and laptops, the 31B on a single server GPU) or via third-party hosts, plus Google AI Studio and Vertex AI.

How Gemma 4 compares

Known limitations

Below the proprietary frontier. Gemma 4 is built for efficiency at accessible sizes, not to match Gemini’s or rivals’ flagships. Not a dedicated reasoning model. It lacks the extended “thinking” modes of the Gemini Pro/Deep Think line. Smaller context than Gemini. 256K tokens is large but well below Gemini’s 1M–2M windows.

FAQ

What is Gemma 4?

Gemma 4 is Google DeepMind’s open-weight model family, released April 2026 under the Apache 2.0 licence. It comes in five sizes from edge-friendly E2B to a 31B dense model, all with a 256K context window and multimodal input, and can be downloaded and used commercially for free.

Is Gemma 4 free and open source?

Yes. It is released under Apache 2.0 — free to download, run, fine-tune and use commercially, with no royalties or usage limits. The weights are on Google AI for Developers and Hugging Face.

How big are the Gemma 4 models?

Five sizes: E2B (~2.3B effective), E4B (~4.5B), 12B dense, a 26B mixture-of-experts (≈4B active per token), and a 31B dense flagship. Smaller sizes run on phones and laptops; the 31B runs on a single server GPU.

Is Gemma 4 as good as Gemini?

No — Gemma 4 is built for strong performance per parameter on accessible hardware, not to match the proprietary Gemini flagships. For top capability use Gemini 3.1 Pro; for self-hosting and fine-tuning, Gemma 4 is the open choice.


Last verified 18 June 2026. Specifications are from Google’s Gemma 4 announcement and developer documentation; exact benchmark figures vary by model size, host and quantisation. Confirm against Google’s official pages before relying on specific numbers.