Gemma 4
- Provider
- Status
- available
- Context
- 256,000 tok
Gemma 4 is Google’s open-weight model family, released on 2 April 2026 under the permissive Apache 2.0 licence. It is the open counterpart to the proprietary Gemini line: where Gemini 3.1 Pro is API-only, Gemma 4 weights can be downloaded, run locally, fine-tuned and used commercially with no royalties or usage caps. Google bills it as “byte for byte, the most capable open models” (Google).
The family spans five sizes — from the tiny E2B (~2.3B effective parameters) for edge devices up to a 31B dense model that bridges local and server-grade use — including a 26B mixture-of-experts variant that activates only ~4B parameters per token. All share a 256K-token context window, are multimodal (text and image input, with audio on the smaller models), and support 140+ languages (Google AI for Developers).
Quick specs
| Provider | Google (DeepMind) |
| Released | 2 April 2026 |
| Licence | Apache 2.0 (open weights) |
| Sizes | E2B, E4B, 12B, 26B MoE (A4B), 31B dense |
| Context window | 256,000 tokens |
| Modalities | Text + image input (audio on smaller sizes); text output |
| Languages | 140+ |
| Price | Free to self-host; low-cost via third-party hosts |
| Best for | Private/on-prem and edge deployment, fine-tuning, multilingual apps |
| Limitations | Below the proprietary frontier; not a reasoning model |
What Gemma 4 is
Gemma 4 is Google DeepMind’s open-weight family — the open sibling to Gemini, built from related research but released for anyone to run. The Apache 2.0 licence allows unlimited commercial use, modification, fine-tuning and redistribution with no royalties, no monthly-active-user limits and no restrictive use policy (Google AI for Developers), which makes it a natural choice for private, on-prem or air-gapped deployment and for building products without sending data to a hosted API.
The defining feature is the size ladder, so you can match the model to your hardware:
| Size | Type | Notes |
|---|---|---|
| E2B | ~2.3B effective | Smallest; on-device / edge |
| E4B | ~4.5B effective | Edge and mobile |
| 12B | Dense | Mid-size |
| 26B (A4B) | Mixture-of-experts | 26B total, ~4B active per token |
| 31B | Dense | Flagship; local-to-server bridge |
Capabilities
Gemma 4 models are multimodal — text and image input with text output, and audio input on the E2B, E4B and 12B sizes — with a 256K-token context window and multilingual coverage across 140+ languages (Google AI for Developers). They support function calling and fine-tuning, and run across the open ecosystem (Hugging Face, Ollama, LM Studio) as well as Google AI Studio and Vertex AI.
Performance
Google positions Gemma 4 as leading open-weight models at comparable sizes — “byte for byte, the most capable open models” (Google). In practice that means strong capability per parameter rather than frontier capability: Gemma 4 is built to punch above its size class on accessible hardware, not to match the proprietary frontier (Gemini 3.5 Pro, Claude Opus 4.8, GPT-5.5) or the largest open models like DeepSeek V4. Exact benchmark figures vary by size and quantisation, so treat published numbers as host-dependent. See best AI models for where the open-weight field stands.
Pricing and access
Gemma 4 is free to download and self-host under Apache 2.0; there is no first-party per-token price. The weights are published on Google AI for Developers and Hugging Face, and the models run locally (the smallest on phones and laptops, the 31B on a single server GPU) or via third-party hosts, plus Google AI Studio and Vertex AI.
How Gemma 4 compares
- vs Google’s Gemini line — Gemma 4 trades frontier capability for openness and control; for the best quality use Gemini 3.1 Pro or 3.5 Flash, for self-hosting use Gemma 4.
- vs other open models — Against Meta’s Llama, Alibaba’s Qwen, Mistral and DeepSeek, Gemma 4 competes on capability-per-parameter, multimodality and its range of accessible sizes; the largest open models lead on raw capability.
Known limitations
Below the proprietary frontier. Gemma 4 is built for efficiency at accessible sizes, not to match Gemini’s or rivals’ flagships. Not a dedicated reasoning model. It lacks the extended “thinking” modes of the Gemini Pro/Deep Think line. Smaller context than Gemini. 256K tokens is large but well below Gemini’s 1M–2M windows.
FAQ
What is Gemma 4?
Gemma 4 is Google DeepMind’s open-weight model family, released April 2026 under the Apache 2.0 licence. It comes in five sizes from edge-friendly E2B to a 31B dense model, all with a 256K context window and multimodal input, and can be downloaded and used commercially for free.
Is Gemma 4 free and open source?
Yes. It is released under Apache 2.0 — free to download, run, fine-tune and use commercially, with no royalties or usage limits. The weights are on Google AI for Developers and Hugging Face.
How big are the Gemma 4 models?
Five sizes: E2B (~2.3B effective), E4B (~4.5B), 12B dense, a 26B mixture-of-experts (≈4B active per token), and a 31B dense flagship. Smaller sizes run on phones and laptops; the 31B runs on a single server GPU.
Is Gemma 4 as good as Gemini?
No — Gemma 4 is built for strong performance per parameter on accessible hardware, not to match the proprietary Gemini flagships. For top capability use Gemini 3.1 Pro; for self-hosting and fine-tuning, Gemma 4 is the open choice.
Last verified 18 June 2026. Specifications are from Google’s Gemma 4 announcement and developer documentation; exact benchmark figures vary by model size, host and quantisation. Confirm against Google’s official pages before relying on specific numbers.