3 min read
Gemini 2.5 vs GPT‑4o
**Last verified:** 19 July 2025

I re‑checked the original comparison of Gemini 2.5 to GPT‑4o. A couple of numbers have shifted, mostly pricing, so here is a clear, single‑page reference you can drop into the article or link as a footnote.
2 · Latest official sources
| Model / Doc | Link | Doc timestamp |
|---|---|---|
| Gemini 2.5 Pro – Vertex AI model card | docs.google.com (Model Card) | 11 Jul 2025 |
| Gemini 2.5 Flash – Vertex AI model card | docs.google.com (Model Card) | 10 Jul 2025 |
| Gemini thinking‑budget guide | Google API docs | 14 Jun 2025 |
| Gemini pricing table | AI Studio pricing | 09 Jul 2025 |
| OpenAI GPT‑4o pricing | openai.com/pricing | 05 Jun 2025 |
3 · Pricing changes since the original post
| Tier | Old (May 2025) | Current (19 Jul 2025) | Δ |
|---|---|---|---|
| Gemini 2.5 Flash input | $0.15 / M tokens | $0.10 / M | ↓ 33 % |
| Gemini 2.5 Flash output | $0.40 / M | $0.40 / M | — |
| Gemini 2.5 Pro input (≤ 200 k) | $1.25 / M | $1.25 / M | — |
| Gemini 2.5 Pro output (≤ 200 k) | $10 / M | $10 / M | — |
| Context‑cache price | — | $0.31 / M (≤ 200 k) | new |
| Flash‑Lite tier | — | $0.10 / M in · $0.40 / M out | new |
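To see what the input‑price cut means on a real bill, here is a quick back‑of‑envelope check. The per‑token rates come straight from the table above; the monthly token volumes are invented purely for illustration.

```python
# Back-of-envelope cost check for the new Flash pricing.
# Rates are from the table above; the traffic numbers are hypothetical.

FLASH_INPUT_OLD = 0.15 / 1_000_000   # $ per input token, May 2025
FLASH_INPUT_NEW = 0.10 / 1_000_000   # $ per input token, July 2025
FLASH_OUTPUT    = 0.40 / 1_000_000   # $ per output token, unchanged

def monthly_cost(input_tokens: int, output_tokens: int, input_rate: float) -> float:
    """Total dollars for one month of Flash traffic."""
    return input_tokens * input_rate + output_tokens * FLASH_OUTPUT

# Hypothetical workload: 500M input tokens, 50M output tokens per month.
inp, out = 500_000_000, 50_000_000
old = monthly_cost(inp, out, FLASH_INPUT_OLD)   # $95.00
new = monthly_cost(inp, out, FLASH_INPUT_NEW)   # $70.00
print(f"old: ${old:.2f}  new: ${new:.2f}  saved: ${old - new:.2f}")
```

At that (made‑up) volume the cut is worth about $25 a month; scale the token counts to your own traffic and the saving scales linearly, since only the input line changed.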
4 · Capabilities & context windows (unchanged)
| Feature | Gemini 2.5 Pro | Gemini 2.5 Flash | GPT‑4o |
|---|---|---|---|
| Max context | 1 M tokens | 1 M tokens | 128 k tokens |
| Thinking‑budget knob | ✓ | — | — |
| Vision + audio | Good (slower) | Vision only (fast) | Real‑time, best‑in‑class |
No change since the original post; a minimal example of the thinking‑budget knob follows.
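The thinking‑budget knob is the one row in the table with no GPT‑4o equivalent, so here is a minimal sketch of what it looks like in practice. It assumes the google‑genai Python SDK and an API key in the environment; parameter names follow the thinking‑budget guide linked in the sources table, so double‑check there before copying.

```python
# Minimal sketch: capping Gemini 2.5 Pro's reasoning tokens with the
# thinking-budget knob. Assumes the google-genai Python SDK
# (pip install google-genai) and GEMINI_API_KEY set in the environment.
from google import genai
from google.genai import types

client = genai.Client()  # picks up the API key from the environment

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Summarise the trade-offs between a 1M and a 128k context window.",
    config=types.GenerateContentConfig(
        # Upper bound on tokens the model may spend "thinking" before answering.
        thinking_config=types.ThinkingConfig(thinking_budget=1024)
    ),
)
print(response.text)
```

Lower budgets trade answer quality for latency and cost; the guide linked above covers the supported ranges per model.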
5 · Benchmarks & latency snapshots
- Gemini 2.5 Pro still scores ≈ 85 ± 1 % MMLU, narrowly behind GPT‑4o (≈ 86 %).
- Independent latency tests report 0.26–0.32 s TTFT for Flash, well within the “sub‑second” claim (a quick way to reproduce this is sketched after this list).
- GPT‑4o latency remains industry‑leading for multimodal I/O.
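If you want to reproduce the TTFT figure yourself, the sketch below times the first streamed chunk from Flash. It assumes the google‑genai Python SDK; network path and region will dominate the result, so treat the number as directional rather than a benchmark.

```python
# Rough time-to-first-token check for Gemini 2.5 Flash.
# Assumes the google-genai Python SDK and GEMINI_API_KEY in the environment.
import time
from google import genai

client = genai.Client()

start = time.perf_counter()
stream = client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="Reply with a single short sentence.",
)
first_chunk = next(iter(stream))           # blocks until the first token arrives
ttft = time.perf_counter() - start
print(f"TTFT: {ttft:.2f}s, first chunk: {first_chunk.text!r}")
```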
6 · What actually changed?
- Flash input cost cut from $0.15 → $0.10 per M tokens.
- Flash‑Lite SKU introduced (same pricing as new Flash input, slightly lower weight limit).
- Context‑cache billing line added to Google docs.
- Minor wording: Google now labels Pro the “most advanced reasoning Gemini model.”
Everything else (context limits, thinking tokens, latency, benchmark scores, GPT‑4o pricing) remains unchanged.
7 · Bottom line
Only three edits needed to keep your comparison current:
- Update the Flash input price to $0.10 / M.
- Add a footnote that Flash‑Lite is now available (same price, lower latency).
- Mention the context‑cache add‑on if your readers care about serving cost at scale (a rough cost sketch follows below).
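For readers who do care about serving cost at scale, here is a rough sketch of what the cache line item can do to a bill. It assumes a Pro‑style workload that re‑sends the same 150 k‑token context on every call; the $1.25 / M input rate and $0.31 / M cache rate come from the tables above, while the call volume is made up and any per‑hour cache‑storage fee is ignored.

```python
# Back-of-envelope for the context-cache line item.
# Rates come from the pricing tables above; the workload is hypothetical
# and cache storage fees are deliberately left out of this sketch.

PRO_INPUT  = 1.25 / 1_000_000   # $ per input token (<= 200k tier)
CACHE_READ = 0.31 / 1_000_000   # $ per cached token read

context_tokens = 150_000        # same context re-sent on every call
calls_per_day  = 2_000          # hypothetical traffic

without_cache = context_tokens * calls_per_day * PRO_INPUT    # ~$375 / day
with_cache    = context_tokens * calls_per_day * CACHE_READ   # ~$93 / day
print(f"no cache: ${without_cache:.0f}/day  cached: ${with_cache:.0f}/day")
```

The bigger and more repetitive the shared context, the more the cache rate matters; for small or constantly changing prompts it is a rounding error.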
👨‍💻
Ryan Katayi
Full-stack developer who turns coffee into code. Building things that make the web a better place, one commit at a time.