
AI League — Game Day 4: OpenAI Goes Full Sprint, Google Jumps a Gear
GPT-5.5 surges +16.5% in speed. Gemini 3.1 Pro jumps +25% to 143 t/s. Claude holds #1 (AI Index 61). Qwen3.7 Max enters at 57 / 189 t/s. Full June 1 stats. #AILeague

June 1, 2026 · Week 1 of June · Post-Game Stats Panel
GPT-5.5 posts a +16.5% speed surge. Gemini 3.1 Pro storms up 25 t/s in a single session. Claude holds the throne but its legs look a little heavier. Welcome to the first Monday of June. #AILeague
Final standings — June 1
| Rank | Team / Model | AI Index | Δ | Speed (t/s) | Blend $/M |
|---|---|---|---|---|---|
| 🥇 1 | Anthropic — Claude Opus 4.8 (Max) | 61 | ↔ | 55.6 | $4.10 |
| 🥈 2 | OpenAI — GPT-5.5 (xhigh) | 60 | ↔ | 60.7 ↑ | $4.35 |
| 🥉 3 | OpenAI — GPT-5.5 (high) | 59 | ↔ | — | $4.35 |
| 4 | Anthropic — Claude Opus 4.7 (Max) | 57 | ↔ | — | $4.10 |
| 4 | Google — Gemini 3.1 Pro Preview | 57 | ↔ | 143.0 ↑↑ | $1.74 |
| 4 | OpenAI — GPT-5.5 (medium) | 57 | ↔ | — | $4.35 |
| 7 | Google — Gemini 3.5 Flash (high) | 55 | ↔ | 175.7 | $1.31 |
| 8 | Kimi K2.6 | 54 | ↔ | 50 | $0.70 |
| 8 | MiMo-V2.5-Pro (Xiaomi) | 54 | ↔ | 51 | $0.18 |
| 10 | xAI — Grok 4.3 (high) | 53 | ↔ | 142.9 | $0.64 |
| 11 | DeepSeek V4 Pro (Max) | 52 | ↔ | 47.3 | $0.18 |
| 12 | Meta — Llama 4 Scout | 14 | ↔ | 107.4 | $0.22 |
Intelligence Index scores from Artificial Analysis (72h rolling) 1
Game of the night: OpenAI's speed run
GPT-5.5 (xhigh) went from 52.1 t/s on May 31 to 60.7 t/s today — a +16.5% single-session leap, the biggest speed jump in the core-6 bracket this week 2.
That's worth context: OpenAI's flagship still trails on intelligence (60 vs. Claude's 61) and carries the second-steepest price tag in the field ($4.35/M blended, $30.00/M output). But if you're running throughput-sensitive workloads — code review pipelines, bulk classification, anything that scales linearly with token speed — today's 60.7 t/s printout is a real data point.
The traditional powerhouse is running faster. Whether that's infrastructure headroom finally showing up or a load-balancing artifact won't be clear until tomorrow's reading.
Speed panel

| Model | Speed (t/s) | vs. May 31 | TTFT (s) |
|---|---|---|---|
| Google — Gemini 3.5 Flash | 175.7 | 183.2 → 175.7 (↓4.1%) | 19.3 |
| Google — Gemini 3.1 Pro | 143.0 | 114.2 → 143.0 (+25.2%) ✦ | 20.2 |
| xAI — Grok 4.3 | 142.9 | 145.2 → 142.9 (↓1.6%) | 13.6 |
| Meta — Llama 4 Scout | 107.4 | 104.2 → 107.4 (+3.1%) | 0.85 |
| Anthropic — Claude Opus 4.8 | 55.6 | 58.7 → 55.6 (↓5.3%) | 9.3 |
| OpenAI — GPT-5.5 (xhigh) | 60.7 | 52.1 → 60.7 (+16.5%) ✦ | 68.3 |
| DeepSeek V4 Pro | 47.3 | 47.5 → 47.3 (↔) | 1.8 |
✦ Session high-water marks today. Speed data from Artificial Analysis live measurements 1
Gemini 3.1 Pro's +25.2% jump is even larger in absolute terms than GPT-5.5's. The state-owned club's Pro variant went from 114 to 143 t/s — that's a model reading 29 extra tokens every second compared to yesterday. At 143 t/s, Gemini 3.1 Pro now matches Grok 4.3 (high) on throughput while sitting four places higher on the intelligence table. That's an unusual combo: fast and smart at $1.74/M.
Claude, meanwhile, dropped from 58.7 to 55.6 t/s — its third consecutive day under 60 t/s. Still the #1 intelligence squad. But if speed is the stat you run, the safety-pressing Anthropic side isn't the one putting up field-goal numbers right now.
Pricing war: the value bracket is getting crowded

The cheapest-smart corner of the bracket now has three residents:
| Model | AI Index | Blend $/M | Notes |
|---|---|---|---|
| DeepSeek V4 Pro (Max) | 52 | $0.18 | Permanent rate since promo expired May 31 |
| MiMo-V2.5-Pro (Xiaomi) | 54 | $0.18 | Matches DeepSeek on price, leads on index |
| xAI — Grok 4.3 (high) | 53 | $0.64 | 3.5× DeepSeek/MiMo, but 143 t/s speed |
At the top, the OpenAI vs. Anthropic pricing standoff holds:
- Claude Opus 4.8: $6.25 input / $25.00 output / $4.10 blend
- GPT-5.5 (xhigh): $5.00 input / $30.00 output / $4.35 blend
GPT-5.5's output rate is 34× DeepSeek's. At scale, that gap isn't measured in dollars — it's measured in business models. A startup routing all inference through GPT-5.5 xhigh vs. DeepSeek V4 Pro at equivalent quality trade-offs pays 34× more per output token. The price war in the sub-$1/M bracket is compressing fast, but the top two teams show no sign of moving 3 4.
Context window report
| Team | Widest model | Context | Speed |
|---|---|---|---|
| Meta | Llama 4 Scout | 10M tokens | 107.4 t/s |
| xAI | Grok 4.20 0309 | 1M tokens | — |
| Gemini 3.1 Pro Preview | 1M tokens | 143.0 t/s | |
| Anthropic | Claude Opus 4.8 | 1M tokens | 55.6 t/s |
| OpenAI | GPT-5.5 | 922k tokens | 60.7 t/s |
| DeepSeek | V4 Pro | 1M tokens | 47.3 t/s |
Meta's 10M context moat is unchallenged — and Llama 4 Scout is doing it for $0.22/M blend 5. For document-heavy applications (legal review, full-codebase RAG, long-session agentic workflows), the open-source community team is the only option at that window size. Nobody else is in the same stadium.
Challenger watch

Qwen3.7 Max (Alibaba) entered the top-10 today, scoring AI Index 57 — level with Gemini 3.1 Pro and Claude Opus 4.7 — at $1.43/M blend and 189 t/s 6. That's the most speed-efficient 57-point model in the dataset: three times faster than Claude Opus 4.7 and 30% cheaper than Gemini 3.1 Pro, at the same intelligence reading. Alibaba's challenger squad is now legitimately in the conversation for cost-sensitive production deployments.
Muse Spark (Meta) debuted today at AI Index 52 with no pricing data — flagged as a new unpriced Meta entry. Whether this is a public release teaser or an internal preview leaking into the leaderboard remains unconfirmed.
MiMo-V2.5-Pro (Xiaomi) holds steady at 54 index / $0.18 blend, continuing its run as the only model matching DeepSeek on price while outscoring it by two points.
Post-game summary
Four game days in, the scoreboard is settling into a pattern: Claude and GPT-5.5 share the top two intelligence slots with no daylight between their scores. The race below #2 is becoming the actual competition. Gemini 3.1 Pro's +25% speed spike today pushes it to 143 t/s at AI Index 57 — the best speed-intelligence ratio among non-Alibaba models in the $1-3/M range. Qwen3.7 Max at 189 t/s and the same index score is the new benchmark that tier has to answer.
DeepSeek's permanent $0.18 price is no longer a news story — it's just the floor. Anyone entering the sub-$1/M segment competes against it as a given.
The Monday opener for June: two speed surges, one serious challenger landing, and the top-two spots still locked at 61 and 60. Game Day 5 tomorrow.
#AILeague
このコンテンツについて、さらに観点や背景を補足しましょう。