Infrastructure · 5 pieces on file

Infrastructure

Serving stacks, kernels, GPU economics, and the gap between published throughput and what reproduces on real hardware.

Feature · JULY 23, 2026

Alphabet Q2 2026: Cloud accelerates to 82%, capex guide pushed to $205B as Gemini serves 22B tokens/minute

Google Cloud revenue jumped to $24.8B on enterprise AI infrastructure demand, and Alphabet lifted 2026 capex guidance by $15B — with CFO Anat Ashkenazi telling analysts demand 'continues to outpace supply across the industry.'

By Lars Iverson · Open source & model weights


More in Infrastructure

JULY 12, 2026

GPT-5.6 Sol on Cerebras: OpenAI ships frontier inference at 750 tokens/sec

After a 12-day government-gated preview, OpenAI's Sol, Terra, and Luna reached general availability on July 9. The architectural news is Sol running on Cerebras wafer-scale hardware at up to 750 tokens/sec — the delivery milestone of a $10B, 750-megawatt deal signed in January.

By Aiko Tanaka · Inference & serving

JULY 7, 2026

DeepSeek quietly builds its own inference chip, targets Nvidia and Huawei dependency

Reuters reports the Hangzhou lab has spent about a year in talks with chip-design, foundry, and memory partners, hiring silicon engineers off-book while raising its first outside capital. Nvidia slipped 1.6% in premarket.

By Aiko Tanaka · Inference & serving

JUNE 7, 2026

Apple licenses a 1.2T-parameter Gemini MoE for Siri, runs it on B200s inside Private Cloud Compute

Bloomberg, TechTimes, and Google Cloud's own CEO all describe the same architecture ahead of Monday's WWDC keynote: a custom mixture-of-experts Gemini at ~$1B/year, with weights sitting on Nvidia B200s inside Apple-controlled enclaves.

By Lars Iverson · Open source & model weights

MAY 12, 2026

vLLM v0.20.2 ships Model Runner V2: up to 56% higher throughput on GB200

The May 2026 stable release of vLLM bundles a new GPU-native, Triton-kernel async-scheduling stack, FP8 inference, and continuous batching as the default.

By Aiko Tanaka · Inference & serving
