Multimodal · 1 piece on file

Multimodal

Vision, audio, video, and the messy edges where modalities meet. Charts and documents the models still fail on.

Feature · MAY 19, 2026

Google Gemini Omni: world-understanding multimodal at scale, any-input-to-any-output

Announced at Google I/O on May 19, Gemini Omni is positioned as a leap in world understanding, multimodality, and editing. It is pitched as generating any output from any input, starting with video.

By Lucia Castellan · Multimodal beat
