Thinking Machines Lab Pushes Beyond Turn-Based AI With “Interaction Models”

May 24

A new report by Market Tech Post details how Thinking Machines Lab is developing an AI architecture designed to make human-AI interaction continuous rather than turn-based.

The company describes the system as an “interaction model,” designed to process audio, video, and text in real time while simultaneously generating responses. Instead of waiting for users to finish speaking or typing, the architecture continuously exchanges information in 200 ms “micro-turns,” allowing the model to effectively listen and respond at the same time.
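To make the contrast concrete, here is a toy Python sketch of our own (not Thinking Machines’ code). It compares a turn-based loop with a micro-turn loop. The turn-based loop waits for a silent chunk before replying, a crude stand-in for voice activity detection. The micro-turn loop handles input and may emit output at every 200 ms step. Only the 200 ms figure comes from the report. The function names, the chunked text stream, and the “ack” outputs are hypothetical.

```python
MICRO_TURN_MS = 200  # micro-turn length reported for the system; everything else is hypothetical


def turn_based(chunks: list[str]) -> list[tuple[int, str]]:
    """Baseline: buffer input until a silent (empty) chunk, a crude stand-in
    for voice activity detection, then reply once to the whole utterance."""
    events, buffer = [], []
    for i, chunk in enumerate(chunks):
        t_ms = i * MICRO_TURN_MS
        if chunk:
            buffer.append(chunk)
        elif buffer:
            events.append((t_ms, "reply to: " + " ".join(buffer)))
            buffer = []
    return events


def micro_turn(chunks: list[str]) -> list[tuple[int, str]]:
    """Toy full-duplex loop: each 200 ms step ingests the new chunk and may
    emit output right away, without waiting for the user to stop."""
    events = []
    for i, chunk in enumerate(chunks):
        t_ms = i * MICRO_TURN_MS
        if chunk:
            # Hypothetical policy: acknowledge each fragment as it arrives.
            events.append((t_ms, f"ack: {chunk}"))
    return events


stream = ["book", "a table", "", "for two"]
print(turn_based(stream))  # [(400, 'reply to: book a table')]
print(micro_turn(stream))  # [(0, 'ack: book'), (200, 'ack: a table'), (600, 'ack: for two')]
```

In the baseline, the first response arrives only at 400 ms, after the silence. The trailing “for two” never gets a reply because no silence follows it. The micro-turn loop can respond from the first step. A real interaction model would decide when to speak from learned behavior rather than a fixed rule like this one.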

https://www.youtube.com/watch?v=A12AVongNN4

The architecture separates tasks into two layers:

an always-on interaction model that handles real-time conversation, and
a background reasoning model responsible for deeper planning, tool use, and web interaction.
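The two-layer split can be illustrated with `asyncio`. In this sketch, a fast loop keeps handling every turn while a slow task runs in the background. It is a hypothetical picture of the division of labor, not Thinking Machines’ design. The `background_reasoner` and `interaction_loop` names, the “look up” trigger, and the simulated latencies are all assumptions.

```python
import asyncio


async def background_reasoner(query: str) -> str:
    """Hypothetical slow layer: stands in for planning, tool use, or web lookups."""
    await asyncio.sleep(0.05)  # simulated latency of deeper reasoning
    return f"result for {query!r}"


async def interaction_loop(chunks: list[str], tick_s: float = 0.01) -> list[str]:
    """Hypothetical fast layer: responds on every tick and never blocks on the reasoner."""
    log = []
    pending = None  # at most one background task in this toy
    for chunk in chunks:
        if chunk.startswith("look up "):
            # Hand the heavy work to the background layer and keep talking.
            pending = asyncio.create_task(background_reasoner(chunk[len("look up "):]))
            log.append("on it")
        else:
            log.append(f"heard {chunk}")
        # Surface the background result as soon as it is ready.
        if pending is not None and pending.done():
            log.append(pending.result())
            pending = None
        await asyncio.sleep(tick_s)  # one simulated micro-turn
    if pending is not None:
        log.append(await pending)
    return log


print(asyncio.run(interaction_loop(["look up flights", "also", "hotels"])))
```

Here the fast loop keeps responding (“heard also”) while the lookup is still running. The background result is folded into the conversation when it completes. The exact ordering of later entries depends on timing, which is why no output is shown.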

Thinking Machines argues that current AI systems rely too heavily on stitched-together “harnesses” like voice activity detection and external pipelines to simulate responsiveness. Its goal is to make interactivity native to the model itself.

The company also introduced benchmark results showing strong performance on latency-sensitive multimodal tasks such as simultaneous speech, proactive video understanding, time-aware interaction, and streaming visual reasoning. The model reportedly outperformed existing real-time systems from OpenAI and Google across several interaction-focused benchmarks.

The architecture uses a large Mixture-of-Experts model with 276 billion parameters and introduces specialized inference optimizations to handle continuous low-latency streaming.

Thinking Machines says broader access to the system is expected later in 2026.

TheMarketAI Take

We’ve covered Thinking Machines extensively, not just because of its talent concentration and high-profile departures, but because the company appears to be pursuing a genuinely different view of how AI systems should interact with humans. We initially predicted that it was building its own model, and we later covered reports that it was exploring multimodal applications of AI.

While multimodal AI opens the door to compelling possibilities, the real test will be public deployment, and we’re not there yet. The demo video is impressive and highlights some of the potential advantages of real-time multimodal interaction. But given Thinking Machines’ $50B valuation (or at least the figure it reportedly sought) and the caliber of talent behind it, expectations are understandably much higher.