Thinking Machines Lab Takes Aim at AI’s “Nondeterminism Problem”
- Niv Nissenson
- Sep 15
- 2 min read
Thinking Machines Lab, the startup that recently raised the largest seed round in history, has been unusually quiet about what it’s building. But a newly published article from the team offers a revealing clue: they may be taking on one of the deepest technical puzzles in large language models (LLMs). The article was written by Horace He, a member of the lab's founding team whose self-described role is "Interested in making both researchers and GPUs happy".
The Issue: Why AI Can’t Give the Same Answer Twice
The article wrestles with a question many AI users have noticed: why do LLMs give different answers to the same prompt, even when the temperature is set to zero?
"This is usually what folks mean by “nondeterminism” — you execute the same kernel twice with exactly the same inputs and you get a different result out. This is known as run-to-run nondeterminism, where you run the same python script twice with the exact same dependencies but get a different result."
Temperature is supposed to control randomness. Set it to zero and you should, in theory, get the same response every time. Yet nondeterminism persists.
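To see why zero temperature "should" be deterministic: at temperature zero, sampling collapses to simply picking the highest-scoring token, with no randomness left in the procedure. Here is a minimal sketch of that collapse (the logits are made-up toy values, not from any real model):

```python
import numpy as np

# Toy logits for a 4-token vocabulary (illustrative values only).
logits = np.array([2.1, 0.3, -1.0, 1.7])

def sample(logits, temperature):
    if temperature == 0:
        # Greedy decoding: always take the argmax, no randomness at all.
        return int(np.argmax(logits))
    # Otherwise, sample from the temperature-scaled softmax distribution.
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return int(np.random.choice(len(logits), p=probs))

print(sample(logits, temperature=0))  # always the same token
print(sample(logits, temperature=1))  # can vary from call to call
```

If the logits were bit-for-bit identical every time, the temperature-zero branch could never vary. The puzzle is that in practice the logits themselves drift slightly between runs.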
The article traces the problem to the way computers handle numbers. Unlike integer arithmetic, which is exact, floating-point arithmetic (used to represent decimals) is not associative: adding the same values in a different order can round to slightly different results.
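The effect is easy to demonstrate in a couple of lines of Python (the specific numbers below are just a classic illustration):

```python
# Floating-point addition is not associative: the grouping changes the result.
a, b, c = 0.1, 1e20, -1e20

print((a + b) + c)  # 0.0 -- the 0.1 is swallowed when added to 1e20 first
print(a + (b + c))  # 0.1 -- the huge terms cancel first, so 0.1 survives
```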
In parallel computing, where LLM inference splits work across many cores, those floating-point sums can be accumulated in different orders from one run to the next. The result: tiny numerical differences that cascade into noticeably different outputs.
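The article's argument is about how GPU kernels organize these reductions, but the order-dependence itself can be sketched in plain Python/NumPy (toy data, not the actual kernels involved):

```python
import numpy as np

# Sum the same float32 values in two different orders, the way partial sums
# in a parallel reduction might be combined in a different sequence each run.
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000).astype(np.float32)

forward = np.float32(0.0)
for v in x:
    forward += v

shuffled = np.float32(0.0)
for v in rng.permutation(x):
    shuffled += v

# The two totals typically differ in the low-order bits, even though the
# inputs are identical -- the same values, merely added in a different order.
print(forward, shuffled, forward == shuffled)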
TheMarketAI.com take
On one level, LLMs are supposed to be probabilistic. That’s their design: they predict the next word based on probabilities. A little variation is expected.
But layering unintended numerical randomness on top of that designed-in randomness can produce more variance than anyone wants.
This is particularly problematic for two fast-growing fronts in AI:
- AI Agents – Systems that need to reliably complete tasks step-by-step can’t afford wide swings in output.
- AI Scaling – To run LLMs at massive scale, consistency and predictability are critical. Too much variance undermines trust and efficiency.
Recent reports suggest that as many as 95% of AI pilots fail. The reasons vary, from poor data pipelines and security concerns to hallucinations, but we’d wager that the unpredictability of LLM outputs ranks near the top. If Thinking Machines Lab is directing research toward this issue, it’s a strong signal that the team sees it as a critical barrier to real-world adoption.
