Converging models, lazy pattern matchers, or is OKC just that good?

I asked seven frontier models to predict the 2026 NBA Finals. All seven picked Thunder in six. Boring or predictable?

Models polled: 7. Picked Thunder in 6: 7 / 7. Picked the Knicks: 0.

It pains me to say this on May 12 as a Boston native, but the Celtics are not winning the championship this year. With my team out, I've done what any reasonable fan does in mourning: picked a new horse. The Knicks. They're scrappy, they're likeable, Brunson is must-watch, and they just swept the Sixers by a combined 89 points. I'm in.

So I figured I'd see if any of the frontier models agreed with me.

I gave the same prompt to seven of them: Claude Opus 4.7, Claude Sonnet 4.6, ChatGPT o3, GPT-5.5 Instant, GPT-5.5 Think, Grok, Gemini Pro. Predict the 2026 NBA Finals. Matchup, series result, game-by-game scores. I figured at least one would humor me.

All seven picked the Thunder over the Knicks in six. Not one said Knicks. Not one said seven games. Not one said five. The predicted scores even clustered inside the same ten-point band, game after game. Full, boring agreement.

Game-by-game predicted scores from seven frontier models. Faded bars = predicted loser.

If I were a betting woman, I'd push the chips. But is this a trustworthy prediction, or just model pattern matching? The more interesting question is the one in the title: are these models actually converging in how they reason, or did I just hand them a problem with one obvious answer?

The boring agreement

Here's what they all saw. OKC is the defending champ. Undefeated in this year's playoffs. Just swept the Lakers. Has the reigning MVP. The Knicks are scorching, but they're still the underdog. Walk into any sportsbook tomorrow and OKC is a heavy favorite. The models didn't discover that. They reflected it. Is it entirely too human to understand that sports is about being hot at the right time? Is it too human to understand that Tina Fey cheering for you maybe makes a difference?

That's the first thing to flag if you want to use this kind of test to evaluate reasoning. When seven models agree, is that evidence about the question or about the models? Yes, OKC looks like the best team, but do the models weigh historical patterns too heavily? I always try to remember that these models are not magicians; they are pattern predictors. No matter what the problem is, it's important to keep that in mind.

What none of them did

Here's what jumped out reading all seven back to back: not one led with the Knicks' current form.

The Knicks are hot. They are playing the best basketball in the East, possibly the best basketball of anyone in the playoffs not based in Oklahoma City. Star point guard redefining his playoff résumé. Frontcourt revelation in Karl-Anthony Towns. A track record this postseason of winning by margins nobody predicted.

If you were a sportswriter handed this matchup cold and told to write the upset case, you'd find it. The models weren't asked for the upset case - they were asked for the most likely outcome - but every response still framed it the same way: OKC is the historical favorite, here's why they win. The recent form was in every search result. None of them weighted it heavily enough to flip the call, widen the series to seven, or even narrow the probability past 68/32 (which only one model gave at all).

Every model defaulted to the prior. None gave meaningful weight to the live signal. Which matters, because the world's interesting questions are almost always the ones where the prior and the live signal disagree.
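To make "weighting the live signal" concrete, here is a minimal sketch of one way to blend a prior with a recent-form estimate in log-odds space. Nothing here is how any of the seven models actually works. The 0.68 prior is borrowed from Sonnet 4.6's 68/32 split, while `LIVE_OKC`, `live_weight`, and the `blend` helper are all hypothetical and exist only for illustration.

```python
import math


def logit(p: float) -> float:
    """Convert a probability into log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Convert log-odds back into a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def blend(prior_p: float, live_p: float, live_weight: float) -> float:
    """Weighted average of two probabilities, taken in log-odds space.

    live_weight = 0.0 returns the prior, 1.0 returns the live signal.
    """
    z = (1.0 - live_weight) * logit(prior_p) + live_weight * logit(live_p)
    return sigmoid(z)


PRIOR_OKC = 0.68  # from the post: Sonnet 4.6's 68% OKC estimate
LIVE_OKC = 0.45   # HYPOTHETICAL: what a "recent form only" read might say

for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    p = blend(PRIOR_OKC, LIVE_OKC, w)
    pick = "OKC" if p > 0.5 else "NYK"
    print(f"live_weight={w:.2f} -> P(OKC)={p:.3f} pick={pick}")
```

With these made-up inputs, the pick only flips to the Knicks once the live signal carries roughly four-fifths of the weight. That is one way to picture why all seven calls stayed put even though the recent form was sitting in every search result.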

Where they actually differed

The outputs converged. The narratives didn't. That's the part worth looking at.

GPT-5.5 Think was the most honest about its own simplicity: "The model logic is pretty simple: New York has the shot creation and toughness to make this a real series, but OKC has the better top player, better two-way depth, better road profile, better defense, and the better recent head-to-head evidence." That's not analysis. It's a historical checklist. And to its credit, the model said so.

Opus 4.7 was the only one that engaged with the trend data I was actually looking for: "Jalen Brunson's on-ball percentage has dropped from 50.1% in 2024 to 48.5% in 2025 to 43.7% in 2026. He's fresher and more efficient because the Knicks finally diversified the offense." It went further on Towns ("His career playoff PER coming into 2026 was 17.9; he's at 30.5 right now") and called him a matchup nightmare nobody had penciled in. Opus saw what I wanted it to see. It just didn't let that change the call.

o3 was the only model that named the conditions under which it would have been wrong: "If Brunson continues his 63 TS% pace, Anunoby returns at full mobility, and NY keeps the offensive-rebound margin above six per game, a seven-game Knicks win is plausible — particularly if Thunder's young bench wobbles under MSG pressure." That's the most useful sentence in any of the seven responses. It's the only one that tells me what to watch.

Sonnet 4.6 was the only one that gave me a probability — 68% OKC, 32% NYK — and the cleanest visual breakdown of game-by-game outcomes. It was also the only one that predicted the Knicks would steal a game in OKC.

Three models (Grok, Opus 4.7, o3) played pure chalk: home team wins every game except the closeout. BORING! Four (Sonnet, Gemini, both 5.5s) had the Knicks steal Game 2. o3 needed an overtime game to make the math work. Same destination, seven slightly different paths.

Maybe this just proves the obvious

LLMs are pattern-matchers. Every one of these models had web search. They had the Knicks' current form, the sweep, Brunson's playoff numbers, Towns' impossible shooting splits. The data was there. They just didn't weight it.

Why? Because "defending champ with the MVP wins the Finals" is a much stronger pattern in their training data than "hot team comes out of nowhere and beats a healthy, undefeated favorite." The first thing has happened hundreds of times. The second is the kind of result that gets a documentary made about it precisely because it's rare. Faced with a question that pits a strong historical pattern against a weaker recent one, the models did exactly what pattern-matchers do. They picked the pattern. Boring!

Which makes me wonder if the Knicks are, in some sense, an unpredictable team — not because they're unbeatable, but because what they're doing right now doesn't fit a shape the models have seen before. The training data is good at telling you what usually happens. It is, almost by definition, bad at telling you when this time is different.

I'm still going to root for them. The models said OKC in six. The sportsbooks said OKC in six. But the most interesting thing a model can do is tell you what would have to be true for it to be wrong, and o3, at least, gave me a list: if Brunson keeps up his 63% true shooting, if Anunoby comes back healthy, if the Knicks keep crushing the offensive glass, if the Thunder bench wobbles at MSG.

Maybe it just comes down to Tina Fey or Timothée Chalamet tripping the right person. Unpredictable.

- Jackie