Open vs Closed AI Models: How the Gap Collapsed in 2025-2026 and Where It's Heading

In January 2025, a Chinese lab most people had never heard of released an open model that wiped roughly seventeen percent off the value of the most valuable chipmaker on earth in a single day. Eighteen months later, the question is no longer whether open models can compete with the closed systems from the big US labs. It is how many months behind they are, and whether that gap ever fully closes.

This is a piece about that trajectory: where open and closed models stood at the start of 2025, how far the gap has closed by mid-2026, and where the evidence suggests it goes next. The focus is on the models and the benchmarks rather than the politics, though the politics are impossible to ignore entirely, because the open frontier turned out to be largely a Chinese story.

A quick definition to keep things precise. Most models people call open source are actually open weight: the trained model is released to download and run, but the training data and code are not. The distinction matters, and it is getting blurrier, as the later sections explain. For readability this piece uses open and open weight interchangeably, but the asterisk is real.

The starting line: where things stood in early 2025

For most of the modern AI era, the assumption was that the big US labs held a durable lead. The reasoning was infrastructural: frontier models required enormous compute, that compute ran on advanced chips, and export controls were designed specifically to keep those chips, and therefore the lead, on the American side.

Then came the release that reset the conversation. In January 2025, the Chinese lab DeepSeek put out an open reasoning model, under a permissive license, that performed nearly on par with the best closed reasoning model available at the time. It matched the leading US reasoning model across most benchmarks while costing a fraction as much to use, around twenty-seven times cheaper per token, and it was the first time since 2019 that a frontier-class model had been released openly by a Chinese company.

The market reaction was violent. On the day the news spread, the leading AI chipmaker's stock fell around seventeen to eighteen percent, and several other AI-exposed companies dropped with it, on fears that if models could be trained this cheaply, the demand for the most expensive hardware might not be what everyone assumed. By the end of that month the model had overtaken the best-known US chatbot as the most-downloaded free app on the US App Store, and a prominent investor called it a Sputnik moment.

What made it a watershed was not that it came from nowhere. It was that it confirmed a trend the data was already showing. The performance gap between the best American and best Chinese models had been closing fast. On one widely used head-to-head leaderboard, the US lead over the best Chinese model had shrunk from around nine percent in early 2024 to under two percent by early 2025, and across standard benchmarks the gaps that had been seventeen to thirty points at the end of 2023 had narrowed to low single digits by the end of 2024.

Worth noting even at this starting point: the open frontier was already tilting Chinese. When the big US open model arrived a few months later, it lagged the Chinese open releases on many benchmarks. The story was not only open catching closed. It was specifically Chinese open models setting the pace.

Mid-2026: a gap measured in months, not tiers

Move forward to the present, and the convergence has gone from striking to almost complete at the level of headline benchmarks.

The single most cited number comes from Stanford's annual AI Index. As of early 2026, the performance gap between the best American and best Chinese models had collapsed to 2.7 percent, down from somewhere between seventeen and thirty-two points in mid-2023, and that near-parity held despite the US spending many times more on AI investment. US and Chinese models have traded the top position multiple times since early 2025.

The most useful way to express the gap, though, is not in percentage points but in time. The research group Epoch AI found that open-weight models now trail the state of the art by roughly three months on average, down from closer to a year in late 2024. An MIT study put numbers on the same trend from a different angle. Open models reach about ninety percent of closed-model performance at release and tend to close the remaining gap within about thirteen weeks, down from twenty-seven weeks just a year earlier, while costing about eighty-seven percent less to run.

The current crop of leading open models makes the trend concrete, and it is dominated by Chinese labs. DeepSeek's newest model is a mixture-of-experts system with 1.6 trillion total parameters and a million-token context window, the largest open-weight model available, claiming performance comparable to the leading closed models on coding while undercutting them dramatically on price. A lab called Zhipu released a frontier-grade open model in June under a permissive license, and Moonshot shipped a trillion-parameter open model built for coding around the same time. Meanwhile Alibaba's Qwen family crossed one billion downloads, overtaking the big US open model as the most-downloaded open model in the world.

Cost is where the gap is not closing but inverting. The cheaper of DeepSeek's new models costs around fourteen cents per million input tokens and twenty-eight cents per million output, undercutting the budget tiers of every major closed provider. Inference costs across the board have been falling roughly tenfold per year. The original watershed model already made the point: it matched a frontier reasoning model at roughly three percent of the cost.

But the parity has limits, and they matter. The US still produces more top-tier models, with roughly fifty to sixty notable releases in 2025 against China's thirty-five or so. The single best model on the main head-to-head leaderboard is still a closed US model, leading by that narrow 2.7 percent. And the leading open models are still text-only, where the closed frontier handles images, audio, and video, and they tend to trail on the hardest knowledge tests, with DeepSeek itself describing its trajectory as trailing the frontier by three to six months. So the honest summary of mid-2026 is: caught up on most benchmarks, much cheaper, still behind on the bleeding edge and on the things benchmarks do not capture well.

The open-weight asterisk

Before reading too much into benchmark convergence, two cautions are worth holding, because they cut against taking the parity numbers at face value.

The first is that the openness itself is narrowing. By 2025, fewer than forty percent of so-called open models met basic open-source criteria, and for the first time downloads of opaque open-weight models outnumbered downloads of genuinely open ones, with licensing growing more restrictive and model gating more common. The category is commercializing and fracturing, so open is less open than the label implies, and getting more so.

The second is that benchmark parity can overstate real parity. Chinese open labs tend to focus more on benchmark scores than their US closed counterparts, partly because staying visibly close to the frontier is crucial to fundraising and adoption, and closed models tend to be more robust and generally useful than open models with similar scores, owing to hard-to-measure qualities that current benchmarks miss. A model that ties on a test can still be worse at the messy, constantly-shifting demands of real work.

There is a paradox layered on top. Even as capability converged, transparency fell: the main index tracking model openness dropped from fifty-eight to forty, with less disclosure on training data, parameter counts, and compute. The models are getting more capable and less legible at the same time.

Where it's heading: the real debate

This is where informed people genuinely disagree, and the disagreement is the most interesting part of the story. There are two credible cases, and they point in opposite directions.

The case that open keeps closing

The first case is that the trend lines simply continue, and open eats the middle of the market.

The evidence is the trajectory itself. The open lag compressed from twelve months to three in two years, inference costs are falling tenfold a year, and the knowledge-benchmark gap is effectively gone. Extrapolate that and the range of tasks where a cheaper open model is clearly good enough keeps widening. For most enterprise work, document analysis, customer triage, code review, structured extraction, the open model is already the rational default on cost and data-privacy grounds alone.

The framing here is classic disruption, the Linux and Android pattern: start cheaper, improve through a global community, and let the value migrate up the stack. One economist estimated that reallocating demand from closed to open where the quality difference does not matter could save the global AI economy around twenty-five billion dollars a year. As the model commoditizes, the argument goes, value moves to serving and building on models, captured in one CEO's framing of the divide as human capital and token capital, where firms that own a learning loop compound an advantage over those who merely rent intelligence.

The sharpest version of this case argues that by around 2028, when the leading closed labs need to demonstrate real margins, the good-enough line will have risen far enough to pressure their pricing power.

The case that closed pulls ahead

The second case is that the frontier keeps running away, and the gap grows rather than shrinks. The most striking thing about this argument is who makes it.

One of the most respected researchers in the open-model world argues the open-closed gap is more likely to grow than shrink, because the top labs are improving as fast as ever and many of their gains are not captured by public benchmarks at all. The mechanism that let open models fast-follow, distillation, is also getting harder. In the new era of coding agents, the valuable ingredient is the complex training environments and prompts, which are far easier to keep hidden than the model outputs that earlier distillation relied on.

There is also a question of where the frontier is going. Coding could largely be solved by scraping public repositories, but the next valuable domains, areas like legal and healthcare work, live in data that is not on the public web and is much harder to replicate, which favors the labs with the resources and partnerships to reach it. And the labs themselves are converging on a posture. An emerging consensus holds that the strategy is open source behind the frontier, closed source at the cutting edge, with one major US lab delaying the release of its largest open model and signaling it may keep its most capable systems closed, citing safety. If the very best models are deliberately withheld, benchmark parity on the models that do get released tells you less than it seems.

One more data point complicates the simple convergence story. Even as China and open models closed in on the top, the gap between the three leading frontier labs and the broader field of AI startups widened in 2025, as compute, proprietary data, and scarce talent created structural advantages that capital alone could not quickly overcome. Convergence at the very top coexists with divergence just below it.

Where this likely lands

Put the two cases together and the probable shape is not one side winning but a stable two-tier structure.

On one tier sit closed frontier models, used for the bleeding edge: the hardest knowledge work, the most demanding agentic systems, polished multimodal capabilities, and the role of a direct assistant facing a constant stream of novel problems, where those hard-to-measure qualities matter most. On the other sit open frontier models, increasingly Chinese-led, that are genuinely excellent for a large and growing share of real work at a fraction of the cost.

The clearest sign of how far things have moved is that the question itself changed. It has shifted from can open source compete to when should you choose open versus closed, with most production systems expected to route between them based on the task, the cost, and the latency. That shift is the real answer to how it is going: open decisively won the argument that it is a serious option, without yet winning the frontier.

On the national dimension, kept deliberately light here, one nuance is worth flagging. At least one close observer expects the US to slowly regain ground in open-model adoption from early 2027, as efforts like Google's open releases and others gain traction and China's blistering release velocity eventually slows. But China's open lead in adoption right now is real, and the hardware story that was supposed to prevent all this is loosening too: DeepSeek's newest model reportedly runs on Chinese domestic chips, making it among the first frontier-class models built entirely outside the US hardware ecosystem.

What this means for builders

For anyone actually choosing models, a few practical implications fall out of the data.

Widen your shortlist. Capability has converged enough that reaching for the most expensive closed model by default, for every task, is now a real choice rather than an obvious one, and often the wrong one on cost. The teams that benefit are the ones testing options on their own work rather than treating a leaderboard as a buying decision.

Route by task. The emerging best practice is to use closed models for the bleeding edge, the hardest reasoning, the most demanding agentic and multimodal work, and open models for the high-volume, cost-sensitive, privacy-bound middle, where they are already good enough and far cheaper.

Keep your independence. Even if you mainly use closed APIs, the existence of strong open alternatives is leverage, and architecting so you can switch protects you from any single provider's pricing or terms. The portability of open weights is worth something even unused.

Read benchmarks skeptically. With some labs optimizing for scores and transparency falling across the board, the gap between a benchmark number and real-world usefulness is widening, so your own evaluation on your own tasks matters more than it used to.

Conclusion

Eighteen months ago, the open-versus-closed debate was about whether open models could compete at all. Today they trail the frontier by months rather than tiers, they lead on cost outright, and in much of the market they lead on adoption, with Chinese labs setting the pace. That is a genuine shift in the landscape, and the data behind it is not subtle.

But the frontier itself is still closed and still, narrowly, US-led, and the people closest to the field disagree in good faith about whether open closes that final gap or whether the frontier keeps inventing new ground that open cannot reach. The most defensible bet is not on which side wins. It is that the model layer is commoditizing faster than almost anyone expected, that the value is migrating to what gets built on top, and that the right question for the next two years is no longer which model is best, but which model is right for the job in front of you.

Frequently asked questions

Is open source AI catching up to closed source?

Yes, dramatically. In early 2025 open models were clearly behind; by mid-2026 they trail the leading closed models by roughly three months on average, down from about a year in late 2024, and they have effectively closed the gap on knowledge benchmarks. The catch-up is real, though closed models retain an edge on the bleeding edge and on qualities benchmarks do not capture well.

Are Chinese AI models as good as US models now?

On headline benchmarks, very nearly. Stanford's AI Index put the gap between the best US and best Chinese models at about 2.7 percent in early 2026, down from seventeen to thirty-two points in 2023, and the two have traded the top spot repeatedly. The US still produces more top-tier models and leads at the very top, but the lead is now measured in low single digits.

What's the difference between open-source and open-weight models?

An open-weight model releases the trained model for anyone to download and run, but not necessarily the training data or code. True open source releases everything, enough to reproduce the model. Most models marketed as open source are only open weight, and by 2025 fewer than forty percent of so-called open models met basic open-source criteria.

Will open source AI overtake closed models?

Informed opinion is genuinely split. One camp points to the closing gap, falling costs, and a classic disruption pattern, and expects open to commoditize most of the market. Another, including respected open-model researchers, argues the frontier labs are pulling ahead in ways benchmarks miss and that the gap may grow. The likeliest outcome is a two-tier world rather than one side winning outright.

Why are open models so much cheaper?

Because once the weights are released, running them costs only compute, with no licensing or API margin, which creates commodity-market pricing pressure. Open inference runs roughly eighty-seven percent cheaper than comparable closed models, and some open models cost a fraction of the closed budget tiers. Inference costs overall have been falling about tenfold per year.

Which should I use, open or closed?

It depends on the task. Closed models are generally the better choice for the most demanding reasoning, agentic, and multimodal work, and for an assistant facing constant novel problems. Open models are increasingly the rational default for high-volume, cost-sensitive, or privacy-sensitive work like document analysis, extraction, code review, and triage. Many production systems now route between both.

Is the best AI model in the world still closed?

As of mid-2026, yes, narrowly. The single highest-ranked model on the main head-to-head leaderboard is a closed US model, leading the best open and Chinese alternatives by a small margin. But the lead is the smallest it has ever been, and the position has changed hands several times since early 2025.

Sorca Marian

Founder/CEO/CTO of SelfManager.ai & abZ.Global | Senior Software Engineer

https://SelfManager.ai
Next
Next

The Psychology of Social Media: Why It Scaled So Fast and Why We're Still Here Decades Later