Date: 2023-04-25

Just kidding: we all know size matters. It's definitely true for AI models, especially for those trained on text data, i.e., language models (LMs). If there's one trend that has, above all others, dominated AI in the last five or six years, it is the steady increase in parameter count of the best models, which I've seen referred to as Moore’s law for large LMs. The GPT family is the clearest—albeit not the only—embodiment of this fact: GPT-2 was 1.5 billion parameters, GPT-3 was 175 billion, ~100x its predecessor, and rumors have it that GPT-4’s size, although officially undisclosed, has reached the 1 trillion mark. Not an exponential curve but definitely a growing one.

OpenAI was religiously following the guidance of the scaling laws it discovered in 2020 (and that DeepMind refined in 2022). The main takeaway is that size matters a lot. DeepMind revealed that other variables like the amount of training data, or its quality, also influence performance. But a truth we can’t deny is that we love nothing more than a bigger thing: model size has been the gold standard for heuristically gauging how good an AI system will be.
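
For the curious, here is a minimal sketch of what those scaling laws imply in practice. It assumes two widely quoted rules of thumb that I haven't spelled out above: training compute is roughly 6 × parameters × tokens, and DeepMind's compute-optimal recipe works out to roughly 20 training tokens per parameter. Both are approximations, and the function names are mine and purely illustrative.

```python
# Rules of thumb (assumptions for illustration, not exact laws):
#   training compute C ~ 6 * N * D FLOPs, with N = parameters and D = training tokens
#   compute-optimal data D ~ 20 tokens per parameter (approximate reading of the 2022 result)
TOKENS_PER_PARAM = 20
FLOPS_PER_PARAM_TOKEN = 6


def compute_optimal_tokens(n_params: float) -> float:
    """Approximate number of training tokens a compute-optimal run would use."""
    return TOKENS_PER_PARAM * n_params


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute in FLOPs."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


# Parameter counts taken from the GPT figures quoted earlier in this article.
for name, n_params in [("GPT-2", 1.5e9), ("GPT-3", 175e9)]:
    tokens = compute_optimal_tokens(n_params)
    flops = training_flops(n_params, tokens)
    print(f"{name}: ~{tokens / 1e9:,.0f}B tokens, ~{flops:.2e} training FLOPs")
```

Under these assumptions a 175-billion-parameter model would want about 3.5 trillion training tokens, which is why data, and not just size, entered the conversation.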

OpenAI and DeepMind have been making their models bigger over the years in search of hints from the performance graphs, signs in the benchmark results, or whispers from the models themselves, of an otherwise merely hypothetical path toward AGI, the field’s holy grail. They didn’t find what they were looking for. Instead, they got predictable (although, if you ask me, impressive) improvements in language mastery that sadly don’t reveal any clear direction toward the next stage.

Size has proven, as they predicted, critical, but it seems companies have practically exhausted the “scale is all you need” doctrine. What’s most striking, the acknowledgment of this new reality doesn't come from a classical AI proponent or a deep learning critic but from OpenAI’s CEO, Sam Altman himself.

Size doesn't matter (as much anymore)

Now the headline is accurate. Altman seems to have accepted this—how could I put it—bitter lesson, too.

Size. Scale. More parameters. That’s not the way forward. His was (and probably still is) the dream of AGI, but he knew scale alone wouldn’t get them anywhere near that goal. He’s never been, at least publicly, a scale maximalist (i.e., a diehard believer in the “scale is all you need” dogma).

I say “publicly” because his decisions as the leader of OpenAI suggest an extreme pro-scale stance, as I briefly reviewed above, regardless of what he claims to believe now. But if anything, I'm pretty sure he would've loved to be wrong about having to dismiss bigger models as the panacea: if achieving AGI were as straightforward as stacking more layers on top of GPT-4, OpenAI would be better off than anyone could imagine.

But that’s not true, and Altman has now publicly admitted as much. He confirmed his shift in view at a recent MIT event, Imagination in Action. He said: “I think we’re at the end of the era where it’s gonna be these giant models, and we’ll make them better in other ways.” If his assertion isn't enough to convince you, maybe the fact that OpenAI isn't building GPT-5, “and won't for some time,” should do. This turn of events is even funny because all that buzz around the FLI open letter about pausing “giant AI experiments” (concretely, the training of anything more powerful than GPT-4) was seemingly unnecessary after all.

Interestingly, I don’t think Altman has only just observed this after GPT-4. He’s most likely known it to be true for a while. My suspicion arises because the reasons that make it not worth chasing after larger LMs go beyond model performance plateauing, an argument that, albeit predictable, might overlook the emergence of unexpected capabilities. The real reason is that OpenAI researchers knew the company, despite its unprecedented wealth for a startup, would have a hard time meeting the financial and logistical needs of a potential GPT-5. Indeed, Wired’s Will Knight reported last week that “Altman said there are also physical limits to how many data centers the company can build and how quickly it can build them.”

The relevance of this goes beyond Altman's frontal rejection of scale maximalism: it suggests that we may not, for a long time to come, see anything larger than GPT-4 for reasons much more prosaic than AGI ideology.

But then, why build GPT-4 at all? Why make it larger than anything else if it’s so costly to train and run? I don’t think GPT-4’s existence is inconsistent with Altman's belief that the AI community must exploit approaches other than scale. One likely explanation is that GPT-4 wasn't motivated as much by the artificial-human-brain fantasy as by the need to build a moat so deep that just looking at it over the edge would instill doubt in OpenAI's competitors.

I think that's the case: Altman feels safe with the strong combination of ChatGPT's popularity and GPT-4’s unmatched performance. OpenAI now has the privilege, which arguably no one else enjoys, of devoting time to exploring other avenues in search of a new groundbreaking advantage.

The promise of smaller language models

With such an incredible moat (they love the word in tech circles), Altman is free to openly share this newfound conviction, which suspiciously mirrors that of many in the field who have been advocating for making smaller models better for some time now. (To the cynical ones, it may seem as if Altman is trying to knock down the very ladder that allowed him to climb to an enviable position in AI leadership.)

Emad Mostaque is one of the most prominent examples of “smaller is better.” The founder of Stability.ai recently announced StableLM, a new family of GPT-style language models that goes as small as 3B and 7B parameters. His dream is not AGI but swarm intelligence: a collection of customized, specialized, small models that can enhance humans across tasks instead of a gigantic centralized superintelligence capable of everything.

And it’s not just companies: independent researchers are making leaps on the smaller end of the spectrum. Thanks to Meta's LLaMA, whose weights were leaked and shared (not by Meta, of course), and Stanford's Alpaca (an instruction-tuned LLaMA), a wave of open-source language models is flooding GitHub. It’s undeniable that thousands upon thousands of enthusiastic ML devs will make much more progress than a small-ish startup held to deep scrutiny. As open-source developer Simon Willison claimed in March, LMs are “having their Stable Diffusion moment.” Their success is a matter of time.

Don't conflate the reduced size of these models with them being inferior performance-wise: LLaMA 13B benchmark results are comparable to GPT-3 175B despite the latter being 13x larger. Three years have sufficed to improve the state of the art on variables other than size (e.g., finetuning techniques, data quality, or hardware-software optimizations) so that the largest models of 2020 will eventually be dwarfed by the efficiency of much smaller new ones.

Not everyone agrees with capping the size of language models at GPT-4, though. Anthropic, founded by ex-OpenAI employee Dario Amodei, recently announced its intention to build its next LM after Claude, tentatively (and creatively) called Claude-Next, which would be 10x more capable than anything that exists today, GPT-4 included. Even if size is no longer the main focus, as far as I know it’d be a breakthrough to achieve a 10x improvement on GPT-4 without making it significantly larger.

To do this they plan to spend “a billion dollars … over the next 18 months.” Maybe the people at Anthropic believe in the scaling laws they helped devise more strongly than OpenAI does. Maybe they don't like being second to OpenAI and will try as hard as they can to surpass them. Or maybe they'll simply realize down the line, as Altman has, that scale maximalism is a dead end.

Why small makes sense

In closing, let me give you three reasons why improving smaller models is the right approach to generative AI and could prove to be a bigger milestone than ChatGPT or GPT-4 (though perhaps not as showy).

First, the vast majority of people, i.e., customers, don't care about having access to the best of the best. We care about the best quality-price ratio. If making requests to the GPT-4 API costs a kidney each month, not many users will risk building something on top of it, much less use it for personal reasons, even if its quality is world-class. On the one hand, for many tasks something slightly worse might do, and, on the other, people are definitely willing to go down the quality ladder if the cost reduction is notable: for each dollar you spend on the GPT-3.5-turbo API (the best after GPT-4), you would have to spend $22.50 on the GPT-4-8K API to send the same message and get an equally long response. That’s prohibitively more expensive.
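
If you want to check that 22.5x figure yourself, here is a minimal sketch of the arithmetic. The per-token prices are my assumption of the list prices at the time of writing (check the current pricing page before relying on them), and the dictionary and function names are purely illustrative.

```python
# Assumed list prices in USD per 1,000 tokens (April 2023); verify before reuse.
PRICES_PER_1K = {
    "gpt-3.5-turbo": {"prompt": 0.002, "completion": 0.002},
    "gpt-4-8k": {"prompt": 0.03, "completion": 0.06},
}


def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD of one request, given prompt and completion token counts."""
    price = PRICES_PER_1K[model]
    return (
        prompt_tokens * price["prompt"] + completion_tokens * price["completion"]
    ) / 1000


# Same message, equally long response: prompt and completion have the same length.
tokens = 500
cheap = request_cost("gpt-3.5-turbo", tokens, tokens)
pricey = request_cost("gpt-4-8k", tokens, tokens)
ratio = pricey / cheap
print(f"GPT-3.5-turbo: ${cheap:.4f}  GPT-4-8K: ${pricey:.4f}  ratio: {ratio:.1f}x")
```

The ratio only comes out at 22.5 when the prompt and the response are the same length. With these assumed prices, prompt-heavy requests approach 15x and response-heavy ones approach 30x.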

Second, almost no one cares about AGI (sorry, Sam). I know you’re hearing this philosophical concept turned “marketing term” thrown around all the time now, but the truth is people don't care about it much. The easiest explanation is that almost no one buys the promises it would supposedly deliver once realized (e.g., that we could solve all the other problems in the world). This means customers have no strong incentive to climb the quality ladder. Whatever works for their mundane applications, like enhancing sunny pictures on the beach or drafting indistinguishable business emails, will be more than enough. For them, GPT-4 is a sledgehammer in search of flies to kill.

Finally, smaller models are more manageable and better for B2B. If you can have an Alpaca-like model installed locally on your computer (or smartphone), you might prefer paying only the energy costs of running your latest-generation consumer GPU over giving OpenAI a substantial sum every month so they can have your data (they have only just announced that users can turn off chat history to prevent the company from using their exchanges to train future models). And this extends to companies that want to leverage ChatGPT: we don't know how OpenAI treats, stores, or uses input data (if you allow them to). Any company would definitely be better off training a customized model on top of a small-ish open-source one so its data is completely safe. Otherwise, employees may fuck it up.
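
To get a feel for why “installed locally” is realistic for small models and not for giant ones, here is a rough, weights-only estimate of memory needs. It is a sketch under stated assumptions: it ignores activations and any runtime overhead, it uses standard bytes-per-parameter figures for each numeric precision, and the names are mine and purely illustrative.

```python
# Bytes needed to store one parameter at each precision (weights only).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, precision: str) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


# Model sizes mentioned in this article: small open models versus GPT-3.
for label, n_params in [("7B", 7e9), ("13B", 13e9), ("175B", 175e9)]:
    sizes = ", ".join(
        f"{precision}: {weight_memory_gb(n_params, precision):,.1f} GB"
        for precision in BYTES_PER_PARAM
    )
    print(f"{label} -> {sizes}")
```

By this estimate a 7B model needs about 14 GB at 16-bit precision and about 3.5 GB at 4 bits, whereas a 175B model needs about 350 GB at 16 bits, far beyond any consumer GPU.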

It doesn’t matter if big is what grabs our attention. Small now seems the winning bet.