Asilo Weighs in on DeepSeek’s Disruptive AI Model

Stockholm (HedgeNordic) – Artificial intelligence has dominated headlines in recent months, with investors and Mr. Market alike betting on a growing demand for powerful microchips, energy, and data centers to fuel AI development. However, the release of a new AI model called DeepSeek-R1 by a Chinese upstart has upended some core assumptions about AI progress. The model demonstrates that AI advancements may not require as many high-end chips as previously expected. This revelation rattled markets on Monday, sending Nvidia shares tumbling 17 percent and triggering sell-offs across other key “picks and shovels” stocks that support the development of AI.

Finnish portfolio managers Ernst Grönblom and Henri Blomster, who run a high-conviction strategy focused on identifying “future superstar” stocks, increased the AI exposure in Asilo Argo’s portfolio from about 15 percent at the start of 2024 to over 40 percent by year-end. However, the duo is not too worried about their AI investments after hearing that China’s DeepSeek startup has created the R1 model, which could challenge some of today’s leading AI models at a much lower cost. Grönblom, known for his understanding of concepts and mental models, references the Jevons paradox when discussing how DeepSeek’s more efficient model might affect the market.

“In general, it’s known that improving efficiency not only reduces the amount needed for a given purpose but also lowers the relative cost of using a resource, which in turn tends to increase its overall demand,” Grönblom explains. “The increased efficiency of coal use led to greater coal consumption, so why wouldn’t the same happen with GPUs [graphics processing units]?” he adds. However, Grönblom and his co-manager Blomster are quick to point out that their current thoughts are a “quick-and-dirty” initial take on the matter.
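The Jevons argument can be made concrete with a small worked example. The sketch below is illustrative only and does not come from Asilo: it assumes a constant-elasticity demand curve for AI services, and the function name, efficiency gains, and elasticity values are hypothetical. Under that assumption, total resource use rises after an efficiency gain only when demand is elastic (elasticity above 1).

```python
def resource_demand_multiplier(efficiency_gain: float, elasticity: float) -> float:
    """Factor by which total resource use (e.g. GPU hours) changes.

    Assumption (not from the article): demand for the service follows a
    constant-elasticity curve, so a cost drop by `efficiency_gain` raises
    service demand by efficiency_gain ** elasticity.
    """
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    # Cheaper service -> more of the service is demanded.
    service_demand = efficiency_gain ** elasticity
    # Each unit of service now needs 1 / efficiency_gain of the resource.
    return service_demand / efficiency_gain


# Hypothetical scenarios: a 2x efficiency gain under three elasticities.
for e in (0.5, 1.0, 2.0):
    print(f"elasticity={e}: resource use x{resource_demand_multiplier(2.0, e):.2f}")
```

With an elasticity of 2, a doubling of efficiency doubles total resource use, which is the outcome Grönblom expects for GPUs; with an elasticity below 1, resource use would fall instead.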

The Technicals Behind the Breakthroughs of the DeepSeek Model

According to Grönblom, a key breakthrough lies in DeepSeek’s “sophisticated mixed-precision training framework, which allows the use of 8-bit floating point numbers (FP8) throughout the entire training process,” rather than the “full precision” 32-bit numbers typically used by Western AI labs. This innovation, he explains, “saves memory and boosts performance,” leading to a dramatic reduction in GPU requirements since each GPU can handle much more data. Another major advancement is the model’s ability to “predict multiple tokens simultaneously while maintaining the quality of single-token predictions,” effectively doubling inference speed without compromising much on quality.
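The memory side of the FP8 argument is simple arithmetic: an 8-bit value occupies one byte, a 32-bit value four. The following is a minimal sketch of that calculation, not a description of DeepSeek’s framework. The function name and the one-billion-parameter example are hypothetical, and the estimate covers only stored weights, ignoring activations, gradients, and optimizer state, some of which mixed-precision setups usually keep at higher precision.

```python
# Bytes needed to store one value in each numeric format.
BYTES_PER_VALUE = {"fp32": 4, "fp16": 2, "fp8": 1}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Approximate memory (in GB, 1 GB = 1e9 bytes) to hold the weights alone."""
    return num_params * BYTES_PER_VALUE[dtype] / 1e9


# Hypothetical model size chosen for round numbers.
params = 1_000_000_000
for dtype in ("fp32", "fp16", "fp8"):
    print(f"{dtype}: {weight_memory_gb(params, dtype):.1f} GB")
```

On this simplified view, moving from 32-bit to 8-bit values cuts weight storage by a factor of four, which is why each GPU can hold more of the model or more data.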

Grönblom also highlights DeepSeek’s novel Multi-head Latent Attention (MLA), which “stores a compressed version of Key-Value indices, capturing essential information while using far less memory.” Additionally, its Mixture-of-Experts (MoE) Transformer architecture activates “only a small subset of parameters at any given time.” This means that during inference, only this subset of weights needs to be stored in VRAM. For example, an MoE model with 671 billion parameters may have only 37 billion parameters active at any given time. Grönblom notes that “the sum total of these innovations, when layered together, has led to the 45x efficiency improvement.”
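To put the Mixture-of-Experts figures in proportion, the sketch below computes the active share of parameters from the two numbers cited above and shows a toy version of expert selection. The top-k routing shown here is a common scheme in MoE models generally and is used only as an assumption for illustration; the article does not describe how DeepSeek’s router works, and the function names and gate scores are hypothetical.

```python
def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active_params / total_params


def top_k_experts(gate_scores: list[float], k: int) -> list[int]:
    """Return the indices of the k highest-scoring experts (toy top-k routing)."""
    ranked = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    return sorted(ranked[:k])


# Figures cited in the article: 37 billion active out of 671 billion.
print(f"active share: {active_fraction(37e9, 671e9):.1%}")

# Hypothetical gate scores for four experts; only two are activated.
print(top_k_experts([0.10, 0.70, 0.05, 0.15], k=2))
```

Roughly 5.5 percent of the parameters take part in any one step, so the compute per token is closer to that of a 37-billion-parameter model than a 671-billion-parameter one.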

U.S. Labs to Study DeepSeek’s R1

Regardless of whether the training of R1 was as cost-effective as claimed, Henri Blomster believes leading U.S. labs will likely study the technological breakthroughs it offers and, if deemed useful, adopt them. “This should lead to more efficient compute, both in terms of training and inference. When you apply that efficiency to the vast number of GPUs available in the U.S., while China continues to face restricted access to computing power, the implications become significant,” he argues. “We believe scaling laws apply even after R1 DeepSeek.”

Blomster emphasizes that if the ultimate goal of companies like Anthropic, OpenAI, and other major players were to build an o1 model and sell it to consumers, “then yes, we would be alarmed by the DeepSeek R1 model’s potential in destroying the value of US top labs’ investments.” However, he argues, “o1 is not the end goal.” While the feasibility is debatable, Blomster contends that “leading labs have the goal of building a digital God.” These labs have a clear roadmap and measurable progress, and are nearing their goal, as described by Anthropic’s CEO, Dario Amodei, who stated: “My view and I’ve been saying over the last few days that I’m becoming more confident in it, this idea that we may be only two or three years away from A.I. systems being better than humans at almost all tasks.”

AI as the Manhattan Project 2.0

According to Grönblom, AI has the potential to accelerate progress across all technological fields, including military technology. “This is the Manhattan Project 2.0, with the power to shift the global balance of power,” he argues. “It seems that China has taken a step forward in a race the U.S. cannot afford to lose.” In response, he believes the logical course of action would not be to slow down development but to intensify efforts even further.

“In summary, we do not believe that algorithmic improvements will reduce the demand for AI infrastructure,” concludes Grönblom. “We do not believe that leading labs will see this as an opportunity to slow down their investments; we believe they will see this as an opportunity to get to the end goal of Artificial Superintelligence at an even faster rate.”
