Why DeepSeek could be good news for energy consumption

Q&A

Artificial intelligence uses a significant amount of electricity. But DeepSeek’s approach could mean savings not only in energy, but also in cost.

Interviewed by Lauren Laws

AI is big business. What began as research is now a hot topic on Wall Street. It’s even been called the ‘modern gold rush.’ AI companies need funding to build their large language models, and investors are hoping for a bountiful return. It’s a multibillion-dollar cycle of more money, better GPUs and more power.

But ‘bigger’ doesn’t always mean ‘better.’ DeepSeek-R1’s arrival in the American public eye in late January flipped that line of thinking on its head, with the company’s app reaching the number one spot on both the Google Play and Apple App Stores. While countries such as Australia and Italy have already blocked the AI tool over security and data privacy concerns, the fact remains that it has forced AI companies in America and beyond to grapple with the idea that large language models might not require as much money, hardware, and energy as they originally thought.

 

Meet our expert: Dr. Deming Chen

Deming Chen in his office, posing for a photograph

 

Dr. Deming Chen is the director of the AMD Center of Excellence and co-director of the IBM-Illinois Discovery Accelerator Institute. His work has had a significant impact, with open-source solutions adopted by industry, such as FCUDA, DNNBuilder, CSRNet, SkyNet, ScaleHLS, and Medusa. Notably, Medusa has been integrated into Nvidia's TensorRT-LLM, improving the speed of large language model (LLM) execution by 1.9-3.6x.  

The following interview text has been edited for clarity and brevity. 

AI chatbots take a large amount of energy and resources to function, although some people may not understand exactly how. Why do they take so much energy to run?

The main driver is large language models. The technology behind these models is the transformer, a deep neural network with many layers that typically contains a huge number of parameters. For example, people have estimated that GPT-4, the model behind ChatGPT, probably has more than 1 trillion parameters. To train such a large language model, companies collect a large amount of data online and use it to train the model. That takes thousands to tens of thousands of GPUs running for a long time -- it could be close to a year. That’s why it’s both very costly and why it consumes a lot of energy.
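To put rough numbers on that, here is a minimal back-of-envelope sketch in Python. The parameter count echoes the trillion-parameter estimate above, but the token count, per-GPU throughput, per-GPU power draw, and the 6-FLOPs-per-parameter-per-token rule of thumb are outside assumptions chosen only for illustration; real training runs vary widely.

# Back-of-envelope estimate of training compute and energy for a GPT-4-class model.
# The parameter count follows the estimate above; every other number is an
# illustrative assumption, not a figure from the interview or from any vendor.

params = 1e12              # ~1 trillion parameters
tokens = 10e12             # assumed training tokens (illustrative)
total_flops = 6 * params * tokens        # rule of thumb: ~6 FLOPs per parameter per token

gpu_flops = 300e12         # assumed sustained throughput per GPU, ~300 TFLOP/s (illustrative)
gpu_power_kw = 0.7         # assumed average draw per GPU including overhead (illustrative)

gpu_hours = total_flops / gpu_flops / 3600
energy_mwh = gpu_hours * gpu_power_kw / 1000

print(f"GPU-hours: {gpu_hours:,.0f}")    # tens of millions of GPU-hours
print(f"Energy: {energy_mwh:,.0f} MWh")  # tens of thousands of megawatt-hours
# Spread over 10,000 GPUs, that works out to months of continuous training,
# which is why both the cost and the energy consumption are so large.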

We are not just talking about ChatGPT. This includes other large language models like Gemini, Llama, and others. There is a competition behind all of this, and companies try to push their most powerful models out ahead of the others. If they win the AI war, that’s a financial opportunity and may mean taking a larger portion of the growing AI market. Meanwhile, companies are trying to buy as many GPUs as possible, because that means they will have the resources to train the next generation of more powerful models -- which has driven up the stock prices of GPU companies such as Nvidia and AMD.

DeepSeek claims to be just as powerful as, if not more powerful than, other language models while using fewer resources. How is it possible for this language model to be so much more efficient?

In DeepSeek’s technical paper, they said that to train their large language model, they only used about 2,000 Nvidia H800 GPUs and the training only took two months. Think of the H800 as a discount GPU: to honor the export control policy set by the US, Nvidia made some GPUs specifically for China, and they’re not as advanced as the GPUs used in the US. So, finishing the training job with 2,000 discount GPUs in a relatively short time is impressive.

Main page for DeepSeek AI
DeepSeek-R1 debuted on January 20, 2025. One week later, hundreds of billions of dollars in big technology stock value was wiped out as investors worldwide sold shares over concerns about a possible threat to American AI dominance.

DeepSeek mentioned they spent less than $6 million, and I think that’s possible because they’re only talking about training this single model, without counting the cost of all the previous foundational work they did. It’s not as big as GPT-4, but at more than 600 billion parameters it’s still sizeable. Because they open sourced their model and wrote a detailed paper, people can verify their claims easily. My thinking is they have no reason to lie, because everything is open. Note that they only disclosed the training time and cost for their DeepSeek-V3 model, but people speculate that their DeepSeek-R1 model required a similar amount of time and resources for training.
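As a quick sanity check of that figure, the arithmetic below uses only the numbers mentioned above -- about 2,000 H800 GPUs for roughly two months -- plus an assumed rate of $2 per GPU-hour that is purely illustrative and not a number from DeepSeek.

# Rough sanity check of the "under $6 million" training-cost claim.
# GPU count and duration come from the figures cited above; the per-GPU-hour
# rate is an assumed, illustrative rental/amortization cost.

num_gpus = 2_000
days = 60                                # "about two months"
gpu_hours = num_gpus * days * 24         # 2,880,000 GPU-hours

cost_per_gpu_hour = 2.0                  # assumed rate in USD (illustrative)
total_cost = gpu_hours * cost_per_gpu_hour

print(f"GPU-hours: {gpu_hours:,}")            # 2,880,000
print(f"Estimated cost: ${total_cost:,.0f}")  # $5,760,000 -- consistent with the claim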

Their training algorithm and strategy may help mitigate the cost. They identified some interesting phenomena in their training procedure, and their training can converge faster. As a result, they use fewer resources. I’m glad that they open sourced their models. This should benefit the AI community and industry, so Meta, OpenAI, Google and others can borrow the ideas. If they can reduce the training cost and energy -- even if not by ten times, but just by two times -- that’s still very significant. I treat it as a positive development. It can help the AI community, industry, and research move forward faster and cheaper.

What exactly did DeepSeek do with their algorithm that allowed them to cut energy costs?

DeepSeek has a model called DeepSeek-R1-Zero. I think they took the name from Google DeepMind’s AlphaZero. AlphaZero is a machine learning model that played the game Go against itself millions and millions of times until it became a grand master. It beat a world champion by a large margin. DeepSeek-R1-Zero follows a similar strategy and applies a large-scale reinforcement learning (RL) algorithm directly, without supervised fine-tuning (SFT). SFT takes quite a few training cycles and involves manpower for labeling the data. It’s effective, but it’s quite costly.

To their surprise, and ours, their large-scale RL worked. The RL engine explored different strategies, taught itself to go through the reasoning process repeatedly, learned to perform self-verification and reflection, and, when faced with difficult problems, could realize it needed to spend more time on a particular step. It’s a quick path to reaching a quality level comparable to other, larger language models, yet smaller and cheaper. They also employed other techniques to reduce the training cost, such as a Mixture-of-Experts architecture, low precision and quantization, and load balancing. To produce the final DeepSeek-R1 model based on DeepSeek-R1-Zero, they did use some conventional techniques too, including SFT for fine-tuning to target specific problem-solving domains.
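As a toy illustration of this reward-driven training idea -- not DeepSeek’s actual code, scale, or algorithm -- the Python sketch below uses a rule-based verifier as the reward and a simple REINFORCE-style update. No labeled reasoning traces are needed; the only supervision is whether the final answer checks out.

# Toy sketch of reinforcement learning from a rule-based, verifiable reward,
# in the spirit of training without supervised fine-tuning. Everything here
# (the tiny "policy", the single arithmetic question, the candidate answers)
# is a hypothetical example for illustration only.

import numpy as np

rng = np.random.default_rng(0)

true_answer = 17 + 25                    # ground truth, used only by the verifier
candidates = [40, 42, 44]                # answers the toy policy can produce
logits = np.zeros(len(candidates))       # policy parameters

def reward(answer):
    """Rule-based verifier: 1 if the final answer is correct, else 0."""
    return 1.0 if answer == true_answer else 0.0

learning_rate = 0.5
for step in range(200):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    choice = rng.choice(len(candidates), p=probs)          # sample an answer
    r = reward(candidates[choice])
    baseline = sum(p * reward(c) for p, c in zip(probs, candidates))
    grad = -probs.copy()
    grad[choice] += 1.0                                    # d log pi(choice) / d logits
    logits += learning_rate * (r - baseline) * grad        # REINFORCE update

final = np.exp(logits - logits.max())
print(np.round(final / final.sum(), 3))
# Nearly all probability mass ends up on the verified answer (42): the policy
# improves purely by checking its own answers, with no human-labeled data.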

We’ve already seen how DeepSeek has affected Wall Street. What do you think the company’s arrival means for other AI businesses that now have a new, potentially more efficient competitor?

DeepSeek’s release of high-quality open-source models challenges closed-source leaders such as OpenAI, Google, and Anthropic. Customers that rely on those closed-source models now have the option of an open-source, more cost-effective alternative. This could change the landscape of AI development, competition, and business models.

Specifically, since DeepSeek allows businesses and AI researchers to access its models without paying steep API fees, it could drive down the prices of AI services, potentially forcing closed-source AI companies to reduce costs or offer other, more advanced features to keep customers.

We have seen that the release of the DeepSeek-R1 model caused a dip in the stock prices of GPU companies, because people realized that the previous assumption -- that large AI models require many costly GPUs training for a long time -- may not be true anymore. In the future, AI companies and startups may focus on smarter, more efficient algorithms and architectures that reduce dependence on high-end GPUs, leading to better cost and energy efficiency.

Finally, given the ongoing geopolitical competition between China and Western countries, DeepSeek’s rapid progress could trigger more scrutiny and tighter export controls from Western countries, and concerns over privacy and national security will continue to evolve and shape future policies and regulations on AI development and usage.



This story was published February 6, 2025.