Author: Andrew Ng
Dear friends,

The buzz over DeepSeek this week crystallized, for many people, a few important trends that have been happening in plain sight: (i) China is catching up to the U.S. in generative AI, with implications for the AI supply chain. (ii) Open weight models are commoditizing the foundation-model layer, which creates opportunities for application builders. (iii) Scaling up isn’t the only path to AI progress. Despite the massive focus on and hype around processing power, algorithmic innovations are rapidly pushing down training costs.

About a week ago, DeepSeek, a company based in China, released DeepSeek-R1, a remarkable model whose performance on benchmarks is comparable to OpenAI’s o1. Further, it was released as an open weight model with a permissive MIT license. At Davos last week, I got a lot of questions about it from non-technical business leaders. And on Monday, the stock market saw a “DeepSeek selloff”: The share prices of Nvidia and a number of other U.S. tech companies plunged. (As of the time of writing, they have recovered somewhat.)

Here’s what I think DeepSeek has caused many people to realize:

China is catching up to the U.S. in generative AI. When ChatGPT was launched in November 2022, the U.S. was significantly ahead of China in generative AI. Impressions change slowly, and so even recently I heard friends in both the U.S. and China say they thought China was behind. But in reality, this gap has rapidly eroded over the past two years. With models from China such as Qwen (which my teams have used for months), Kimi, InternVL, and DeepSeek, China had clearly been closing the gap, and in areas such as video generation there were already moments where China seemed to be in the lead.

I’m thrilled that DeepSeek-R1 was released as an open weight model, with a technical report that shares many details. In contrast, a number of U.S. companies have pushed for regulation to stifle open source by hyping up hypothetical AI dangers such as human extinction. It is now clear that open source/open weight models are a key part of the AI supply chain: Many companies will use them. If the U.S. continues to stymie open source, China will come to dominate this part of the supply chain and many businesses will end up using models that reflect China’s values much more than America’s.

Open weight models are commoditizing the foundation-model layer. As I wrote previously, LLM token prices have been falling rapidly, and open weights have contributed to this trend and given developers more choice. OpenAI’s o1 costs $60 per million output tokens; DeepSeek R1 costs $2.19. This nearly 30x difference brought the trend of falling prices to the attention of many people.

The business of training foundation models and selling API access is tough. Many companies in this area are still looking for a path to recouping the massive cost of model training. The article “AI’s $600B Question” lays out the challenge well (but, to be clear, I think the foundation model companies are doing great work, and I hope they succeed). In contrast, building applications on top of foundation models presents many great business opportunities. Now that others have spent billions training such models, you can access these models for mere dollars to build customer service chatbots, email summarizers, AI doctors, legal document assistants, and much more.
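To make that concrete, here is a minimal sketch of what building on top of a foundation model can look like in practice: an email summarizer that calls DeepSeek-R1 through an OpenAI-compatible chat API. The base URL, model name, and environment variable below are illustrative assumptions rather than details from this letter; check the provider’s documentation, or point the same client at your own hosted copy of the open weights.

```python
# Minimal sketch: an email summarizer built on top of a hosted reasoning model.
# Assumptions (not from the letter): an OpenAI-compatible endpoint at
# https://api.deepseek.com serving DeepSeek-R1 under the model name
# "deepseek-reasoner", with the API key in the DEEPSEEK_API_KEY environment
# variable. A self-hosted copy of the open weights served behind any
# OpenAI-compatible server would work the same way.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # hypothetical env var
    base_url="https://api.deepseek.com",     # assumed OpenAI-compatible endpoint
)

def summarize_email(email_text: str) -> str:
    """Return a short bullet-point summary of a single email."""
    response = client.chat.completions.create(
        model="deepseek-reasoner",  # assumed model name for DeepSeek-R1
        messages=[
            {
                "role": "system",
                "content": "Summarize the email in three short bullet points.",
            },
            {"role": "user", "content": email_text},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(
        summarize_email(
            "Hi team, the launch moves from Tuesday to Thursday "
            "because the security review is still open..."
        )
    )
```

Because the weights are open, the same few lines of application code can target a commercial endpoint today and a locally served copy of the model tomorrow; that interchangeability is part of what commoditization of the foundation-model layer means for application builders.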
Scaling up isn’t the only path to AI progress. There’s been a lot of hype around scaling up models as a way to drive progress. To be fair, I was an early proponent of scaling up models. A number of companies raised billions of dollars by generating buzz around the narrative that, with more capital, they could (i) scale up and (ii) predictably drive improvements. Consequently, there has been a huge focus on scaling up, as opposed to a more nuanced view that gives due attention to the many different ways we can make progress. Driven in part by the U.S. AI chip embargo, the DeepSeek team had to innovate on many optimizations to run on less-capable H800 GPUs rather than H100s, leading ultimately to a model trained (omitting research costs) for under $6M of compute.

It remains to be seen if this will actually reduce demand for compute. Sometimes making each unit of a good cheaper can result in more dollars in total going to buy that good. I think the demand for intelligence and compute has practically no ceiling over the long term, so I remain bullish that humanity will use more intelligence even as it gets cheaper.

I saw many different interpretations of DeepSeek’s progress on social media, as if it was a Rorschach test that allowed many people to project their own meaning onto it. I think DeepSeek-R1 has geopolitical implications that are yet to be worked out. And it’s also great for AI application builders. My team has already been brainstorming ideas that are newly possible only because we have easy access to an open advanced reasoning model. This continues to be a great time to build!

Keep learning,
Andrew