Compute is productivity, yet we habitually overvalue experience and undervalue compute. Seventy years of AI research teach us that general methods able to push computation to its limits ultimately prevail, and by an overwhelming margin.
At NeurIPS in December 2024, Ilya Sutskever, co-founder and former chief scientist of OpenAI, delivered a thought-provoking verdict: the era of pre-training as we know it is coming to an end. "We've reached peak data; there won't be any more," he said. "Like fossil fuels, the human-generated content of the internet is finite."
Ilya's judgment landed like a bombshell in the AI community. The field's current trajectory depends chiefly on pre-training over massive datasets, and that road now appears to be hitting a ceiling. Sutskever argued that future AI systems will have to turn in a new direction: they must be able to understand the world from limited data and to genuinely reason, rather than merely match patterns they have already seen.
Ilya's thinking traces back to "The Bitter Lesson", a classic short essay published in 2019 by Rich Sutton, one of the founding figures of reinforcement learning. At such an uncertain turning point, the essay rewards rereading. I (江树) first read it in 2020, when it made little impression; rereading it now, it strikes me as having all but predicted the course of AI, so I am sharing it here.
The essay surveys 70 years of AI history and surfaces a truth that has always been present yet is routinely overlooked: on the evolutionary path of artificial intelligence, human experience may matter far less than we imagine.
A few of the essay's core points:
On compute versus human knowledge
"Compute is productivity, yet we habitually overvalue experience and undervalue compute. Seventy years of AI research teach us that general methods able to push computation to its limits ultimately prevail, and by an overwhelming margin."
On researchers' blind spot
"Researchers often suffer from an 'experience fixation': eager for short-term gains, they pour their own domain knowledge into AI systems. This shackles AI with human experience; it looks like icing on the cake, but in practice it defeats the purpose."
On breakthroughs and innovation
"Real breakthroughs tend to come from letting go: let AI learn for itself instead of learning on its behalf; let AI make its own discoveries instead of stuffing ours into it."
On the direction of progress
"Rather than obsess over making AI think like humans, we should focus on giving AI the ability to learn like humans. What we need is not AI that repeats existing human knowledge, but AI that can explore and learn on its own."
On technical approach
"On the evolutionary path of artificial intelligence, search and learning are the two wings: one carries the capacity to explore, the other the potential to evolve. Both methods scale without bound as computation grows."
On the complexity of cognition
"Human understanding of the world is like an endlessly expanding web. We should not naively try to capture that complexity with simple rules; instead, we should build methods capable of understanding and adapting to it."
In AI research, what we may need is not more expert experience but more open minds. Giving AI a genuine capacity for autonomous learning and exploration is the broad road to its future.
We keep unconsciously constraining AI's potential with human ways of thinking, overlooking a basic fact: just as children do not need all of human knowledge instilled in them in order to grow, AI systems do not need all of human experience implanted in them in order to evolve.
Perhaps the greatest enemy in AI research is not technical difficulty but our own cognitive biases.
The full text of the original essay follows:
The Bitter Lesson
Author: Rich Sutton (one of the founding figures of reinforcement learning)
Date: March 13, 2019
The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation.
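To make the compounding concrete, here is a minimal sketch of what exponentially falling cost per unit of computation implies; the two-year halving period and the horizons printed are illustrative assumptions, not measured data:

```python
# Illustrative sketch of exponentially falling compute cost: how much more
# computation a fixed budget buys after `years`, if the cost per unit of
# computation halves every `halving_period` years. The 2-year halving
# period is an assumption for illustration, not a measured constant.

def compute_per_dollar(years: float, halving_period: float = 2.0) -> float:
    """Relative compute purchasable per dollar, normalized to 1.0 at year 0."""
    return 2.0 ** (years / halving_period)

for years in (0, 5, 10, 20, 40):
    print(f"after {years:2d} years: {compute_per_dollar(years):>12,.0f}x compute per dollar")
```

Over a 40-year research career this multiplier reaches roughly a million, which is why a method that merely scales with compute eventually overtakes one tuned by hand.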
Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available.
Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation.
In computer chess, the methods that defeated the world champion, Kasparov, in 1997, were based on massive, deep search. At the time, this was looked upon with dismay by the majority of computer-chess researchers who had pursued methods that leveraged human understanding of the special structure of chess.
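For readers unfamiliar with how such "brute force" engines work, the sketch below shows the core idea, game-tree search with alpha-beta pruning, applied to a toy Nim-style game. It illustrates the technique only; Deep Blue's actual system combined far deeper search with chess-specific hardware and evaluation functions:

```python
# Minimal alpha-beta game-tree search, the core of "brute force" game
# engines. Toy game (a Nim variant): players alternately take 1-3 stones,
# and whoever takes the last stone wins. Scores are from the maximizing
# player's perspective: +1 win, -1 loss.

def alphabeta(stones: int, maximizing: bool, alpha: float, beta: float) -> int:
    if stones == 0:
        # The previous player took the last stone, so the side to move lost.
        return -1 if maximizing else 1
    best = float("-inf") if maximizing else float("inf")
    for take in (1, 2, 3):
        if take > stones:
            break
        score = alphabeta(stones - take, not maximizing, alpha, beta)
        if maximizing:
            best = max(best, score)
            alpha = max(alpha, best)
        else:
            best = min(best, score)
            beta = min(beta, best)
        if alpha >= beta:  # prune: this branch cannot affect the final choice
            break
    return best

print(alphabeta(10, True, float("-inf"), float("inf")))  # 1: first player can force a win
```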
When a simpler, search-based approach with special hardware and software proved vastly more effective, these human-knowledge-based chess researchers were not good losers. They said that "brute force" search may have won this time, but it was not a general strategy, and anyway it was not how people played chess. These researchers wanted methods based on human input to win and were disappointed when they did not.
A similar pattern of research progress was seen in computer Go, only delayed by a further 20 years. Enormous initial efforts went into avoiding search by taking advantage of human knowledge, or of the special features of the game, but all those efforts proved irrelevant, or worse, once search was applied effectively at scale.
Also important was the use of learning by self play to learn a value function (as it was in many other games and even in chess, although learning did not play a big role in the 1997 program that first beat a world champion). Learning by self play, and learning in general, is like search in that it enables massive computation to be brought to bear.
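As a minimal illustration of "learning a value function by self play", the sketch below runs tabular TD updates on the same toy take-1-to-3-stones game; the hyperparameters are arbitrary choices, and real game programs pair this idea with function approximation and search:

```python
# Learning a value function by self-play with tabular TD updates.
# Toy game: take 1-3 stones; taking the last stone wins.
# V[s] estimates the win probability of the player to move with s stones.
# ALPHA, EPSILON, and the episode count are arbitrary illustrative choices.

import random

N, ALPHA, EPSILON = 12, 0.1, 0.2
V = [0.5] * (N + 1)
V[0] = 0.0  # no stones left: the player to move has already lost

for _ in range(20_000):
    s = random.randint(1, N)
    while s > 0:
        moves = [t for t in (1, 2, 3) if t <= s]
        if random.random() < EPSILON:
            take = random.choice(moves)                 # explore
        else:
            take = min(moves, key=lambda t: V[s - t])   # leave opponent the worst state
        target = 1.0 if s - take == 0 else 1.0 - V[s - take]
        V[s] += ALPHA * (target - V[s])                 # TD update toward bootstrapped target
        s -= take

print([round(v, 2) for v in V])  # multiples of 4 drift toward 0, other states toward 1
```

No game-specific knowledge is coded in beyond the move rules; the values emerge from play, which is the point of the paragraph above.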
Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research. In computer Go, as in computer chess, researchers' initial effort was directed towards utilizing human understanding (so that less search was needed) and only much later was much greater success had by embracing search and learning.
In speech recognition, there was an early competition, sponsored by DARPA, in the 1970s. Entrants included a host of special methods that took advantage of human knowledge---knowledge of words, of phonemes, of the human vocal tract, etc. On the other side were newer methods that were more statistical in nature and did much more computation, based on hidden Markov models (HMMs).
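For reference, the statistical core of those HMM systems is the forward algorithm, which scores how likely an observation sequence is under a model. The two-state model below uses made-up numbers purely for illustration; real recognizers used far larger models over acoustic features:

```python
# Forward algorithm for a hidden Markov model: computes the probability of
# an observation sequence by summing over all hidden state paths.
# All probabilities below are made-up illustrative numbers, not a real
# acoustic model.

start = [0.6, 0.4]          # P(initial hidden state)
trans = [[0.7, 0.3],        # P(next state | current state)
         [0.4, 0.6]]
emit  = [[0.5, 0.4, 0.1],   # P(observation symbol | state)
         [0.1, 0.3, 0.6]]

def forward(observations):
    """Return P(observations) under the model above."""
    alpha = [start[s] * emit[s][observations[0]] for s in range(2)]
    for obs in observations[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(2)) * emit[s][obs]
                 for s in range(2)]
    return sum(alpha)

print(forward([0, 1, 2]))  # likelihood of observing the symbol sequence 0, 1, 2
```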
Again, the statistical methods won out over the human-knowledge-based methods. This led to a major change in all of natural language processing, gradually over decades, where statistics and computation came to dominate the field. The recent rise of deep learning in speech recognition is the most recent step in this consistent direction.
Deep learning methods rely even less on human knowledge, and use even more computation, together with learning on huge training sets, to produce dramatically better speech recognition systems. As in the games, researchers always tried to make systems that worked the way they thought their own minds worked---they tried to put that knowledge in their systems---but it proved ultimately counterproductive, and a colossal waste of researcher's time, when, through Moore's law, massive computation became available and a means was found to put it to good use.
In computer vision, there has been a similar pattern. Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of SIFT features. But today all this is discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better.
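The one structural notion that did survive is worth seeing concretely: a 2D convolution applies the same small filter at every image position, which is what builds in translation invariance. The plain-Python sketch below illustrates the operation itself, not a production implementation:

```python
# 2D convolution (strictly, cross-correlation) in plain Python: slide a
# small kernel over the image and take a weighted sum at each position.
# Sharing one kernel across all positions is the translation-invariance
# assumption that modern vision networks keep.

def conv2d(image, kernel):
    """Valid-mode (no padding) 2D cross-correlation."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[y + j][x + i] * kernel[j][i]
                 for j in range(kh) for i in range(kw))
             for x in range(iw - kw + 1)]
            for y in range(ih - kh + 1)]

# A vertical-edge filter applied to a tiny image with an edge down the middle.
image  = [[0, 0, 1, 1]] * 4
kernel = [[-1, 1],
          [-1, 1]]
print(conv2d(image, kernel))  # responds only where intensity jumps: [[0, 2, 0], ...]
```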
This is a big lesson. As a field, we still have not thoroughly learned it, as we are continuing to make the same kind of mistakes. To see this, and to effectively resist it, we have to understand the appeal of these mistakes.
The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning.
The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.
One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.
The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries.
All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us.
We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.