OpenAI Announces a New AI Model, Code-Named Strawberry, That Solves Difficult Problems Step by Step

The ChatGPT maker has revealed details of the model, officially called OpenAI o1, suggesting that AI needs more than just scale to advance.
OpenAI made the last big breakthrough in artificial intelligence by increasing the size of its models to dizzying proportions, when it introduced GPT-4 last year. The company today announced a new advance that signals a shift in approach—a model that can “reason” logically through many difficult problems and is significantly smarter than existing AI without a major scale-up.
The new model, dubbed OpenAI o1, can solve problems that stump existing AI models, including OpenAI’s most powerful existing model, GPT-4o. Rather than summon up an answer in one step, as a large language model normally does, it reasons through the problem, effectively thinking out loud as a person might, before arriving at the right result.
“This is what we consider the new paradigm in these models,” Mira Murati, OpenAI’s chief technology officer, tells WIRED. “It is much better at tackling very complex reasoning tasks.”
The new model was code-named Strawberry within OpenAI, and it is not a successor to GPT-4o but rather a complement to it, the company says.
Murati says that OpenAI is currently building its next master model, GPT-5, which will be considerably larger than its predecessor. But while the company still believes that scale will help wring new abilities out of AI, GPT-5 is likely to also include the reasoning technology introduced today. “There are two paradigms,” Murati says. “The scaling paradigm and this new paradigm. We expect that we will bring them together.”
LLMs typically conjure their answers from huge neural networks fed vast quantities of training data. They can exhibit remarkable linguistic and logical abilities, but traditionally struggle with surprisingly simple problems such as rudimentary math questions that involve reasoning.
Murati says OpenAI o1 uses reinforcement learning, which involves giving a model positive feedback when it gets answers right and negative feedback when it does not, in order to improve its reasoning process. “The model sharpens its thinking and fine tunes the strategies that it uses to get to the answer,” she says. Reinforcement learning has enabled computers to play games with superhuman skill and do useful tasks like designing computer chips. The technique is also a key ingredient for turning an LLM into a useful and well-behaved chatbot.
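As a rough intuition for that feedback loop, here is a toy, self-contained Python sketch. The two-strategy “policy,” the arithmetic question, and the multiplicative update are invented for illustration; OpenAI has not described its training procedure at this level of detail:

```python
import random

random.seed(0)

# Toy illustration of the loop Murati describes: reward the model when its
# answer checks out, penalize it when it does not, and let probability mass
# shift toward strategies that earn reward. A hypothetical bandit-style
# sketch, not OpenAI's actual training setup.

CORRECT_ANSWER = 8  # for the toy question "2 + 2 * 3"

# Two candidate "reasoning strategies" the toy policy can choose between.
STRATEGIES = {
    "left_to_right": lambda: (2 + 2) * 3,  # flawed: ignores precedence -> 12
    "use_precedence": lambda: 2 + 2 * 3,   # sound: respects precedence -> 8
}

weights = {name: 1.0 for name in STRATEGIES}  # uniform starting policy

def sample_strategy() -> str:
    """Pick a strategy with probability proportional to its weight."""
    names = list(weights)
    return random.choices(names, weights=[weights[n] for n in names])[0]

for _ in range(500):
    name = sample_strategy()
    answer = STRATEGIES[name]()
    reward = 1 if answer == CORRECT_ANSWER else -1  # positive/negative feedback
    weights[name] *= 1.05 if reward > 0 else 0.95   # reinforce or discourage

print(weights)  # nearly all of the weight ends up on "use_precedence"
```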
Mark Chen, vice president of research at OpenAI, demonstrated the new model to WIRED, using it to solve several problems that its prior model, GPT-4o, cannot. These included an advanced chemistry question and the following mind-bending mathematical puzzle: “A princess is as old as the prince will be when the princess is twice as old as the prince was when the princess’s age was half the sum of their present ages. What is the age of the prince and princess?” (The correct answer is that the prince is 30 and the princess is 40.)
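The riddle’s phrasing is slippery, but its arithmetic can be checked directly. Below is a short Python sanity check of the stated answer under one natural parsing of the wording; the variable names and the parsing itself are our own, not Chen’s:

```python
# Sanity check of the riddle's stated answer (prince 30, princess 40),
# under one natural parsing of the wording. A sketch, not the only reading.
p, q = 40, 30  # princess's and prince's ages today

# Moment A: the princess's age was half the sum of their present ages.
princess_a = (p + q) / 2      # 35
years_ago = p - princess_a    # 5 years ago
prince_a = q - years_ago      # the prince was 25

# Moment B: the princess is twice as old as the prince was at moment A.
princess_b = 2 * prince_a     # 50
years_ahead = princess_b - p  # 10 years from now
prince_b = q + years_ahead    # the prince will be 40

# The riddle: the princess today is as old as the prince will be at moment B.
assert p == prince_b          # 40 == 40, so the answer is consistent

# The algebra reduces to 3p = 4q; (40, 30) is the pair the article gives.
print("consistent:", p == prince_b)
```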
“The [new] model is learning to think for itself, rather than kind of trying to imitate the way humans would think,” as a conventional LLM does, Chen says.
OpenAI says its new model performs markedly better on a number of problem sets, including ones focused on coding, math, physics, biology, and chemistry. On the American Invitational Mathematics Examination (AIME), a test for math students, GPT-4o solved on average 12 percent of the problems while o1 got 83 percent right, according to the company.
The new model is slower than GPT-4o, and OpenAI says it does not always perform better—in part because, unlike GPT-4o, it cannot search the web and it is not multimodal, meaning it cannot parse images or audio.
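For readers who want to feel that trade-off directly, here is a minimal sketch using the official OpenAI Python SDK. The model identifiers (“gpt-4o” and “o1-preview”) and their availability are assumptions on our part rather than details from the article:

```python
# pip install openai; requires OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

PUZZLE = (
    "A princess is as old as the prince will be when the princess is twice "
    "as old as the prince was when the princess's age was half the sum of "
    "their present ages. What is the age of the prince and princess?"
)

# GPT-4o answers in what is effectively a single pass.
fast = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": PUZZLE}],
)

# An o1-style model spends extra "thinking" time before replying, so the
# call is noticeably slower. ("o1-preview" is an assumed identifier.)
slow = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": PUZZLE}],
)

print("GPT-4o:", fast.choices[0].message.content)
print("o1:", slow.choices[0].message.content)
```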
Improving the reasoning capabilities of LLMs has been a hot topic in research circles for some time. Indeed, rivals are pursuing similar research lines. In July, Google announced AlphaProof, a project that combines language models with reinforcement learning for solving difficult math problems.
AlphaProof was able to learn how to reason over math problems by looking at correct answers. A key challenge with broadening this kind of learning is that there are not correct answers for everything a model might encounter. Chen says OpenAI has succeeded in building a reasoning system that is much more general. “I do think we have made some breakthroughs there; I think it is part of our edge,” Chen says. “It’s actually fairly good at reasoning across all domains.”
Noah Goodman, a professor at Stanford who has published work on improving the reasoning abilities of LLMs, says the key to more generalized training may involve using a “carefully prompted language model and handcrafted data” for training. He adds that being able to consistently trade the speed of results for greater accuracy would be a “nice advance.”
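One standard way to make that trade, often called self-consistency sampling, is to draw several answers and keep the majority. The toy solver below, with its made-up 60 percent base accuracy, is a hypothetical sketch of the idea rather than anything Goodman or OpenAI has described:

```python
import random
from collections import Counter

random.seed(1)

def noisy_solver() -> str:
    """Stand-in for a model that answers a question correctly 60% of the time."""
    return "correct" if random.random() < 0.6 else "wrong"

def majority_answer(n_samples: int) -> str:
    """Sample the solver n times and return the most common answer."""
    votes = Counter(noisy_solver() for _ in range(n_samples))
    return votes.most_common(1)[0][0]

# Spending more inference time (samples) per question buys accuracy.
for n in (1, 5, 25):
    trials = [majority_answer(n) for _ in range(2000)]
    accuracy = trials.count("correct") / len(trials)
    print(f"{n:>2} samples/question -> ~{accuracy:.0%} correct")
```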
Yoon Kim, an assistant professor at MIT, says how LLMs solve problems currently remains somewhat mysterious, and even if they perform step-by-step reasoning there may be key differences from human intelligence. This could be crucial as the technology becomes more widely used. “These are systems that would be potentially making decisions that affect many, many people,” he says. “The larger question is, do we need to be confident about how a computational model is arriving at the decisions?”
The technique introduced by OpenAI today also may help ensure that AI models behave well. Murati says the new model has shown itself to be better at avoiding producing unpleasant or potentially harmful output by reasoning about the outcome of its actions. “If you think about teaching children, they learn much better to align to certain norms, behaviors, and values once they can reason about why they’re doing a certain thing,” she says.
Oren Etzioni, a professor emeritus at the University of Washington and a prominent AI expert, says it’s “essential to enable LLMs to engage in multi-step problem solving, use tools, and solve complex problems.” He adds, “Pure scale up will not deliver this.” Etzioni says, however, that there are further challenges ahead. “Even if reasoning were solved, we would still have the challenge of hallucination and factuality.”
OpenAI’s Chen says that the new reasoning approach developed by the company shows that advancing AI need not cost ungodly amounts of compute power. “One of the exciting things about the paradigm is we believe that it’ll allow us to ship intelligence cheaper,” he says, “and I think that really is the core mission of our company.”