From the Financial Times's detailed report:
Lack of internal deliberation abilities (thinking, in other words) has long been considered one of the main weaknesses of artificial intelligence. The scale of a recent advance in this by ChatGPT creator OpenAI is a point of debate within the scientific community. But it leads many of my expert colleagues and me to believe that there is a chance that we are on the brink of bridging the gap to human-level reasoning.
缺乏内部思维能力(换句话说就是不能思考)长期被认为是人工智能(AI)的主要弱点之一。ChatGPT的创建者OpenAI近年在这方面取得的进展规模是科学界内部的一个辩论焦点。但它让我和我的许多专家同僚相信,我们有可能即将缩小与人类水平推理之间的差距。
Researchers have long argued that traditional neural networks — the leading approach to AI — align more with “system 1” cognition. This corresponds to direct or intuitive answers to questions (such as when automatically recognising a face). Human intelligence, on the other hand, also relies on “system 2” cognition. This involves internal deliberation and enables powerful forms of reasoning (like when solving a maths problem or planning something in detail). It allows us to combine pieces of knowledge in coherent but novel ways.
研究人员长期主张,传统的神经网络(AI的领先方法)更符合“系统1”认知。这对应于针对问题给出直接或直观答案(例如在自动识别人脸时)。另一方面,人类智能也依赖于“系统2”认知。它涉及到内部思维,并启用强大的推理形式(例如在解决数学难题或详细规划某事时)。它使我们能够以连贯而新颖的方式组合知识点。
OpenAI’s advance, which has not yet been fully released to the public, is based on a form of AI with internal deliberation, built on its o1 large language model (LLM).
OpenAI的进展(尚未完全向公众发布)是基于使用其o1大型语言模型(LLM)进行内部思维的AI形式。
First, the usage of deliberation. As a noun it means careful consideration or discussion of something (细想;考虑;商议). Example: After much deliberation, first prize was awarded to Derek Murray. (经仔细商议,一等奖颁给了德里克·默里。)
Second, the usage of the brink (of sth). The phrase means a situation when you are almost in a new situation, usually a bad one ((某事物的)边缘〔一般指不好的情况〕). Common collocations: on the brink of death / disaster / war etc. Example: In October 1962 the world seemed on the brink of nuclear war. (1962年10月,世界似乎处于核战争的边缘。)
Third, the usage of coherent. As an adjective: if a piece of writing, set of ideas etc is coherent, it is easy to understand because it is clear and reasonable (〔文章、观点等〕连贯的,有条理的,一致的). Example: The three years of the course are planned as a coherent whole. (这三年的课程是作为连贯的整体来安排的。)
Better reasoning would address two major weaknesses of current AI: poor coherence of answers and a limited ability to plan and achieve long-term goals. The former is important in scientific uses and the latter is essential to create autonomous agents. Both could enable important applications.
更好的推理将解决当前AI的两大弱点:答案连贯性以及规划和实现长期目标的能力较差。前者对于科学用途很重要,而后者对于创建自主智能体(autonomous agent)不可或缺。两者都可以被用来实现重要的应用。
The principles behind reasoning have been at the heart of AI research in the 20th century. An early example of success was DeepMind’s AlphaGo, which in 2015 became the first computer system to beat human champions at the ancient Asian game of Go. A more recent one is AlphaProof, which engages with mathematical subjects. Here, neural networks learn to predict the usefulness of an action. Such “intuitions” are then used to plan by efficiently searching possible sequences of actions.
推理背后的原理一直是20世纪AI研究的核心。早期的成功例子是DeepMind的AlphaGo(它在2015年成为第一个在古老的围棋博弈中击败人类冠军的计算机系统),以及最近的AlphaProof(用来解决数学课题)。在这里,神经网络学会预测一个行动的有用性,然后利用这种“直觉”高效率地搜索可能的行动次序,从而进行规划。
However, AlphaGo and AlphaProof involve very specialised knowledge (of the game of Go and specific mathematical domains respectively). What remains unclear is how to combine the breadth of knowledge of modern LLMs with powerful reasoning and planning abilities.
然而,AlphaGo和AlphaProof涉及高度专业的知识(分别涉及围棋和特定的数学领域)。尚不清楚的是,如何将现代大型语言模型的广博知识与强大的推理和规划能力结合起来。
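The "intuition plus search" idea described for AlphaGo above can be illustrated with a toy sketch. This is not DeepMind's method: AlphaGo used trained neural networks together with Monte Carlo tree search, whereas the sketch below swaps in a hand-written scoring function (`value`) and a simple beam search. The number puzzle, the `ACTIONS` table, and every name here are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical toy actions: each one maps a number to a new number.
ACTIONS = {
    "add1": lambda x: x + 1,
    "double": lambda x: x * 2,
}


def value(state: int, target: int) -> float:
    """Stand-in for a learned value network: higher means more promising."""
    return -abs(target - state)


def plan(start: int, target: int, beam_width: int = 3,
         max_steps: int = 10) -> Optional[List[str]]:
    """Beam search over action sequences, pruned by the value function.

    Returns a list of action names that reaches the target, or None if
    no plan is found within max_steps. The plan is not guaranteed to be
    the shortest one.
    """
    beam: List[Tuple[int, List[str]]] = [(start, [])]
    for _ in range(max_steps):
        for state, path in beam:
            if state == target:
                return path
        # Expand every state in the beam by every available action.
        candidates = [
            (fn(state), path + [name])
            for state, path in beam
            for name, fn in ACTIONS.items()
        ]
        # The "intuition" keeps only the most promising candidates.
        candidates.sort(key=lambda c: value(c[0], target), reverse=True)
        beam = candidates[:beam_width]
    for state, path in beam:
        if state == target:
            return path
    return None
```

Calling `plan(1, 10)` returns a sequence of action names that turns 1 into 10. The scoring function does the pruning, so the search never has to enumerate every possible sequence, which is the role the article assigns to the network's "intuitions".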
First, the usage of neural. As an adjective it means relating to a nerve or the nervous system (神经的;神经系统的). Example: signs of neural activity (神经活动的迹象)
Second, the usage of sequence. As a noun it means the order that something happens or exists in, or the order it is supposed to happen or exist in (〔事情发生的〕顺序,次序). Example: The questions should be asked in a logical sequence. (应该按逻辑顺序来提问。)
There have been some advancements. Already, LLMs come up with better answers to complex questions when asked to produce a chain of thought leading to their answer.
已经取得了一些进展。在被要求给出一条通往答案的思路链时,大型语言模型已经能够针对复杂问题给出更好的答案。
OpenAI’s new “o” series pushes this idea further, and requires far more computing resources, and therefore energy, to do so. With a very long chain of thought it is trained to “think” better.
OpenAI的“o”系列新模型进一步推进了这一构想,为此需要多得多的计算资源,消耗更多的能量。通过非常长的思路链,它可以被训练得更善于“思考”。
We thus see a new form of computational scaling appear. Not just more training data and larger models but more time spent “thinking” about answers. This leads to substantially improved capabilities in reasoning-heavy tasks such as mathematics, computer science and science more broadly.
因此,我们看到了一种新的计算扩展形式。不仅有更多的训练数据和更大的模型,而且花更多的时间“思考”答案。这将大大提高在数学、计算机科学和广义科学领域完成需要大量推理的任务的能力。
First, the usage of computation. As a noun it means the process of calculating or the result of calculating (计算;计算的结果). Example: the computation of the monthly statistics (每月统计数字的计算)
Second, the usage of substantial. As an adjective it means large in amount or number (大量的,多的). Example: We have the support of a substantial number of parents. (我们有相当多家长的支持。)
For example, whereas OpenAI’s previous model GPT-4o only scored about 13 per cent on the 2024 AIME, a qualifying exam for the United States Mathematical Olympiad, o1 reached an 83 per cent mark, placing it among the top 500 students in the country.
例如,OpenAI之前的模型GPT-4o在2024年美国数学邀请赛(AIME,美国数学奥林匹克竞赛的资格赛)中的得分仅为大约13%,而o1模型的得分达到83%,跻身于美国最优秀的500名学生之列。
If these efforts succeed, there are major risks to consider. We don’t yet know how to align and control AI reliably. For example, the evaluation of o1 showed an increased ability to deceive humans, a natural consequence of improving goal-reaching skills. It is also concerning that the ability of o1 in helping to create biological weapons has crossed OpenAI’s own risk threshold from low to medium. This is the highest acceptable level according to the company (which may have an interest in keeping concerns low).
如果成功,就需要考虑重大风险。我们还不知道如何可靠地对AI进行价值对齐和控制。例如,对o1的评估显示,它欺骗人类的能力有所提高——这是达到目标的技能得到提高的天然后果。同样令人担忧的是,按照OpenAI自己的风险尺度,o1帮助制造生物武器的能力已经从低风险上升到中等风险。这是该公司自称可接受的最高水平(压低担忧水平可能符合该公司的利益)。
Unlocking reasoning and agency are believed to be the main milestones on the road to human-level AI, also known as artificial general intelligence. There are therefore powerful economic incentives for large companies racing towards this goal to cut corners on safety.
解锁推理和能动性,据信是通往人类水平AI——也被称为通用人工智能(AGI)——道路上的主要里程碑。因此,大公司在竞相达到这一目标的过程中,有强大的经济动机在安全上打折扣。
o1 is likely to be only a first step. Although it does well at many reasoning and mathematical tasks, it appears that long-term planning has still not been achieved. o1 struggles with more complex planning tasks, suggesting that there is still work to be done to achieve the kind of autonomous agency sought by AI companies.
o1很可能只是第一步。尽管它在许多推理和数学任务上表现出色,但它看起来仍做不到长期规划。比较复杂的规划任务会让o1陷入挣扎,似乎表明要实现AI公司所追求的那种自主能动性,仍有工作要做。
But with improved programming and scientific abilities, it is to be expected that these new models could accelerate research on AI itself. This could get it to human-level intelligence faster than anticipated. Advances in reasoning abilities make it all the more urgent to regulate AI models in order to protect the public.
但随着编程和科学能力的提高,可以预期这些新模型可以加速AI本身的研究,使AI比预期更快地达到人类水平的智能。推理能力的进步使得监管AI模型以保护公众变得格外紧迫。