Artificial intelligence is learning to think | Financial Times

Current Affairs   2024-11-22 07:58   Shandong



Read the Financial Times' full report below:




1. internal [ɪnˈtɜːnl] adj. inner, inside; domestic; intrinsic; of the mind n. internal part

2. scale [skeɪl] n. size, extent; level, grade; graduated measure, dial; proportion (of a map or model); (fish) scale; musical scale v. climb; resize in proportion

3. involve [ɪnˈvɒlv] v. entail, require; include; cause to take part; implicate

4. enable [ɪˈneɪbl] v. make possible, allow; activate; authorise

5. maths [mæθs] n. mathematics (British English)

6. coherent [kəʊˈhɪərənt] adj. logical and consistent; clearly reasoned, easy to understand; unified; (of waves) coherent

7. release [rɪˈliːs] v. set free; let go of; make public, publish; issue (a film, record or product); emit, discharge n. act of releasing; newly issued product; public announcement

8. consideration [kənˌsɪdəˈreɪʃn] n. careful thought; factor to be taken into account; thoughtfulness; payment

9. writing [ˈraɪtɪŋ] n. the activity of writing; written work; handwriting; (writings) an author's works

10. current [ˈkʌrənt] adj. present, existing now; in general use n. flow of water, air or electricity; trend

11. essential [ɪˈsenʃl] adj. absolutely necessary; fundamental, basic n. something indispensable; (essentials) the basics

12. champion [ˈtʃæmpiən] n. winner of a competition; advocate, defender v. support, defend adj. excellent; prize-winning

13. mathematical [ˌmæθəˈmætɪkl] adj. relating to mathematics; precise, exact

14. intuition [ˌɪntjuˈɪʃn] n. the ability to know something instinctively; an instinctive insight

15. sequence [ˈsiːkwəns] n. the order in which things happen; a connected series; (film) a set of related shots; (biology) the order of genes or molecules v. arrange in order; determine the sequence of

16. domain [dəˈmeɪn] n. field of activity or knowledge; territory; (internet) domain; (maths) domain of a function

17. remains [rɪˈmeɪnz] n. what is left over; a dead body; ruins v. third-person singular of remain (continue to be; stay; be left)

18. breadth [bredθ] n. width; broad range, extent

19. nerve [nɜːv] n. nerve; courage; (nerves) anxiety; impudence v. steel oneself

20. compute [kəmˈpjuːt] v. calculate, work out (especially by computer) n. computation, computing power

21. resource [rɪˈsɔːs] n. natural resource; supply of money, materials or staff; useful material; (resources) personal qualities for coping v. provide with money or equipment

22. monthly [ˈmʌnθli] adj. happening or paid every month n. a monthly magazine adv. every month, once a month

23. substantial [səbˈstænʃl] adj. considerable in amount or value; solidly built; fundamental, real; (of a meal) large

24. numb [nʌm] adj. without feeling; dazed, unresponsive v. deaden, dull

25. whereas [ˌweərˈæz] conj. while by contrast, although; (in legal documents) given the fact that

26. deceive [dɪˈsiːv] v. mislead, trick; give a false impression; be unfaithful to

27. concerning [kənˈsɜːnɪŋ] prep. about, regarding adj. worrying, troubling

28. weapon [ˈwepən] n. instrument used to inflict harm; a means of gaining an advantage

29. threshold [ˈθreʃhəʊld] n. doorway, entrance; the level or point at which something starts to happen or take effect

30. medium [ˈmiːdiəm] n. means of communication or expression; material used by an artist; substance or environment; a middle state adj. of middle size, degree or quality

31. accord [əˈkɔːd] n. formal agreement, treaty; agreement, harmony v. grant, give; (accord with) be consistent with

32. economic [ˌiːkəˈnɒmɪk] adj. relating to the economy or economics; profitable

33. incentive [ɪnˈsentɪv] n. something that motivates or encourages action

34. accelerate [əkˈseləreɪt] v. speed up, hasten; (of a vehicle or driver) go faster

35. anticipate [ænˈtɪsɪpeɪt] v. expect, foresee; prepare for in advance; look forward to; act before

36. regulate [ˈregjuleɪt] v. control by rules; adjust; supervise, monitor; calibrate (a clock or instrument)



Lack of internal deliberation abilities — thinking, in other words — has long been considered one of the main weaknesses of artificial intelligence. The scale of a recent advance in this by ChatGPT creator OpenAI is a point of debate within the scientific community. But it leads many of my expert colleagues and me to believe that there is a chance that we are on the brink of bridging the gap to human-level reasoning.


Researchers have long argued that traditional neural networks — the leading approach to AI — align more with “system 1” cognition. This corresponds to direct or intuitive answers to questions (such as when automatically recognising a face). Human intelligence, on the other hand, also relies on “system 2” cognition. This involves internal deliberation and enables powerful forms of reasoning (like when solving a maths problem or planning something in detail). It allows us to combine pieces of knowledge in coherent but novel ways.


OpenAI’s advance, which has not yet been fully released to the public, is based on a form of AI with internal deliberation made with their o1 large language model (LLM).


First, note the usage of deliberation. As a noun it means careful consideration or discussion of something, e.g. After much deliberation, first prize was awarded to Derek Murray.

Second, note the phrase the brink (of sth). It refers to a situation in which you are almost in a new situation, usually a bad one, most often in the pattern on the brink of death / disaster / war etc., e.g. In October 1962 the world seemed on the brink of nuclear war.

Third, note the usage of coherent. As an adjective, a piece of writing or set of ideas is coherent if it is easy to understand because it is clear and reasonable, e.g. The three years of the course are planned as a coherent whole.

Better reasoning would address two major weaknesses of current AI: the poor coherence of its answers and its limited ability to plan and achieve long-term goals. The former is important in scientific uses and the latter is essential to create autonomous agents. Both could enable important applications.


The principles behind reasoning have been at the heart of AI research since the 20th century. An early example of success was DeepMind’s AlphaGo, the first computer system to beat human champions at the ancient Asian game of Go in 2015, and more recently AlphaProof, which engages with mathematical subjects. Here, neural networks learn to predict the usefulness of an action. Such “intuitions” are then used to plan by efficiently searching possible sequences of actions.
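
The search-plus-learned-intuition recipe described above can be made concrete with a toy sketch. The code below is only a rough illustration, not DeepMind's actual AlphaGo or AlphaProof algorithm: a best-first search over sequences of actions in which a stand-in value function (here a hand-written heuristic taking the place of a trained neural network) decides which states look most promising to expand next.

```python
# A minimal sketch (not DeepMind's actual method): best-first search over
# action sequences, guided by a "value" estimate of how promising a state is.
# In AlphaGo-style systems that estimate comes from a trained network; here it
# is just a callable supplied by the user.
import heapq
from typing import Callable, Iterable, List, Optional, Tuple

def guided_search(
    start: str,
    actions: Callable[[str], Iterable[Tuple[str, str]]],  # state -> (action, next_state) pairs
    value: Callable[[str], float],                         # learned "intuition": higher is better
    is_goal: Callable[[str], bool],
    budget: int = 10_000,
) -> Optional[List[str]]:
    """Expand the most promising states first; return the action sequence to a goal."""
    frontier = [(-value(start), start, [])]   # max-heap via negated scores
    seen = {start}
    while frontier and budget > 0:
        budget -= 1
        _, state, plan = heapq.heappop(frontier)
        if is_goal(state):
            return plan
        for action, nxt in actions(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-value(nxt), nxt, plan + [action]))
    return None                                # search budget exhausted

# Toy usage: reach 13 from 1 using "+1" and "*2"; the heuristic -|13 - n|
# stands in for a learned value network.
if __name__ == "__main__":
    target = 13
    plan = guided_search(
        start="1",
        actions=lambda s: [("+1", str(int(s) + 1)), ("*2", str(int(s) * 2))],
        value=lambda s: -abs(target - int(s)),
        is_goal=lambda s: int(s) == target,
    )
    print(plan)  # a sequence of "+1"/"*2" actions that reaches 13
```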


However, AlphaGo and AlphaProof involve very specialised knowledge (of the game of Go and specific mathematical domains respectively). What remains unclear is how to combine the breadth of knowledge of modern LLMs with powerful reasoning and planning abilities.


First, note the usage of neural. As an adjective it means relating to a nerve or the nervous system, e.g. signs of neural activity.

Second, note the usage of sequence. As a noun it means the order in which something happens or exists, or the order in which it is supposed to happen or exist, e.g. The questions should be asked in a logical sequence.


There have been some advancements. Already, LLMs come up with better answers to complex questions when asked to produce a chain of thought leading to their answer.
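
As a rough illustration of what "asking for a chain of thought" means in practice, the sketch below contrasts a direct prompt with a chain-of-thought prompt. The ask function is only a placeholder for whatever LLM client you happen to use; no specific vendor API is assumed.

```python
# Illustrative sketch only: the same question posed directly versus with an
# explicit chain-of-thought instruction. `ask` is a placeholder for an LLM
# call; plug in the client of your choice.
def ask(prompt: str) -> str:
    raise NotImplementedError("replace with a call to your own LLM client")

question = "A train leaves at 14:20 and arrives at 17:05. How long is the journey?"

# Direct ("system 1"-style) prompt: the model answers immediately.
direct_prompt = f"{question}\nGive only the final answer."

# Chain-of-thought ("system 2"-style) prompt: the model is asked to write out
# intermediate reasoning before committing to an answer.
cot_prompt = (
    f"{question}\n"
    "Reason step by step, writing out each intermediate calculation, "
    "then give the final answer on a new line beginning with 'Answer:'."
)
```

On multi-step questions like this one, the second style of prompt is what the paragraph above refers to: the extra written-out reasoning tends to make the final answer more reliable.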


OpenAI’s new “o” series pushes this idea further, and requires far more computing resources, and therefore energy, to do so. With a very long chain of thought it is trained to “think” better.


We thus see a new form of computational scaling appear. Not just more training data and larger models but more time spent “thinking” about answers. This leads to substantially improved capabilities in reasoning-heavy tasks such as mathematics, computer science and science more broadly.
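
One concrete and generic way to "spend more time thinking" at inference time is to sample several independent chains of thought and keep the answer they most often agree on, a technique usually called self-consistency. The sketch below assumes a user-supplied sampling function and is not a description of how OpenAI's o-series actually works internally.

```python
# A generic sketch of trading extra inference-time compute for accuracy:
# sample several chains of thought and majority-vote their final answers
# ("self-consistency"). Not a description of OpenAI's o-series internals.
from collections import Counter
from typing import Callable, Tuple

def answer_with_more_thinking(
    question: str,
    sample_chain_of_thought: Callable[[str], Tuple[str, str]],  # -> (reasoning, final_answer)
    n_samples: int = 16,
) -> str:
    """More samples means more 'thinking' time; return the most common final answer."""
    finals = [sample_chain_of_thought(question)[1] for _ in range(n_samples)]
    return Counter(finals).most_common(1)[0][0]
```

Doubling n_samples roughly doubles the compute, and therefore the energy, spent per question, which is the new scaling axis the paragraph above describes.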


First, note the usage of computation. As a noun it means the process of calculating or the result of calculating, e.g. the computation of the monthly statistics.

Second, note the usage of substantial. As an adjective it means large in amount or number, e.g. We have the support of a substantial number of parents.


For example, whereas OpenAI’s previous model GPT-4o only scored about 13 per cent in the 2024 United States Mathematical Olympiad (on the AIME test), o1 reached an 83 per cent mark, placing it among the top 500 students in the country.


If successful, there are major risks to consider. We don’t yet know how to align and control AI reliably. For example, the evaluation of o1 showed an increased ability to deceive humans — a natural consequence of improving goal-reaching skills. It is also concerning that the ability of o1 in helping to create biological weapons has crossed OpenAI’s own risk threshold from low to medium. This is the highest acceptable level according to the company (which may have an interest in keeping concerns low).


Unlocking reasoning and agency are believed to be the main milestones on the road to human-level AI, also known as artificial general intelligence. There are therefore powerful economic incentives for large companies racing towards this goal to cut corners on safety.


o1 is likely to be only a first step. Although it does well at many reasoning and mathematical tasks, it looks like long-term planning has still not been achieved. o1 struggles on more complex planning tasks, suggesting that there is still work to be done to achieve the kind of autonomous agency sought by AI companies.


But with improved programming and scientific abilities, it is to be expected that these new models could accelerate research on AI itself. This could get it to human-level intelligence faster than anticipated. Advances in reasoning abilities make it all the more urgent to regulate AI models in order to protect the public.







Source: Financial Times, 10 November 2024, "AI can learn to think before it speaks"









To master a language is to see the world from another perspective.
It may be long, but it is well worth the patience to study.
May you look at this planet with a critical and different eye.
What do you think?


