OpenAI o1's IQ Has Reached 120, Above the Average Person's

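Technology   2024-12-18 11:32   Beijing
Recently, OpenAI's new reasoning model o1 scored 121 on the Norway Mensa IQ test, the first time an AI model has exceeded the average human IQ. The model fundamentally changes how AI systems reason and perform, opens new horizons for machine learning and problem solving, and marks a milestone.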

How OpenAI’s O1 Series Stands Out: Redefining AI Reasoning

From Medium.
Compared to GPT-4o, the o1-preview model has improved its ability to solve mathematical and programming problems by over 5 times, while the yet-to-be-released o1 model has achieved an improvement of over 8 times! The success rate in solving PhD-level scientific problems has surpassed that of human experts. Its performance in physics and chemistry competitions exceeds that of human PhDs. In the International Mathematical Olympiad (IMO) qualification exam, GPT-4o only correctly solved 13% of the problems, whereas the reasoning model scored 83%. In programming competitions, the model’s abilities on Codeforces have surpassed 89% of human participants. It appears that o1 exceeds the highest human capabilities in various fields, including science, making it easy to understand Altman’s previous confidence in achieving AGI.
It is foreseeable that against the backdrop of decreasing marginal costs of pre-training, reinforcement learning-based inference enhancement will gain more attention and play a significant role. More computational resources will be devoted to the inference phase, and the global demand for AI chips and computing power will continue to increase.
The human learning process involves first acquiring vast amounts of knowledge, forming intelligence through extensive neural activation and connections, while specific knowledge is often forgotten, akin to Zhang Wuji learning Tai Chi. In solving different problems, besides relying on language comprehension and logical reasoning abilities, we also rely on accessing and referencing credible knowledge, the emergence of creative inspiration, and emotional interpersonal connections and empathy. AI will not merely be a large deep learning model but will become an increasingly “sparse” and flexible combination of capabilities, potentially even a new mechanism for human-machine collaboration. The ability to “solve problems” is certainly necessary, but mastering problem-solving is still a considerable distance from addressing real-world issues.

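Note: the translations below are machine-generated and do not map one-to-one onto the original text; they are for reference only.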
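Compared with GPT-4o, the o1-preview model is more than 5 times better at solving mathematics and programming problems, and the not-yet-released o1 model achieves an improvement of more than 8 times! Its success rate on PhD-level science problems has surpassed that of human experts, and its performance in physics and chemistry competitions exceeds that of human PhDs. On the International Mathematical Olympiad (IMO) qualification exam, GPT-4o correctly solved only 13% of the problems, whereas the reasoning model scored 83%. In programming competitions, the model's Codeforces ability has surpassed that of 89% of human participants. o1 appears to exceed the highest human capabilities in many fields, including science, which makes Altman's earlier confidence in achieving AGI easy to understand.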

surpass

/sərˈpæs/, "to exceed". Definition: Surpass means to go beyond or exceed something in quality, achievement, or ability.

AGI

Artificial general intelligence.

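It is foreseeable that, against the backdrop of falling marginal costs of pre-training, reinforcement-learning-based inference enhancement will attract more attention and play an important role. More computing resources will go into the inference phase, and global demand for AI chips and computing power will keep growing.

foreseeable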

/fɔːrˈsiːəbl/, "predictable". Definition: Foreseeable means something that can be predicted or anticipated based on current knowledge or circumstances.

backdrop

/ˈbækdrɒp/, "the background (to an event)". Definition: The backdrop to an object or a scene is what you see behind it; by extension, the general situation in which an event takes place.

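The human learning process begins with acquiring a large amount of knowledge and forming intelligence through extensive neural activation and connections, while specific knowledge is often forgotten, much like Zhang Wuji learning Tai Chi. When solving different problems, besides relying on language comprehension and logical reasoning, we also rely on retrieving and citing credible knowledge, on flashes of creative inspiration, and on emotional interpersonal connection and empathy. AI will not be just a large deep learning model; it will become an increasingly “sparse” and flexible combination of capabilities, and possibly even a new mechanism for human-machine collaboration. The ability to “solve problems” is certainly necessary, but mastering it is still a long way from addressing real-world issues.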

neural

/ˈnjʊərəl/, "of the nerves". Definition: relating to a nerve or to the nervous system.

comprehension

/ˌkɒmprɪˈhenʃ(ə)n/, "understanding". Definition: Comprehension is the ability to understand something.

interpersonal

/ˌɪntəˈpɜːsən(ə)l/, "between people". Definition: Interpersonal means relating to relationships between people.

CDCC
A platform for data center standards and technical exchange