OpenAI o1's IQ Has Reached 120, Above the Human Average

Technology 2024-12-18 11:32 Beijing

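OpenAI's new reasoning model, o1, recently scored 121 on the Norway Mensa IQ test, the first time an AI model has exceeded the average human IQ. The model fundamentally changes how AI systems reason and perform, and it opens new horizons for machine learning and problem solving, marking a milestone.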
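To put a score of 121 in context, the short sketch below converts an IQ score into a population percentile. It assumes the conventional scale with a mean of 100 and a standard deviation of 15; the article does not state which scale the test uses, so the numbers are illustrative only, and the name `iq_percentile` is introduced here for the example.

```python
from statistics import NormalDist

# Assumption: IQ scores are normally distributed with mean 100 and
# standard deviation 15. The article does not state the test's scale.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def iq_percentile(score: float) -> float:
    """Return the expected share of the population scoring below `score`."""
    return IQ_SCALE.cdf(score)


if __name__ == "__main__":
    for score in (100, 120, 121):
        share = iq_percentile(score)
        print(f"IQ {score}: above about {share:.1%} of the population")
```

Under that assumption, a score of 121 sits at roughly the 92nd percentile: clearly above average, but well short of the top of the human range.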

How OpenAI’s O1 Series Stands Out Redefining AI Reasoning

From Medium.

Compared with GPT-4o, the o1-preview model is more than 5 times better at solving mathematical and programming problems, and the yet-to-be-released o1 model is more than 8 times better. Its success rate on PhD-level science problems surpasses that of human experts, and its performance in physics and chemistry competitions exceeds that of human PhDs. On the International Mathematical Olympiad (IMO) qualifying exam, GPT-4o correctly solved only 13% of the problems, whereas the reasoning model scored 83%. In programming competitions, the model’s performance on Codeforces surpasses that of 89% of human participants. o1 appears to exceed the highest human capabilities in various fields, including science, which makes Altman’s earlier confidence in achieving AGI easy to understand.

surpass

/sərˈpæs/: to exceed. To surpass means to go beyond or exceed something in quality, achievement, or ability.

AGI

artificial general intelligence

It is foreseeable that, against the backdrop of decreasing marginal costs of pre-training, reinforcement-learning-based inference enhancement will gain more attention and play a significant role. More computational resources will be devoted to the inference phase, and global demand for AI chips and computing power will continue to increase.

foreseeable

/fɔːrˈsiːəbl/: predictable. Foreseeable describes something that can be predicted or anticipated based on current knowledge or circumstances.

backdrop

/ˈbækdrɒp/: the background (to an event). The backdrop to an object or a scene is what you see behind it.

Human learning begins with acquiring vast amounts of knowledge. Intelligence forms through extensive neural activation and connections, while the specific knowledge itself is often forgotten, much as Zhang Wuji, in Jin Yong’s novel, masters Tai Chi by forgetting the individual moves. In solving different problems, besides relying on language comprehension and logical reasoning, we also rely on accessing and referencing credible knowledge, on the emergence of creative inspiration, and on emotional interpersonal connections and empathy. AI will not merely be one large deep learning model; it will become an increasingly “sparse” and flexible combination of capabilities, and potentially even a new mechanism for human-machine collaboration. The ability to “solve problems” is certainly necessary, but mastering problem solving is still a considerable distance from addressing real-world issues.

neural

/ˈnjʊərəl/: relating to a nerve or to the nervous system.

comprehension

/ˌkɒmprɪˈhenʃ(ə)n/: the ability to understand something.

interpersonal

/ˌɪntəˈpɜːsən(ə)l/: relating to relationships between people.
