AIs generate more novel and exciting research ideas than human experts

Is it possible that AI, pictured here using generative tools, might be better at coming up with new ideas than humans?

The first statistically significant results are in: not only can Large Language Model (LLM) AIs generate new expert-level scientific research ideas, but their ideas are more original and exciting than the best of ours – as judged by human experts.

Recent breakthroughs in large language models (LLMs) have excited researchers about the potential to revolutionize scientific discovery, with models like ChatGPT and Anthropic's Claude showing an ability to autonomously generate and validate new research ideas.


This, of course, was one of the many things most people assumed AIs could never take over from humans: the ability to generate new knowledge and make new scientific discoveries, as opposed to stitching together existing knowledge from their training data.

But as with artistic expression, music composition, coding, understanding subtext and body language, and any number of other emergent abilities, today's multimodal AIs do appear to be able to generate novel research – more novel on average than their human counterparts.

Little research had been done in this field until recently, when over 100 natural language processing (NLP) research experts (PhDs and postdocs from 36 well-regarded institutions) went head-to-head with an LLM-based 'ideation agent' to see whose research ideas were more original, exciting and feasible – as judged by human experts.

The field of NLP is a branch of artificial intelligence that deals with communication between humans and AIs, in language that both sides can 'understand' – not just basic syntax, but also nuance, and more recently verbal tone and emotional inflection.

Forty-nine human experts wrote ideas on seven NLP topics, while an LLM trained by the researchers generated ideas on the same seven topics. The study paid US$300 for each idea, plus a bonus of $1,000 for each of the top five human ideas, to incentivize the humans to produce legitimate, easy-to-follow, executable ideas.

Once the ideas were complete, an LLM was used to standardize the writing style of each submitted entry while preserving the original content – leveling the playing field, so to speak, and keeping the study as blind as possible.
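The paper's exact normalization pipeline isn't reproduced in this article, so here is a minimal sketch of how such a step is commonly implemented, assuming the OpenAI Python client; the model choice, prompt wording and `normalize_style` helper are illustrative assumptions, not the study's actual code.

```python
# Illustrative sketch of LLM-based style normalization - not the study's code.
# Assumes the OpenAI Python client; model and prompt are hypothetical choices.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

STYLE_PROMPT = (
    "Rewrite the following research idea in a neutral, uniform academic style. "
    "Preserve every claim, method detail and citation exactly; change only the "
    "tone, formatting and phrasing that could reveal who wrote it."
)

def normalize_style(idea_text: str) -> str:
    """Return a style-normalized version of one submitted idea."""
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical model choice
        temperature=0,   # keep the rewrite as conservative as possible
        messages=[
            {"role": "system", "content": STYLE_PROMPT},
            {"role": "user", "content": idea_text},
        ],
    )
    return response.choices[0].message.content
```

The key design constraint is that the rewrite may touch surface style only: any edit to substance would contaminate the blind comparison.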

All the submissions were then blind-reviewed by 79 recruited human experts. The panel submitted 298 reviews, giving each idea between two and four independent reviews.

And sure enough, when it comes to novelty and excitement, the AI-generated ideas scored significantly better than those of the human researchers. The AI ideas also ranked slightly lower than the humans' in feasibility, and slightly higher in effectiveness – but neither of these effects was statistically significant.
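To unpack what "statistically significant" means here: with per-review scores for each group, a two-sample test is the standard tool. The sketch below runs Welch's t-test via SciPy on fabricated ratings; the real data and the paper's exact statistical treatment (e.g. corrections for testing multiple metrics) go beyond this toy example.

```python
# Toy two-group significance test; the ratings below are fabricated,
# not data from the study.
from scipy import stats

# Hypothetical 1-10 novelty ratings, one per review
human_novelty = [4, 5, 6, 5, 4, 6, 5, 5, 4, 6]
ai_novelty = [6, 7, 5, 7, 6, 8, 6, 7, 5, 7]

# Welch's t-test doesn't assume equal variances between the two groups
t_stat, p_value = stats.ttest_ind(human_novelty, ai_novelty, equal_var=False)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A p-value below the chosen threshold (commonly 0.05) is what justifies
# calling the novelty gap "statistically significant".
```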


[Figure: An overall look at how human papers scored against LLM-generated ideas]

The study also uncovered certain flaws, such as the LLM's lack of diversity in generating ideas, as well as its limitations in self-evaluation. Even with explicit direction not to repeat itself, the LLM would quickly begin to do so. LLMs also weren't able to review and score ideas with much consistency, and their scores showed low agreement with human judgments.
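A standard way to quantify that duplication problem is to embed each generated idea and flag pairs whose cosine similarity crosses a threshold. The sketch below uses the sentence-transformers library; the model name and the 0.8 cutoff are illustrative assumptions, not the study's reported settings.

```python
# Generic sketch of embedding-based duplicate detection for generated ideas.
# Model choice and similarity threshold are illustrative assumptions.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("all-MiniLM-L6-v2")

def find_duplicates(ideas: list[str], threshold: float = 0.8) -> list[tuple[int, int]]:
    """Return index pairs of ideas whose embeddings are suspiciously similar."""
    embeddings = model.encode(ideas)
    sims = cos_sim(embeddings, embeddings)  # pairwise cosine-similarity matrix
    return [
        (i, j)
        for i in range(len(ideas))
        for j in range(i + 1, len(ideas))
        if sims[i][j] > threshold  # near-duplicates sit above the cutoff
    ]
```

The higher the fraction of flagged pairs in a batch, the less diverse the generator's output – exactly the failure mode the study observed once the LLM was asked for many ideas on the same topic.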

The study also acknowledges that judging the "originality" of an idea is rather subjective, even for a panel of experts.

To better test whether LLMs have real potential for autonomous scientific discovery, the researchers will recruit more expert participants. They propose a more comprehensive follow-up study, in which the ideas generated by both AIs and humans are fully developed into projects, allowing a more in-depth exploration of their impact in real-world scenarios.

But these initial findings are certainly sobering. Humanity finds itself looking a strange new adversary in the eye. Language model AIs are becoming incredibly capable tools – but they're still notoriously unreliable and prone to what AI companies call "hallucinations," and what anyone else might call "BS."

They can move mountains of paperwork – but there's certainly no room for "hallucinations" in the rigor of the scientific method. Science can't build on a foundation of BS. It's already scandalous enough that, by some estimates, at least 10% of research papers are currently being co-written – at the very least – by AIs.

On the other hand, we shouldn't underestimate AI's potential to radically accelerate progress in certain areas – as evidenced by DeepMind's GNoME system, which knocked off about 800 years' worth of materials discovery in a matter of months, and spat out recipes for about 380,000 new inorganic crystals that could have revolutionary potential in all sorts of areas.

This is the fastest-developing technology humanity has ever seen; it's reasonable to expect that many of its flaws will be patched up and painted over within the next few years. Many AI researchers believe we're approaching general superintelligence – the point at which generalist AIs will overtake expert knowledge in more or less all fields.

It's certainly a strange feeling watching our greatest invention rapidly master so many of the things we thought made us special – including the very ability to generate novel ideas. Human ingenuity seems to be painting humans into a corner, as old gods of ever-diminishing gaps.

Still, in the immediate future, we can make the best progress as a symbiosis, with the best of organic and artificial intelligence working together, as long as we can keep our goals in alignment. 

But if this is a competition, well, it's AI: 1, humans: 0 for this round.

Source: Chenglei Si via X


