外刊精读|《经济学人》:AI到底是如何思考的?

文摘   2024-09-17 21:00   山东  

着手翻译这篇文章的时候,我正在听一位知名博主讲到,我们生活的这一切大概率是一个被模拟出来的虚拟世界。这个埃隆·马斯克非常坚信的想法,如今已经不是什么惊天秘闻。《黑客帝国》在二十多年前提出这一理念的时候,我还在高中准备去大学读计算机专业,在三部曲完成后的几年里我又读了模式识别、人工智能方向的研究生。看着从国外大学翻译过来的《神经网络》和《模式识别》教材,我认为这种连鱼和人都不能轻易分辨的算法前景渺茫。如今,AI学习迭代的速度惊人,而最可怕的是,我们依然像当年读研究生时一样,无法理解它是如何思考的。

现在最前沿的研究人员在尝试理解和操控AI的思想和行为,正如过去几千年的圣人与统治者一直在理解和操控人类的思想和行为一样。毫无疑问,在人类试图理解AI如何思考的时候,AI正在理解人类、超越人类、操控人类。也许有一天,AI成为了那个圣人和统治者,人类也就正式完成了这个硅基文明的启动。

Inside the mind of AI

Researchers are finding ways to analyse the sometimes strange behaviour of large language models.

科研人员正在寻找各种方法,来分析大语言模型不时出现的奇怪行为。

To most people, the inner workings of a car engine or a computer are a mystery. It might as well be a black box: never mind what goes on inside, as long as it works. Besides, the people who design and build such complex systems know how they work in great detail, and can diagnose and fix them when they go wrong. But that is not the case for large language models (LLMs), such as GPT-4, Claude, and Gemini, which are at the forefront of the boom in artificial intelligence (AI).

对于大多数人来说,汽车发动机或者电脑的内部运行机制是个谜,不妨把它当作一个黑盒:只要它还能正常运转,里面如何运作就无关紧要。况且,设计和建造这些复杂系统的人对其中的机理了如指掌,在出现故障时也能够诊断和修复。但对于GPT-4、Claude和Gemini这些处于人工智能(AI)热潮最前沿的大语言模型(LLM)来说,情况却并非如此。

LLMs are built using a technique called deep learning, in which a network of billions of neurons, simulated in software and modelled on the structure of the human brain, is exposed to trillions of examples of something to discover inherent patterns. Trained on text strings, LLMs can hold conversations, generate text in a variety of styles, write software code, translate between languages and more besides.

大语言模型(LLM)建立在被称作深度学习的技术之上:在软件中模拟出一个以人脑结构为原型、由数十亿神经元组成的网络,让它接触数万亿个样本,从中发现事物内在的模式。经过在文本字符串上的训练,LLM可以进行对话、以多种风格生成文本、编写软件代码、在不同语言之间翻译等等。
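
下面用一段极简的示意代码来说明“在大量样本上训练、让网络自己发现模式”这一思路:用PyTorch搭一个字符级的“预测下一个字符”玩具模型。这只是一个基于假设的小草图,其中的语料、网络结构(TinyLM)和超参数都是为举例而虚构的;真实的LLM拥有数十亿参数和数万亿训练样本,远比这复杂。

```python
import torch
import torch.nn as nn

# 一段极小的“训练语料”,仅作示意;真实LLM的语料规模是数万亿token量级
text = "to be or not to be that is the question "
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}          # 字符 -> 编号
data = torch.tensor([stoi[c] for c in text])

class TinyLM(nn.Module):
    """字符级“预测下一个字符”的玩具模型:嵌入层 + 线性层,结构纯属示意。"""
    def __init__(self, vocab, dim=16):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, vocab)
    def forward(self, x):
        return self.head(self.emb(x))

model = TinyLM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
xs, ys = data[:-1], data[1:]                        # 输入是当前字符,目标是下一个字符

for step in range(300):                             # 反复“接触样本”,让网络自己发现其中的模式
    logits = model(xs)
    loss = nn.functional.cross_entropy(logits, ys)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.3f}")
```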

Models are essentially grown, rather than designed, says Josh Batson, a researcher at Anthropic, an AI startup. Because LLMs are not explicitly programmed, nobody is entirely sure why they have such extraordinary abilities. Nor do they know why LLMs sometimes misbehave, or give wrong or made-up answers, known as 'hallucinations'. LLMs really are black boxes. This is worrying, given that they and other deep-learning systems are starting to be used for all kinds of things, from offering customer support to preparing document summaries to writing software code.

AI创业公司Anthropic的研究员Josh Batson说,模型本质上是生长出来的,而不是设计出来的。因为LLM并不是通过显式编程实现的,没有人能完全确定它们为什么具备这些超凡的能力。也没有人知道为什么LLM有时会行为失常,给出错误或凭空捏造的答案,即所谓的“幻觉”。LLM是真正意义上的黑盒。鉴于大语言模型和其他深度学习系统已经开始被用于从提供客服支持、撰写文档摘要到编写软件代码等各种用途,这一点令人担忧。

It would be helpful to be able to poke around inside an LLM to see what is going on, just as it is possible, given the right tools, to do with a car engine or a microprocessor. Being able to understand a model's inner workings in bottom-up, forensic detail is called 'mechanistic interpretability'. But it is a daunting task for networks with billions of internal neurons. That has not stopped people trying, including Dr. Batson and his colleagues. In a paper published in May, they explained how they have gained new insight into the workings of one of Anthropic's LLMs.

如果能像检查汽车引擎或微处理器那样,借助合适的工具在LLM内部一探究竟,将会非常有帮助。这种自下而上、如取证般细致地理解模型内部运行机制的能力,被称作“机制可解释性”。但对于拥有数十亿内部神经元的网络来说,这是一项令人生畏的任务。不过这并没有阻止包括Batson博士及其同事在内的研究者去尝试。在五月发表的一篇论文中,他们阐释了自己如何对Anthropic旗下一个LLM的运行机制获得了新的洞察。

One might think individual neurons inside an LLM would correspond to specific words. Unfortunately, things are not that simple. Instead, individual words or concepts are associated with the activation of complex patterns of neurons, and individual neurons may be activated by many different words or concepts. This problem was pointed out in earlier work by researchers at Anthropic, published in 2022. They proposed—and subsequently tried—various workarounds, achieving good results on very small language models in 2023 with a so-called 'sparse autoencoder'. In their latest results, they have scaled up this approach to work with Claude 3 Sonnet, a full-sized LLM.

有人可能以为,LLM内部的单个神经元会对应具体的单词。不幸的是,事情没有那么简单。实际上,单个词语或概念对应的是一组复杂的神经元激活模式,而单个神经元也可能被许多不同的词语或概念激活。Anthropic的研究人员在2022年发表的早期工作中就指出了这个问题。他们提出并随后尝试了多种变通方法,并在2023年用一种被称为“稀疏自编码器”的方法在非常小的语言模型上取得了不错的成果。在最新的成果中,他们已将这种方法扩展到了完整规模的LLM Claude 3 Sonnet上。

A sparse autoencoder is, essentially, a second, smaller neural network that is trained on the activity of an LLM, looking for distinct patterns in activity when 'sparse' (i.e., very small) groups of its neurons fire together. Once many such patterns, known as features, have been identified, the researchers can determine which words trigger which features. The Anthropic team found individual features that corresponded to specific cities, people, animals, and chemical elements, as well as higher-level concepts such as transport infrastructure, famous female tennis players, or the notion of secrecy. They performed this exercise three times, identifying 1m, 4m, and, on the last go, 34m features within the Sonnet LLM.

所谓稀疏自编码器,本质上是第二个、规模更小的神经网络,它以LLM的内部激活为训练数据,用来寻找当“稀疏”(即非常小)的一组神经元同时激活时出现的独特模式。一旦识别出许多这样的模式(被称为“特征”),研究人员就能判断哪些词会触发哪些特征。Anthropic团队发现了与具体城市、人物、动物和化学元素相对应的单个特征,也发现了对应交通基础设施、著名女子网球选手、“保密”这类更高层次概念的特征。他们将这一过程进行了三次,分别在Sonnet这个LLM中识别出一百万、四百万以及最后一次的三千四百万个特征。
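
稀疏自编码器的核心思路可以用下面的极简草图来示意:用一个编码器把LLM某一层的激活映射成更高维、但大部分为零的“特征”,再用解码器重建原激活,并用L1惩罚促使特征保持稀疏。这里的维度(d_model=512、n_features=4096)、惩罚系数和随机占位数据都是假设,并非Anthropic论文中的真实实现:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """极简稀疏自编码器:把LLM某一层的激活向量编码成维度更高但更稀疏的“特征”。"""
    def __init__(self, d_model=512, n_features=4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)   # 激活 -> 特征
        self.decoder = nn.Linear(n_features, d_model)   # 特征 -> 重建的激活
    def forward(self, acts):
        feats = torch.relu(self.encoder(acts))          # ReLU使特征非负,便于做到稀疏
        return feats, self.decoder(feats)

# 假设已经从LLM内部收集了一批激活向量(这里用随机数占位,仅作示意)
acts = torch.randn(1024, 512)
sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coef = 1e-3                                          # 稀疏惩罚系数,假设值

for step in range(200):
    feats, recon = sae(acts)
    # 损失 = 重建误差 + L1稀疏惩罚:迫使每个激活只能由极少数特征来解释
    loss = ((recon - acts) ** 2).mean() + l1_coef * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# 训练完成后,观察哪些输入文本会让某个特征强烈激活,就可以给这个特征“命名”
```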

The result is a sort of mind-map of the LLM, showing a small fraction of the concepts it has learned about from its training data. Places in the San Francisco Bay Area that are close geographically are also 'close' to each other in the concept space, as are related concepts, such as diseases or emotions. 'This is exciting because we have a partial conceptual map, a hazy one, of what's happening,' says Dr. Batson. 'And that's the starting point - we can enrich that map and branch out from there.'

这个结果相当于LLM的一张“思维导图”,展示出它从训练数据中学到的概念中的一小部分。旧金山湾区里地理位置相近的地点,在概念空间中也彼此“接近”;疾病、情绪等相关概念同样如此。Batson博士说:“这令人兴奋,因为我们对正在发生的事情有了一张局部的、模糊的概念地图。这只是起点,我们可以从这里出发,不断丰富和延伸这张地图。”
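
文中所说概念之间的“接近”,可以理解为特征向量在概念空间中的距离。下面是一个纯示意的小例子:假设每个特征用一个向量表示,并用余弦相似度来衡量“接近”程度。原文并未说明具体的度量方式,这里的向量也只是随机占位数据:

```python
import torch
import torch.nn.functional as F

# 假设:每个“特征”在概念空间中可以用一个向量表示(这里用随机向量占位,真实向量应来自SAE)
features = {
    "Golden Gate Bridge": torch.randn(512),
    "Alcatraz Island": torch.randn(512),
    "Mount Fuji": torch.randn(512),
}

def closeness(a, b):
    """用余弦相似度衡量两个特征在概念空间中的“接近”程度(一种常见做法,并非原文指定)。"""
    return F.cosine_similarity(features[a], features[b], dim=0).item()

# 若换成真实特征向量,湾区内的地点之间应当比与富士山之间更“接近”
print(closeness("Golden Gate Bridge", "Alcatraz Island"))
print(closeness("Golden Gate Bridge", "Mount Fuji"))
```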

Focus the mind | 聚焦心智

As well as seeing parts of the LLM light up, as it were, in response to specific concepts, it is also possible to change its behaviour by manipulating individual features. Anthropic tested this idea by 'spiking' (i.e., turning up) a feature associated with the Golden Gate Bridge. The result was a version of Claude that was obsessed with the bridge, and mentioned it at any opportunity. When asked how to spend $10, for example, it suggested paying the toll and driving over the bridge; when asked to write a love story, it made up one about a lovelorn car that could not wait to cross it.

既然可以看到LLM中对应具体概念的部分被“点亮”,也就有可能通过操控单个特征来改变它的行为。Anthropic检验了这一想法:把与金门大桥相关的一个特征“拉高”(即调大其激活值)。结果得到了一个对这座大桥念念不忘的Claude版本,一有机会就会提到它。例如,当被问到如何花掉10美元时,它建议付过路费开车过桥;当被要求写一个爱情故事时,它编出了一辆苦恋中的汽车迫不及待想要过桥的情节。
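
“拉高某个特征”在代码层面大致可以这样示意:先用稀疏自编码器把模型某一层的激活分解成特征,人为放大目标特征,再解码写回激活。下面的steer函数只是承接前文SAE草图的假设性示意,feature_idx和boost等参数均为虚构,并非Anthropic的真实做法:

```python
import torch

# 沿用前文稀疏自编码器草图的记号:sae.encoder把激活映射成特征,sae.decoder再映射回激活空间。
@torch.no_grad()
def steer(acts, sae, feature_idx, boost=10.0):
    """放大指定特征的激活值,再用解码器重建激活,得到被“操控”后的激活(示意性实现)。"""
    feats = torch.relu(sae.encoder(acts))   # 1. 把当前激活分解成稀疏特征
    feats[:, feature_idx] += boost          # 2. 人为拉高目标特征(例如“金门大桥”特征)
    return sae.decoder(feats)               # 3. 重建激活,用它替换模型中原来的激活

# 用法示意(假设已通过hook取得模型某一层的激活acts,并训练好了sae):
# acts_steered = steer(acts, sae, feature_idx=1234, boost=10.0)
```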

That may sound silly, but the same principle could be used to discourage the model from talking about particular topics, such as bioweapons production. 'AI safety is a major goal here,' says Dr. Batson. It can also be applied to behaviours. By tuning specific features, models could be made more or less sycophantic, empathetic, or deceptive. Might a feature emerge that corresponds to the tendency to hallucinate? 'We didn't find a smoking gun,' says Dr. Batson. Whether hallucinations have an identifiable mechanism or signature is, he says, a 'million-dollar question'. And it is one addressed, by another group of researchers, in a new paper in Nature.

这也许听起来有些傻,但同样的原理也可以用来阻止模型谈论某些特定话题,例如生物武器的生产。Batson博士说:“AI安全是这里的一个主要目标。”这种方法同样可以作用于行为:通过调节特定的特征,可以让模型变得或多或少地阿谀逢迎、富有同理心或善于欺骗。那么,会不会出现一个对应“产生幻觉倾向”的特征呢?Batson博士说:“我们没有找到确凿的证据。”他说,幻觉是否有可被识别的机制或特征信号,是一个“价值百万的问题”。而另一组研究人员在《自然》杂志上新发表的一篇论文,正是要回答这个问题。

Sebastian Farquhar and colleagues at the University of Oxford used a measure called 'semantic entropy' to assess whether a statement from an LLM is likely to be a hallucination or not. Their technique is quite straightforward: essentially, an LLM is given the same prompt several times, and its answers are then clustered by 'semantic similarity' (i.e., according to their meaning). The researchers' hunch was that the 'entropy' of these answers - in other words, the degree of inconsistency - corresponds to the LLM's uncertainty, and thus the likelihood of hallucination. If all its answers are essentially variations on a theme, they are probably not hallucinations (though they may still be incorrect).

牛津大学的Sebastian Farquhar和同事们用一种被称作“语义熵”的度量来评估LLM的某个陈述是否可能是幻觉。他们的技术十分直接:本质上,就是把同一个提示词重复输入给LLM若干次,再按“语义相似性”(即按照含义)对它的回答进行聚类。研究人员的直觉是:这些回答的“熵”,也就是彼此不一致的程度,对应着LLM的不确定性,进而对应着产生幻觉的可能性。如果所有回答本质上都是同一个意思的不同说法,那它们多半不是幻觉(尽管仍有可能是错的)。
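
“语义熵”的计算思路可以用下面的示意代码概括:对同一个提示采样多个回答,按含义聚类,再对各簇所占比例计算熵。这里的same_meaning和ask_llm都是假设的占位实现(论文中使用的是更严谨的语义等价判断),仅用于说明原理:

```python
import math

def semantic_entropy(answers, same_meaning):
    """按含义把回答聚类,再对各簇占比计算熵;熵越高说明回答越不一致,越可能是虚构。"""
    clusters = []                                   # 每个簇存放含义相同的回答
    for ans in answers:
        for cluster in clusters:
            if same_meaning(ans, cluster[0]):       # 与簇内第一个回答含义相同,则并入该簇
                cluster.append(ans)
                break
        else:
            clusters.append([ans])                  # 否则另起一个新簇
    n = len(answers)
    return -sum(len(c) / n * math.log(len(c) / n) for c in clusters)

def same_meaning(a, b):
    """占位实现:真实做法可用双向蕴含判断或句向量相似度来判定两句话含义是否相同。"""
    return a.strip().lower() == b.strip().lower()

# 用法示意:对同一个提示采样若干个回答(ask_llm是假设的接口,这里直接用现成列表演示)
# answers = [ask_llm("Which country is associated with fado music?") for _ in range(10)]
answers = ["Portugal", "portugal", "Portugal", "Brazil", "Portugal"]
print(round(semantic_entropy(answers, same_meaning), 3))   # 熵低:回答一致;熵高:可能是虚构
```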

In one example, the Oxford group asked an LLM which country is associated with fado music, and it consistently replied that fado is the national music of Portugal - which is correct, and not a hallucination. But when asked about the function of a protein called StarD10, the model gave several wildly different answers, which suggests hallucination. (The researchers prefer the term 'confabulation,' a subset of hallucinations they define as 'arbitrary and incorrect generations.') Overall, this approach was able to distinguish between accurate statements and hallucinations 79% of the time; ten percentage points better than previous methods. This work is complementary, in many ways, to Anthropic's.

例如,牛津小组问LLM哪个国家与法朵(fado)音乐相关,它每次都回答法朵是葡萄牙的民族音乐,这是正确答案,不是幻觉。但当被问到一种名为StarD10的蛋白质的功能时,模型给出了若干个大相径庭的答案,这就提示出现了幻觉。(研究人员更倾向于使用“虚构”这个术语,指幻觉中被他们定义为“任意且不正确的生成”的那个子类。)总体而言,这种方法区分准确陈述与幻觉的正确率达到79%,比以往的方法高出10个百分点。这项工作在很多方面与Anthropic的工作形成了互补。

Others have also been lifting the lid on LLMs: the 'superalignment' team at OpenAI, maker of GPT-4 and ChatGPT, released its own paper on sparse autoencoders in June, though the team has now been dissolved after several researchers left the firm. But the OpenAI paper contained some innovative ideas, says Dr. Batson. 'We are really happy to see groups all over, working to understand models better,' he says. 'We want everybody doing it.'

也有其他人在努力揭开LLM的盖子:OpenAI(GPT-4和ChatGPT的开发者)的“超级对齐”团队在六月发表了自己关于稀疏自编码器的论文,尽管在多名研究员离职之后,这个团队现已解散。Batson博士说,这篇OpenAI的论文包含了一些创新的想法。“看到各地都有团队在努力更好地理解模型,我们真的非常高兴,”他说,“我们希望每个人都来做这件事。”

*本文翻译自《经济学人》2024年7月13日商业文章《Inside the mind of AI》,仅供英文交流学习使用,原图文版权归经济学人杂志所有


