A new way to build neural networks could make AI more understandable

The simplified approach makes it easier to see how neural networks produce the outputs they do.

By Anil Ananthaswamy

A tweak to the way artificial neurons work in neural networks could make AIs easier to decipher.

Artificial neurons—the fundamental building blocks of deep neural networks—have survived almost unchanged for decades. While these networks give modern artificial intelligence its power, they are also inscrutable. 

Existing artificial neurons, used in large language models like GPT-4, work by taking in a large number of inputs, adding them together, and converting the sum into an output using another mathematical operation inside the neuron. Combinations of such neurons make up neural networks, and their combined workings can be difficult to decode.
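
In the standard textbook formulation (spelled out in more detail below), each input x_i is scaled by a weight w_i, and the extra operation inside the neuron is a fixed activation function σ:

```latex
y = \sigma\!\left( \sum_{i=1}^{n} w_i x_i + b \right)
```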

But the new way to combine neurons works a little differently. Some of the complexity of the existing neurons is both simplified and moved outside the neurons. Inside, the new neurons simply sum up their inputs and produce an output, without the need for the extra hidden operation. Networks of such neurons are called Kolmogorov-Arnold Networks (KANs), after the Russian mathematicians who inspired them.
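
The name comes from the Kolmogorov–Arnold representation theorem, which says that any continuous function of many variables can be built by adding and composing functions of a single variable:

```latex
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```

KANs mirror this structure: the one-dimensional functions sit on the connections between neurons, and the neurons themselves only add.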

The simplification, studied in detail by a group led by researchers at MIT, could make it easier to understand why neural networks produce certain outputs, help verify their decisions, and even probe for bias. Preliminary evidence also suggests that as KANs are made bigger, their accuracy increases faster than networks built of traditional neurons.

“It’s interesting work,” says Andrew Wilson, who studies the foundations of machine learning at New York University. “It’s nice that people are trying to fundamentally rethink the design of these [networks].”

The basic elements of KANs were actually proposed in the 1990s, and researchers kept building simple versions of such networks. But the MIT-led team has taken the idea further, showing how to build and train bigger KANs, performing empirical tests on them, and analyzing some KANs to demonstrate how their problem-solving ability could be interpreted by humans. “We revitalized this idea,” says team member Ziming Liu, a PhD student in Max Tegmark’s lab at MIT. “And, hopefully, with the interpretability… we [may] no longer [have to] think neural networks are black boxes.”

While it’s still early days, the team’s work on KANs is attracting attention. GitHub pages have sprung up that show how to use KANs for myriad applications, such as image recognition and solving fluid dynamics problems. 

Finding the formula

The current advance came when Liu and colleagues at MIT, Caltech, and other institutes were trying to understand the inner workings of standard artificial neural networks. 

Today, almost all types of AI, including those used to build large language models and image recognition systems, include sub-networks known as multilayer perceptrons (MLPs). In an MLP, artificial neurons are arranged in dense, interconnected “layers.” Each neuron has within it something called an “activation function”—a mathematical operation that takes in a bunch of inputs and transforms them in some pre-specified manner into an output.

In an MLP, each artificial neuron receives inputs from all the neurons in the previous layer and multiplies each input with a corresponding “weight” (a number signifying the importance of that input). These weighted inputs are added together and fed to the activation function inside the neuron to generate an output, which is then passed on to neurons in the next layer. An MLP learns to distinguish between images of cats and dogs, for example, by choosing the correct values for the weights of the inputs for all the neurons. Crucially, the activation function is fixed and doesn’t change during training.
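
As a concrete illustration (a minimal sketch, not code from the paper; the function name and values are invented for the example), here is the computation a single MLP neuron performs:

```python
import math

def mlp_neuron(inputs, weights, bias):
    """One MLP neuron: a weighted sum of its inputs passed through a
    fixed activation function (tanh here, one common choice)."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(total)  # the activation is fixed; training never changes it

# A neuron receiving three values from the previous layer.
print(mlp_neuron(inputs=[0.5, -1.2, 3.0], weights=[0.8, 0.1, -0.4], bias=0.05))
```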

Once trained, all the neurons of an MLP and their connections taken together essentially act as another function that takes an input (say, tens of thousands of pixels in an image) and produces the desired output (say, 0 for cat and 1 for dog). Understanding what that function looks like, meaning its mathematical form, is an important part of being able to understand why it produces some output. For example, why does it tag someone as creditworthy given inputs about their financial status? But MLPs are black boxes. Reverse-engineering the network is nearly impossible for complex tasks such as image recognition.

And even when Liu and colleagues tried to reverse-engineer an MLP for simpler tasks that involved bespoke “synthetic” data, they struggled. 

“If we cannot even interpret these synthetic datasets from neural networks, then it’s hopeless to deal with real-world data sets,” says Liu. “We found it really hard to try to understand these neural networks. We wanted to change the architecture.”

Mapping the math

The main change was to remove the fixed activation function and introduce a much simpler learnable function to transform each incoming input before it enters the neuron. 

Unlike the activation function in an MLP neuron, which takes in numerous inputs, each simple function outside the KAN neuron takes in one number and spits out another number. Now, during training, instead of learning the individual weights, as happens in an MLP, the KAN just learns how to represent each simple function. In a paper posted this year on the preprint server arXiv, Liu and colleagues showed that these simple functions outside the neurons are much easier to interpret, making it possible to reconstruct the mathematical form of the function being learned by the entire KAN.
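
For contrast, here is a minimal, hypothetical sketch of the KAN-style computation. The paper represents each edge function with splines; a simple polynomial basis stands in here, and all names and numbers are illustrative:

```python
def edge_function(x, coeffs):
    """A learnable one-dimensional function sitting on an input edge.

    Training adjusts the whole coefficient list rather than a single
    weight. (The paper uses spline-based functions; a polynomial basis
    stands in here for simplicity.)"""
    return sum(c * x**k for k, c in enumerate(coeffs))

def kan_neuron(inputs, edge_coeffs):
    """A KAN-style neuron: each input is transformed by its own learnable
    function, and the neuron itself just sums the results; there is no
    extra activation hidden inside."""
    return sum(edge_function(x, c) for x, c in zip(inputs, edge_coeffs))

# Two inputs, each with its own learnable cubic function (coefficients invented).
coeffs = [
    [0.0, 1.5, -0.3, 0.02],  # function on the first input edge
    [0.1, -0.7, 0.05, 0.0],  # function on the second input edge
]
print(kan_neuron([0.4, -1.1], coeffs))
```

Because each learned function takes one number in and one number out, it can be plotted or matched against familiar formulas, which is what makes the function learned by the whole network easier to reconstruct.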

The team, however, has only tested the interpretability of KANs on simple, synthetic data sets, not on real-world problems, such as image recognition, which are more complicated. “[We are] slowly pushing the boundary,” says Liu. “Interpretability can be a very challenging task.”

Liu and colleagues have also shown that, as they are scaled up, KANs become more accurate at their tasks faster than MLPs do. The team proved the result theoretically and showed it empirically for science-related tasks (such as learning to approximate functions relevant to physics). “It’s still unclear whether this observation will extend to standard machine learning tasks, but at least for science-related tasks, it seems promising,” Liu says.

Liu acknowledges that KANs come with one important downside: it takes more time and compute power to train a KAN, compared to an MLP.

“This limits the application efficiency of KANs on large-scale data sets and complex tasks,” says Di Zhang, of Xi’an Jiaotong-Liverpool University in Suzhou, China. But he suggests that more efficient algorithms and hardware accelerators could help.

Anil Ananthaswamy is a science journalist and author who writes about physics, computational neuroscience, and machine learning. His new book, WHY MACHINES LEARN: The Elegant Math Behind Modern AI, was published by Dutton (Penguin Random House US) in July.

