A new way to build neural networks could make AI more understandable

The simplified approach makes it easier to see how neural networks produce the outputs they do.

By Anil Ananthaswamy

A tweak to the way artificial neurons work in neural networks could make AIs easier to decipher.

Artificial neurons—the fundamental building blocks of deep neural networks—have survived almost unchanged for decades. While these networks give modern artificial intelligence its power, they are also inscrutable. 

Existing artificial neurons, used in large language models like GPT-4, work by taking in a large number of inputs, adding them together, and converting the sum into an output using another mathematical operation inside the neuron. Combinations of such neurons make up neural networks, and their combined workings can be difficult to decode.
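
In the standard textbook formulation (spelled out in more detail below), each input x_i is scaled by a weight w_i, and the extra operation inside the neuron is a fixed activation function σ:

```latex
y = \sigma\!\left( \sum_{i=1}^{n} w_i x_i + b \right)
```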

But the new way to combine neurons works a little differently. Some of the complexity of the existing neurons is both simplified and moved outside the neurons. Inside, the new neurons simply sum up their inputs and produce an output, without the need for the extra hidden operation. Networks of such neurons are called Kolmogorov-Arnold Networks (KANs), after the Russian mathematicians who inspired them.
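
The name comes from the Kolmogorov–Arnold representation theorem, which says that any continuous function of many variables can be built by adding and composing functions of a single variable:

```latex
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```

KANs mirror this structure: the one-dimensional functions sit on the connections between neurons, and the neurons themselves only add.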

The simplification, studied in detail by a group led by researchers at MIT, could make it easier to understand why neural networks produce certain outputs, help verify their decisions, and even probe for bias. Preliminary evidence also suggests that as KANs are made bigger, their accuracy increases faster than networks built of traditional neurons.

“It’s interesting work,” says Andrew Wilson, who studies the foundations of machine learning at New York University. “It’s nice that people are trying to fundamentally rethink the design of these [networks].”

The basic elements of KANs were actually proposed in the 1990s, and researchers kept building simple versions of such networks. But the MIT-led team has taken the idea further, showing how to build and train bigger KANs, performing empirical tests on them, and analyzing some KANs to demonstrate how their problem-solving ability could be interpreted by humans. “We revitalized this idea,” says team member Ziming Liu, a PhD student in Max Tegmark’s lab at MIT. “And, hopefully, with the interpretability… we [may] no longer [have to] think neural networks are black boxes.”

While it’s still early days, the team’s work on KANs is attracting attention. GitHub pages have sprung up that show how to use KANs for myriad applications, such as image recognition and solving fluid dynamics problems. 

Finding the formula

The current advance came when Liu and colleagues at MIT, Caltech, and other institutes were trying to understand the inner workings of standard artificial neural networks. 

Today, almost all types of AI, including those used to build large language models and image recognition systems, include sub-networks known as multilayer perceptrons (MLPs). In an MLP, artificial neurons are arranged in dense, interconnected “layers.” Each neuron has within it something called an “activation function”—a mathematical operation that takes in a bunch of inputs and transforms them in some pre-specified manner into an output.

In an MLP, each artificial neuron receives inputs from all the neurons in the previous layer and multiplies each input with a corresponding “weight” (a number signifying the importance of that input). These weighted inputs are added together and fed to the activation function inside the neuron to generate an output, which is then passed on to neurons in the next layer. An MLP learns to distinguish between images of cats and dogs, for example, by choosing the correct values for the weights of the inputs for all the neurons. Crucially, the activation function is fixed and doesn’t change during training.
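
As a concrete illustration (a minimal sketch, not code from the paper; the function name and values are invented for the example), here is the computation a single MLP neuron performs:

```python
import math

def mlp_neuron(inputs, weights, bias):
    """One MLP neuron: a weighted sum of its inputs passed through a
    fixed activation function (tanh here, one common choice)."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(total)  # the activation is fixed; training never changes it

# A neuron receiving three values from the previous layer.
print(mlp_neuron(inputs=[0.5, -1.2, 3.0], weights=[0.8, 0.1, -0.4], bias=0.05))
```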

Once trained, all the neurons of an MLP and their connections taken together essentially act as another function that takes an input (say, tens of thousands of pixels in an image) and produces the desired output (say, 0 for cat and 1 for dog). Understanding what that function looks like, meaning its mathematical form, is an important part of being able to understand why it produces some output. For example, why does it tag someone as creditworthy given inputs about their financial status? But MLPs are black boxes. Reverse-engineering the network is nearly impossible for complex tasks such as image recognition.

And even when Liu and colleagues tried to reverse-engineer an MLP for simpler tasks that involved bespoke “synthetic” data, they struggled. 

“If we cannot even interpret these synthetic datasets from neural networks, then it’s hopeless to deal with real-world data sets,” says Liu. “We found it really hard to try to understand these neural networks. We wanted to change the architecture.”

Mapping the math

The main change was to remove the fixed activation function and introduce a much simpler learnable function to transform each incoming input before it enters the neuron. 

Unlike the activation function in an MLP neuron, which takes in numerous inputs, each simple function outside the KAN neuron takes in one number and spits out another number. Now, during training, instead of learning the individual weights, as happens in an MLP, the KAN just learns how to represent each simple function. In a paper posted this year on the preprint server arXiv, Liu and colleagues showed that these simple functions outside the neurons are much easier to interpret, making it possible to reconstruct the mathematical form of the function being learned by the entire KAN.
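
For contrast, here is a minimal, hypothetical sketch of the KAN-style computation. The paper represents each edge function with splines; a simple polynomial basis stands in here, and all names and numbers are illustrative:

```python
def edge_function(x, coeffs):
    """A learnable one-dimensional function sitting on an input edge.

    Training adjusts the whole coefficient list rather than a single
    weight. (The paper uses spline-based functions; a polynomial basis
    stands in here for simplicity.)"""
    return sum(c * x**k for k, c in enumerate(coeffs))

def kan_neuron(inputs, edge_coeffs):
    """A KAN-style neuron: each input is transformed by its own learnable
    function, and the neuron itself just sums the results; there is no
    extra activation hidden inside."""
    return sum(edge_function(x, c) for x, c in zip(inputs, edge_coeffs))

# Two inputs, each with its own learnable cubic function (coefficients invented).
coeffs = [
    [0.0, 1.5, -0.3, 0.02],  # function on the first input edge
    [0.1, -0.7, 0.05, 0.0],  # function on the second input edge
]
print(kan_neuron([0.4, -1.1], coeffs))
```

Because each learned function takes one number in and one number out, it can be plotted or matched against familiar formulas, which is what makes the function learned by the whole network easier to reconstruct.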

The team, however, has only tested the interpretability of KANs on simple, synthetic data sets, not on real-world problems, such as image recognition, which are more complicated. “[We are] slowly pushing the boundary,” says Liu. “Interpretability can be a very challenging task.”

Liu and colleagues have also shown that, as they are scaled up, KANs become more accurate at their tasks faster than MLPs do. The team proved the result theoretically and showed it empirically for science-related tasks (such as learning to approximate functions relevant to physics). “It’s still unclear whether this observation will extend to standard machine learning tasks, but at least for science-related tasks, it seems promising,” Liu says.

Liu acknowledges that KANs come with one important downside: it takes more time and compute power to train a KAN, compared to an MLP.

“This limits the application efficiency of KANs on large-scale data sets and complex tasks,” says Di Zhang, of Xi’an Jiaotong-Liverpool University in Suzhou, China. But he suggests that more efficient algorithms and hardware accelerators could help.

Anil Ananthaswamy is a science journalist and author who writes about physics, computational neuroscience, and machine learning. His new book, WHY MACHINES LEARN: The Elegant Math Behind Modern AI, was published by Dutton (Penguin Random House US) in July.

