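John J. Hopfield and Geoffrey E. Hinton have won the 2024 Nobel Prize in Physics for foundational contributions to modern machine learning through neural networks!
As the 2024 physics laureates, John Hopfield addressed how an artificial neural network can store and recall information, while Geoffrey Hinton, building on the Hopfield network, addressed how an artificial neural network can learn on its own and identify features in data.
The Nobel Prizes include no mathematics prize, let alone one for computer science, so the physics prize, among the highest honors in science, went to a contribution with the potential to change humanity's future: AI built on artificial neural networks.
Geoffrey Hinton thereby joins the very small group of scientists who hold both a Turing Award (2018) and a Nobel Prize (2024). Herbert A. Simon earlier received the Turing Award (1975) and the Prize in Economic Sciences (1978), so whether Hinton is the first depends on whether the economics prize is counted.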
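About Hinton
The name Hinton may be unfamiliar to you.
He is a towering figure in deep learning.
If you know even a little about AI, you have certainly heard of OpenAI, the company currently at the height of its fame.
Ilya Sutskever, OpenAI's former chief scientist, was Hinton's student and is widely seen as his intellectual heir; Ilya carried Hinton's ideas forward.
Ilya is a prodigy. After spending two months of a summer frying French fries, he walked into Hinton's office at the University of Toronto and asked to become his student. For the story of the two, see the QbitAI (量子位) article "Hinton 揭秘 Ilya 成长历程:Scaling Law 是他学生时代就有的直觉" ("Hinton on Ilya's formative years: the scaling law was an intuition he already had as a student").
Below are Hinton's students and their students in turn, a who's who of AI.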
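Geoffrey Hinton is a highly influential scientist in artificial intelligence, known for his contributions to neural networks and machine learning. He was born on 6 December 1947 in Wimbledon, England, and is a professor emeritus at the University of Toronto.
Hinton has a deep academic background in the field: he received his PhD in artificial intelligence from the University of Edinburgh in 1978. His research centers on neural networks and machine learning, including supervised classification, and he is one of the researchers who developed and popularized the backpropagation algorithm. He also proposed the Forward-Forward algorithm, a newer learning procedure intended as an alternative to training with backpropagation.
Hinton has received several major awards for his contributions to artificial intelligence. In 2018 he shared the Turing Award, one of the highest honors in computer science, with Yoshua Bengio and Yann LeCun. In 2024 he and John J. Hopfield received the Nobel Prize in Physics for foundational discoveries and inventions that enable machine learning with artificial neural networks.
Beyond his academic achievements, Hinton has worked to teach and popularize AI. He taught the open course Neural Networks for Machine Learning on Coursera, which covered in depth how neural networks are applied to speech recognition, object recognition, image segmentation and language modeling.
Geoffrey Hinton is a central figure in artificial intelligence. His work has advanced machine learning and deep learning and has given researchers and students in related fields valuable teaching material.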
Nobel Prize website article
Hinton's most cited paper:
https://www.openread.academy/en/paper/reading?corpusId=784288
Link to the Nobel Prize website article:
https://www.openread.academy/zh/paper/reading?corpusId=195908774
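Translation:
The Nobel Prize in Physics 2024
This year's laureates used the tools of physics to build foundational methods behind today's powerful machine learning. John Hopfield created a structure that can store and reconstruct information. Geoffrey Hinton invented a method that can discover features in data on its own, and it has become one of the key techniques in large artificial neural networks.
They used physics to find patterns in information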
© Johan Jarnestad/The Royal Swedish Academy of Sciences
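Many people have seen computers translate between languages, interpret images and even hold reasonable conversations. Less well known is that this kind of technology has long been widely used in research, especially for sorting and analyzing massive amounts of data. Over the past fifteen to twenty years, the rapid development of machine learning has relied on a structure called an artificial neural network. Today, when we talk about artificial intelligence, this is usually the technology we mean.
Although computers cannot think, machines can now imitate human functions such as memory and learning. This year's physics laureates made this possible by using fundamental concepts and methods from physics to develop techniques that process information with network structures.
Traditional software works like a recipe: data goes in, it is processed in clearly defined steps, and a result comes out, much like baking a cake step by step. In machine learning, by contrast, the computer learns by observing examples, which lets it handle vague and complex tasks that step-by-step instructions cannot solve. One example is having a computer interpret a picture and identify the objects in it.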
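Mimicking the brain
An artificial neural network processes information using the whole network structure. The inspiration originally came from attempts to understand how the brain works. As early as the 1940s, researchers began to explore the mathematics behind the brain's network of neurons and synapses. Another key piece came from psychology, where the neuroscientist Donald Hebb proposed a hypothesis about how learning happens: when neurons work together, the connections between them are strengthened.
As these ideas developed, scientists began trying to build artificial neural networks as computer simulations in order to reproduce how the brain's network operates. In these networks, neurons are modeled by nodes that are assigned different values, and synapses are represented by connections between the nodes, which can grow stronger or weaker as "training" proceeds. Donald Hebb's hypothesis remains one of the basic rules for training artificial neural networks.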
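Hebb's hypothesis can be written as a one-line weight update. The following is a minimal sketch, not a formula taken from the article: the function name `hebbian_update`, the learning rate `rate`, and the list-of-lists weight matrix are all assumptions made for illustration.

```python
def hebbian_update(weights, activity, rate=0.1):
    """Return new weights where w[i][j] grows when units i and j are active together.

    `weights` is a square list of lists, `activity` a list of node values.
    Self-connections (i == j) are left unchanged.
    """
    n = len(activity)
    return [
        [weights[i][j] + (rate * activity[i] * activity[j] if i != j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


# Three nodes, no connections yet; nodes 0 and 1 fire together, node 2 is silent.
weights = [[0.0] * 3 for _ in range(3)]
weights = hebbian_update(weights, [1, 1, 0])
print(weights[0][1], weights[0][2])  # 0.1 0.0
```

Only the connection between the two co-active nodes is strengthened; connections involving the silent node stay at zero.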
© Johan Jarnestad/The Royal Swedish Academy of Sciences
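By the end of the 1960s, some discouraging theoretical results had led many researchers to doubt the prospects of neural networks and to suspect that they might never be of practical use. In the 1980s, however, several key ideas revived interest in artificial neural networks, among them the work of this year's laureates.
Associative memory
Imagine trying to recall a word you rarely use, such as the one for the sloping floor common in cinemas and lecture halls. You search your memory. Is it ramp... or rad...ial? No, not that. Oh, right: rake!
This process of searching among similar words for the right one resembles the associative memory that the physicist John Hopfield discovered in 1982. A Hopfield network can store different patterns and has a method for retrieving them. When the network is given an incomplete or slightly distorted pattern, the method finds the stored pattern that is closest to it.
Hopfield had used his background in physics to study theoretical problems in molecular biology. When he was invited to a neuroscience meeting, he encountered research on the structure of the brain. It fascinated him and prompted him to think about the dynamics of simple neural networks. When neurons act together, they can produce new and powerful properties that cannot be found by studying the network's individual components in isolation.
In 1980 Hopfield left Princeton University, because his research interests had moved beyond the usual scope of physics. He accepted a professorship in chemistry and biology at the California Institute of Technology (Caltech) in Pasadena, southern California. There he had free access to computing resources for experiments and for developing his ideas about neural networks.
He did not abandon his roots in physics, however. He drew much inspiration from it, especially from theories of how many small components acting together give rise to new phenomena. Magnetic materials were particularly instructive: their special properties stem from atomic spin, a property that makes each atom a tiny magnet. The spins of neighboring atoms influence one another and form domains in which the spins point in the same direction. Using the physics that describes how spins interact, he built a model network of nodes and connections.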
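The network saves images in a landscape
The network Hopfield built consists of nodes joined to one another by connections of different strengths. Each node can store a value; in Hopfield's first work this was either 0 or 1, like a pixel in a black-and-white image.
Hopfield described the state of the whole network with a property analogous to the energy of a spin system in physics. The network's energy is calculated by a formula that uses the values of all the nodes and the strengths of all the connections between them.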
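The article does not print the energy formula. A commonly used form is sketched below; the symbols are assumptions introduced here: $w_{ij}$ is the symmetric connection strength between nodes $i$ and $j$, and $s_i = 2v_i - 1 \in \{-1, +1\}$ is the spin corresponding to a pixel value $v_i \in \{0, 1\}$.

```latex
E(s) = -\frac{1}{2} \sum_{i \neq j} w_{ij}\, s_i\, s_j ,
\qquad
\Delta E_i = 2\, s_i \sum_{j \neq i} w_{ij}\, s_j
```

Here $\Delta E_i$ is the change in energy when node $i$ alone is flipped, so a flip lowers the energy exactly when $s_i$ disagrees in sign with the summed input $\sum_{j \neq i} w_{ij} s_j$ from the other nodes.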
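A Hopfield network is programmed by feeding an image to the nodes, which are assigned the value black (0) or white (1). The network's connections are then adjusted using the energy formula so that the saved image has low energy. When a new pattern is fed into the network, a rule goes through the nodes one at a time and checks whether the network's energy would decrease if that node's value were changed. If turning a black pixel white lowers the energy, the pixel changes color. This continues until no further improvement is possible. At that point the network has usually reproduced the original image on which it was trained.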
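The store-and-recall procedure described above can be sketched in a few lines of plain Python. This is an illustrative toy under stated assumptions, not Hopfield's original code: pixels 0/1 are mapped to spins -1/+1, connections are set with a Hebbian outer-product rule, and the function names (`to_spins`, `train`, `energy`, `recall`) and the two 8-pixel patterns are invented for the example.

```python
def to_spins(bits):
    """Map pixel values 0/1 to spins -1/+1."""
    return [2 * b - 1 for b in bits]


def train(patterns):
    """Hebbian storage: w[i][j] is the sum over patterns of s_i * s_j (no self-connections)."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        s = to_spins(p)
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += s[i] * s[j]
    return w


def energy(w, s):
    """E = -1/2 * sum_ij w_ij s_i s_j for a spin state s."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def recall(w, bits, max_sweeps=10):
    """Visit nodes one by one; align each with its input, which never raises the energy."""
    s = to_spins(bits)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            field = sum(w[i][j] * s[j] for j in range(len(s)))
            new = 1 if field >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:  # no single flip lowers the energy any further
            break
    return [(x + 1) // 2 for x in s]


p1 = [1, 1, 1, 1, 0, 0, 0, 0]
p2 = [1, 0, 1, 0, 1, 0, 1, 0]
w = train([p1, p2])

noisy = [0, 1, 1, 1, 0, 0, 0, 0]  # p1 with its first pixel flipped
print(recall(w, noisy) == p1)  # True
print(energy(w, to_spins(p1)) < energy(w, to_spins(noisy)))  # True
```

The stored pattern sits at lower energy than the corrupted one, and the node-by-node updates move the corrupted input back to it. With many or strongly correlated patterns this simple rule starts to fail, which is why the article says the network "usually" reproduces the original.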
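This may not seem remarkable if only one image is saved. You might wonder why not simply store the image and compare it with a new one. What makes Hopfield's method special is that it can hold several images at once, and the network can usually tell them apart.
Hopfield likened the search for a saved state to a ball rolling through a landscape of peaks and valleys, with friction gradually slowing it down. If the ball is released at some position, it rolls into the nearest valley and stops there. In the same way, when the network is given an input close to a saved pattern, it keeps adjusting until it reaches the bottom of a valley in the energy landscape, which corresponds to the most similar pattern in its memory.
A Hopfield network can be used to reconstruct data that is noisy or partly erased.
Illustration © Johan Jarnestad/The Royal Swedish Academy of Sciences
Hopfield and other researchers have further developed how the Hopfield network operates, and nodes can now store any value, not just 0 or 1. If you think of the nodes as pixels in an image, they can take different colors rather than only black and white. Improved methods allow the network to save more images and to distinguish them even when they are very similar. As long as the information is made up of many data points, the network can identify or reconstruct it.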
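Classification using nineteenth-century physics
Remembering an image is one thing; understanding what it depicts takes more.
Even very young children can point at different animals and say confidently whether each is a dog, a cat or a squirrel. They get it wrong now and then, but before long they are almost always right. Children acquire this ability without diagrams and without learning concepts such as species or mammal. After seeing a few examples of each kind of animal, the categories form naturally in their minds. People learn to recognize a cat, understand a word, or notice that something has changed on entering a room by experiencing their surroundings.
When Hopfield published his work on associative memory, Geoffrey Hinton was working at Carnegie Mellon University in Pittsburgh, USA. He had studied experimental psychology and artificial intelligence in England and Scotland, and he was wondering whether machines could, like humans, process and interpret patterns by finding their own categories for information. Together with his colleague Terrence Sejnowski, Hinton started from the Hopfield network and, drawing on ideas from statistical physics, developed a new model.
Statistical physics describes systems made up of many similar elements, such as the molecules in a gas. Tracking every molecule is difficult or impossible, but considering them collectively yields the gas's overall properties, such as pressure and temperature. The molecules can spread through the volume at many different individual speeds and still give the same collective properties.
Statistical physics lets us analyze the states in which the components of such a system can jointly exist and calculate the probability of each. Some states are more probable than others, depending on the amount of available energy, as described by an equation from the nineteenth-century physicist Ludwig Boltzmann. Hinton's network used this equation, and the method was published in 1985 under the striking name "Boltzmann machine".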
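The equation itself is not reproduced in the article. In its standard textbook form, with symbols that are assumptions introduced here ($E(s)$ the energy of state $s$, $T$ the temperature, $k_B$ Boltzmann's constant, $Z$ the normalizing sum over all states), it reads:

```latex
P(s) = \frac{e^{-E(s)/k_B T}}{Z},
\qquad
Z = \sum_{s'} e^{-E(s')/k_B T},
\qquad
\frac{P(a)}{P(b)} = e^{-\left(E(a) - E(b)\right)/k_B T}
```

The last ratio makes the statement "some states are more probable than others" concrete: a state with lower energy is exponentially more likely than one with higher energy, and raising the temperature flattens the difference.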
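Recognizing new examples of the same type
A Boltzmann machine is commonly built from two kinds of nodes. One group, the visible nodes, receives the input. The other nodes form a hidden layer. The values of the hidden nodes and their connections also contribute to the energy of the network as a whole.
The machine runs by applying a rule that updates the node values one at a time. Eventually it reaches a state in which the pattern of the nodes can keep changing while the properties of the network as a whole stay the same. Each possible pattern then has a specific probability, determined by the network's energy according to Boltzmann's equation. When the machine stops it has produced a new pattern, which makes the Boltzmann machine an early example of a generative model.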
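To make the one-node-at-a-time rule concrete, here is a minimal sketch of a three-node Boltzmann machine with 0/1 units. Everything specific is an assumption for illustration: the weight matrix `W`, the absence of bias terms and of a visible/hidden split, and the function names. The update shown is the usual stochastic (Gibbs) rule, in which a node switches on with a probability given by a logistic function of its summed input.

```python
import itertools
import math
import random

# Hypothetical symmetric weights for three nodes (no self-connections, no biases).
W = [[0.0, 1.0, -0.5],
     [1.0, 0.0, 0.8],
     [-0.5, 0.8, 0.0]]


def energy(state):
    """E = -sum over pairs i < j of W[i][j] * s_i * s_j, with s in {0, 1}."""
    n = len(state)
    return -sum(W[i][j] * state[i] * state[j]
                for i in range(n) for j in range(i + 1, n))


def boltzmann_probs(temperature=1.0):
    """Exact probability of every pattern, obtained by enumerating all states."""
    states = list(itertools.product([0, 1], repeat=len(W)))
    weights = [math.exp(-energy(s) / temperature) for s in states]
    z = sum(weights)
    return {s: wt / z for s, wt in zip(states, weights)}


def gibbs_step(state, i, temperature=1.0, rng=random):
    """Resample node i given the others: P(on) = 1 / (1 + exp(-input / T))."""
    net = sum(W[i][j] * state[j] for j in range(len(state)) if j != i)
    p_on = 1.0 / (1.0 + math.exp(-net / temperature))
    new = list(state)
    new[i] = 1 if rng.random() < p_on else 0
    return tuple(new)


probs = boltzmann_probs()
print(max(probs, key=probs.get))  # (1, 1, 1), the lowest-energy pattern

# Run the machine: update the nodes in turn; the pattern keeps changing,
# but the long-run pattern frequencies approach `probs`.
rng = random.Random(0)
state = (0, 0, 0)
for step in range(1000):
    state = gibbs_step(state, step % len(W), rng=rng)
```

Enumerating all states is only feasible for a toy network; for realistic sizes the sampling loop is the only practical way to draw patterns, which is one reason the original machine was slow.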
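Illustration of different network types
© Johan Jarnestad/The Royal Swedish Academy of Sciences
A Boltzmann machine learns from examples rather than from instructions. It is trained by adjusting the values of the network's connections so that the example patterns fed to the visible nodes during training have the highest possible probability of occurring when the machine is run. If a pattern is repeated several times during training, its probability becomes higher still. Training also affects the probability of producing new patterns that resemble the training examples.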
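The article describes this training only in words. The learning rule usually associated with the Boltzmann machine can be sketched as follows; the notation is an assumption introduced here: $\eta$ is a learning rate, and the angle brackets are averages of the product $s_i s_j$ with the visible nodes clamped to training examples ("data") and with the machine running freely ("model").

```latex
\Delta w_{ij} = \eta \left( \langle s_i s_j \rangle_{\text{data}} - \langle s_i s_j \rangle_{\text{model}} \right)
```

A connection is strengthened when two nodes are on together more often in the training examples than in the machine's own output, and weakened in the opposite case. Repeating these updates raises the probability of the training patterns, and learning stops when the two averages match.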
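A trained Boltzmann machine can recognize familiar traits in information it has not seen before. Just as you can tell at once that a friend's sibling must be related to your friend, a Boltzmann machine can recognize a new example that belongs to a category in its training material and distinguish it from dissimilar material.
In its original form the Boltzmann machine is fairly inefficient and takes a long time to find solutions. It becomes more interesting when developed in various ways, which Hinton has continued to explore. Later versions were thinned out by removing the connections between some of the units, and this turned out to make the machine more efficient.
In the 1990s many researchers lost interest in artificial neural networks, but Hinton kept working in the field and helped set off a new wave of breakthroughs. In 2006 he and his colleagues Simon Osindero, Yee Whye Teh and Ruslan Salakhutdinov developed a method for pretraining a network built from layers of Boltzmann machines. The pretraining gave the network a better starting point, which made its training for image recognition more efficient.
The Boltzmann machine is often used as part of a larger network. For example, it can recommend films or television series based on a viewer's preferences.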
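Machine learning: today and tomorrow
The work of John Hopfield and Geoffrey Hinton from the 1980s onward laid the foundation for the machine learning revolution that began around 2010.
The progress we see today has been made possible by the ability to train networks on vast amounts of data and by an enormous increase in computing power. Modern artificial neural networks are very large and usually built from many layers. They are called deep neural networks, and the way they are trained is called deep learning.
A look back at Hopfield's 1982 article on associative memory puts this development in perspective. He used a network with 30 nodes. If all the nodes are connected to one another, there are 435 connections. Each node has its value and each connection its strength, for a total of fewer than 500 parameters to keep track of. He also tried a network with 100 nodes, but it was too complicated for the computer available at the time. By comparison, today's large language models are networks that can contain more than one trillion parameters (one million millions).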
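The figures in this paragraph follow from simple counting: n fully connected nodes have n(n-1)/2 pairwise connections. The short check below reproduces them; the function name `hopfield_parameter_count` is made up for this example.

```python
def hopfield_parameter_count(n_nodes):
    """Return (connections, connections + node values) for a fully connected network."""
    connections = n_nodes * (n_nodes - 1) // 2  # one connection per unordered pair
    return connections, connections + n_nodes


print(hopfield_parameter_count(30))   # (435, 465)
print(hopfield_parameter_count(100))  # (4950, 5050)
```

With 30 nodes the total of 465 is indeed below 500, while the 100-node network already needs about ten times as many values. A model with one trillion parameters is roughly two billion times larger than the 30-node network.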
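Many researchers are now exploring where machine learning can be applied, and which areas will prove most viable remains to be seen. The ethical questions surrounding the technology are also being widely discussed.
Physics has provided tools for the progress of machine learning, and in turn physics as a research field benefits from artificial neural networks. Machine learning has been used in areas familiar from earlier Nobel Prizes in Physics, such as sifting and processing the vast amounts of data needed to discover the Higgs particle. Other applications include reducing noise in measurements of gravitational waves from colliding black holes and searching for exoplanets.
In recent years machine learning has also begun to be used to calculate and predict the properties of molecules and materials, for example to calculate the structure of protein molecules, which determines their function, or to work out which new materials may have the best properties for more efficient solar cells.
Further reading
For more information about this year's prizes, including scientific background material in English, visit the website of the Royal Swedish Academy of Sciences, www.kva.se, or www.nobelprize.org, where you can watch videos of the press conferences, the Nobel Lectures and more. Information on exhibitions and activities related to the Nobel Prizes and the Prize in Economic Sciences is available at www.nobelprizemuseum.se.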
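The Royal Swedish Academy of Sciences has decided to award the Nobel Prize in Physics 2024 to
JOHN J. HOPFIELD
Born 1933 in Chicago, Illinois, USA. PhD 1958 from Cornell University, New York, USA. Professor at Princeton University, USA.
GEOFFREY E. HINTON
Born 1947 in London, UK. PhD 1978 from the University of Edinburgh, UK. Professor at the University of Toronto, Canada.
"for foundational discoveries and inventions that enable machine learning with artificial neural networks"
Science editors: Ulf Danielsson, Olle Eriksson, Anders Irbäck and Ellen Moons, the Nobel Committee for Physics
Text: Anna Davour
Translator: Clare Barnes
Illustrations: Johan Jarnestad
Editor: Sara Gustavsson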
Original text from the Nobel Prize website:
The Nobel Prize in Physics 2024
This year’s laureates used tools from physics to construct methods that helped lay the foundation for today’s powerful machine learning. John Hopfield created a structure that can store and reconstruct information. Geoffrey Hinton invented a method that can independently discover properties in data and which has become important for the large artificial neural networks now in use.
They used physics to find patterns in information
Illustration: © Johan Jarnestad/The Royal Swedish Academy of Sciences
Many people have experienced how computers can translate between languages, interpret images and even conduct reasonable conversations. What is perhaps less well known is that this type of technology has long been important for research, including the sorting and analysis of vast amounts of data. The development of machine learning has exploded over the past fifteen to twenty years and utilises a structure called an artificial neural network. Nowadays, when we talk about artificial intelligence, this is often the type of technology we mean.
Although computers cannot think, machines can now mimic functions such as memory and learning. This year’s laureates in physics have helped make this possible. Using fundamental concepts and methods from physics, they have developed technologies that use structures in networks to process information.
Machine learning differs from traditional software, which works like a type of recipe. The software receives data, which is processed according to a clear description and produces the results, much like when someone collects ingredients and processes them by following a recipe, producing a cake. Instead of this, in machine learning the computer learns by example, enabling it to tackle problems that are too vague and complicated to be managed by step by step instructions. One example is interpreting a picture to identify the objects in it.
Mimics the brain
An artificial neural network processes information using the entire network structure. The inspiration initially came from the desire to understand how the brain works. In the 1940s, researchers had started to reason around the mathematics that underlies the brain’s network of neurons and synapses. Another piece of the puzzle came from psychology, thanks to neuroscientist Donald Hebb’s hypothesis about how learning occurs because connections between neurons are reinforced when they work together.
Later, these ideas were followed by attempts to recreate how the brain’s network functions by building artificial neural networks as computer simulations. In these, the brain’s neurons are mimicked by nodes that are given different values, and the synapses are represented by connections between the nodes that can be made stronger or weaker. Donald Hebb’s hypothesis is still used as one of the basic rules for updating artificial networks through a process called training.
At the end of the 1960s, some discouraging theoretical results caused many researchers to suspect that these neural networks would never be of any real use. However, interest in artificial neural networks was reawakened in the 1980s, when several important ideas made an impact, including work by this year’s laureates.
Associative memory
Imagine that you are trying to remember a fairly unusual word that you rarely use, such as one for that sloping floor often found in cinemas and lecture halls. You search your memory. It’s something like ramp… perhaps rad…ial? No, not that. Rake, that’s it!
This process of searching through similar words to find the right one is reminiscent of the associative memory that the physicist John Hopfield discovered in 1982. The Hopfield network can store patterns and has a method for recreating them. When the network is given an incomplete or slightly distorted pattern, the method can find the stored pattern that is most similar.
Hopfield had previously used his background in physics to explore theoretical problems in molecular biology. When he was invited to a meeting about neuroscience he encountered research into the structure of the brain. He was fascinated by what he learned and started to think about the dynamics of simple neural networks. When neurons act together, they can give rise to new and powerful characteristics that are not apparent to someone who only looks at the network’s separate components.
In 1980, Hopfield left his position at Princeton University, where his research interests had taken him outside the areas in which his colleagues in physics worked, and moved across the continent. He had accepted the offer of a professorship in chemistry and biology at Caltech (California Institute of Technology) in Pasadena, southern California. There, he had access to computer resources that he could use for free experimentation and to develop his ideas about neural networks.
However, he did not abandon his foundation in physics, where he found inspiration for his understanding of how systems with many small components that work together can give rise to new and interesting phenomena. He particularly benefitted from having learned about magnetic materials that have special characteristics thanks to their atomic spin – a property that makes each atom a tiny magnet. The spins of neighbouring atoms affect each other; this can allow domains to form with spin in the same direction. He was able to make a model network with nodes and connections by using the physics that describes how materials develop when spins influence each other.
The network saves images in a landscape
The network that Hopfield built has nodes that are all joined together via connections of different strengths. Each node can store an individual value – in Hopfield’s first work this could either be 0 or 1, like the pixels in a black and white picture.
Hopfield described the overall state of the network with a property that is equivalent to the energy in the spin system found in physics; the energy is calculated using a formula that uses all the values of the nodes and all the strengths of the connections between them. The Hopfield network is programmed by an image being fed to the nodes, which are given the value of black (0) or white (1). The network’s connections are then adjusted using the energy formula, so that the saved image gets low energy. When another pattern is fed into the network, there is a rule for going through the nodes one by one and checking whether the network has lower energy if the value of that node is changed. If it turns out that energy is reduced if a black pixel is white instead, it changes colour. This procedure continues until it is impossible to find any further improvements. When this point is reached, the network has often reproduced the original image on which it was trained.
This may not appear so remarkable if you only save one pattern. Perhaps you are wondering why you don’t just save the image itself and compare it to another image being tested, but Hopfield’s method is special because several pictures can be saved at the same time and the network can usually differentiate between them.
Hopfield likened searching the network for a saved state to rolling a ball through a landscape of peaks and valleys, with friction that slows its movement. If the ball is dropped in a particular location, it will roll into the nearest valley and stop there. If the network is given a pattern that is close to one of the saved patterns it will, in the same way, keep moving forward until it ends up at the bottom of a valley in the energy landscape, thus finding the closest pattern in its memory.
The Hopfield network can be used to recreate data that contains noise or which has been partially erased.
Hopfield and others have continued to develop the details of how the Hopfield network functions, including nodes that can store any value, not just zero or one. If you think about nodes as pixels in a picture, they can have different colours, not just black or white. Improved methods have made it possible to save more pictures and to differentiate between them even when they are quite similar. It is just as possible to identify or reconstruct any information at all, provided it is built from many data points.
Classification using nineteenth-century physics
Remembering an image is one thing, but interpreting what it depicts requires a little more.
Even very young children can point at different animals and confidently say whether it is a dog, a cat, or a squirrel. They might get it wrong occasionally, but fairly soon they are correct almost all the time. A child can learn this even without seeing any diagrams or explanations of concepts such as species or mammal. After encountering a few examples of each type of animal, the different categories fall into place in the child’s head. People learn to recognise a cat, or understand a word, or enter a room and notice that something has changed, by experiencing the environment around them.
When Hopfield published his article on associative memory, Geoffrey Hinton was working at Carnegie Mellon University in Pittsburgh, USA. He had previously studied experimental psychology and artificial intelligence in England and Scotland and was wondering whether machines could learn to process patterns in a similar way to humans, finding their own categories for sorting and interpreting information. Along with his colleague, Terrence Sejnowski, Hinton started from the Hopfield network and expanded it to build something new, using ideas from statistical physics.
Statistical physics describes systems that are composed of many similar elements, such as molecules in a gas. It is difficult, or impossible, to track all the separate molecules in the gas, but it is possible to consider them collectively to determine the gas’ overarching properties like pressure or temperature. There are many potential ways for gas molecules to spread through its volume at individual speeds and still result in the same collective properties.
The states in which the individual components can jointly exist can be analysed using statistical physics, and the probability of them occurring calculated. Some states are more probable than others; this depends on the amount of available energy, which is described in an equation by the nineteenth-century physicist Ludwig Boltzmann. Hinton’s network utilised that equation, and the method was published in 1985 under the striking name of the Boltzmann machine.
Recognising new examples of the same type
The Boltzmann machine is commonly used with two different types of nodes. Information is fed to one group, which are called visible nodes. The other nodes form a hidden layer. The hidden nodes’ values and connections also contribute to the energy of the network as a whole.
The machine is run by applying a rule for updating the values of the nodes one at a time. Eventually the machine will enter a state in which the nodes’ pattern can change, but the properties of the network as a whole remain the same. Each possible pattern will then have a specific probability that is determined by the network’s energy according to Boltzmann’s equation. When the machine stops it has created a new pattern, which makes the Boltzmann machine an early example of a generative model.
The Boltzmann machine can learn – not from instructions, but from being given examples. It is trained by updating the values in the network’s connections so that the example patterns, which were fed to the visible nodes when it was trained, have the highest possible probability of occurring when the machine is run. If the same pattern is repeated several times during this training, the probability for this pattern is even higher. Training also affects the probability of outputting new patterns that resemble the examples on which the machine was trained.
A trained Boltzmann machine can recognise familiar traits in information it has not previously seen. Imagine meeting a friend’s sibling, and you can immediately see that they must be related. In a similar way, the Boltzmann machine can recognise an entirely new example if it belongs to a category found in the training material, and differentiate it from material that is dissimilar.
In its original form, the Boltzmann machine is fairly inefficient and takes a long time to find solutions. Things become more interesting when it is developed in various ways, which Hinton has continued to explore. Later versions have been thinned out, as the connections between some of the units have been removed. It turns out that this may make the machine more efficient.
During the 1990s, many researchers lost interest in artificial neural networks, but Hinton was one of those who continued to work in the field. He also helped start the new explosion of exciting results; in 2006 he and his colleagues Simon Osindero, Yee Whye Teh and Ruslan Salakhutdinov developed a method for pretraining a network with a series of Boltzmann machines in layers, one on top of the other. This pretraining gave the connections in the network a better starting point, which optimised its training to recognise elements in pictures.
The Boltzmann machine is often used as part of a larger network. For example, it can be used to recommend films or television series based on the viewer’s preferences.
Machine learning – today and tomorrow
Thanks to their work from the 1980s and onward, John Hopfield and Geoffrey Hinton have helped lay the foundation for the machine learning revolution that started around 2010.
The development we are now witnessing has been made possible through access to the vast amounts of data that can be used to train networks, and through the enormous increase in computing power. Today’s artificial neural networks are often enormous and constructed from many layers. These are called deep neural networks and the way they are trained is called deep learning.
A quick glance at Hopfield’s article on associative memory, from 1982, provides some perspective on this development. In it, he used a network with 30 nodes. If all the nodes are connected to each other, there are 435 connections. The nodes have their values, the connections have different strengths and, in total, there are fewer than 500 parameters to keep track of. He also tried a network with 100 nodes, but this was too complicated, given the computer he was using at the time. We can compare this to the large language models of today, which are built as networks that can contain more than one trillion parameters (one million millions).
Many researchers are now developing machine learning’s areas of application. Which will be the most viable remains to be seen, while there is also wide-ranging discussion on the ethical issues that surround the development and use of this technology.
Because physics has contributed tools for the development of machine learning, it is interesting to see how physics, as a research field, is also benefitting from artificial neural networks. Machine learning has long been used in areas we may be familiar with from previous Nobel Prizes in Physics. These include the use of machine learning to sift through and process the vast amounts of data necessary to discover the Higgs particle. Other applications include reducing noise in measurements of the gravitational waves from colliding black holes, or the search for exoplanets.
In recent years, this technology has also begun to be used when calculating and predicting the properties of molecules and materials – such as calculating protein molecules’ structure, which determines their function, or working out which new versions of a material may have the best properties for use in more efficient solar cells.
Further reading
Additional information on this year’s prizes, including a scientific background in English, is available on the website of the Royal Swedish Academy of Sciences, www.kva.se, and at www.nobelprize.org, where you can watch video from the press conferences, the Nobel Lectures and more. Information on exhibitions and activities related to the Nobel Prizes and the Prize in Economic Sciences is available at www.nobelprizemuseum.se.
The Royal Swedish Academy of Sciences has decided to award the Nobel Prize in Physics 2024 to
JOHN J. HOPFIELD
Born 1933 in Chicago, IL, USA. PhD 1958 from Cornell University, Ithaca, NY, USA. Professor at Princeton University, NJ, USA.
GEOFFREY E. HINTON
Born 1947 in London, UK. PhD 1978 from The University of Edinburgh, UK. Professor at University of Toronto, Canada.
“for foundational discoveries and inventions that enable machine learning with artificial neural networks”
Science Editors: Ulf Danielsson, Olle Eriksson, Anders Irbäck, and Ellen Moons, the Nobel Committee for Physics
Text: Anna Davour
Translator: Clare Barnes
Illustrations: Johan Jarnestad
Editor: Sara Gustavsson
© The Royal Swedish Academy of Sciences