A High-Performance Universal Graph Neural Network for Crystalline and Molecular Materials



Graph neural networks (GNNs) excel at tasks such as materials property prediction and the construction of machine-learning interatomic potentials. GNNs are designed to operate on graph-structured data and are closely related to geometric deep learning. In chemistry, GNNs can be viewed as an extension of convolutional neural networks: they act directly on chemical graphs built from atoms and bonds, and even on three-dimensional atomic structures or point clouds, allowing them to fully characterize atomic-scale material features while incorporating larger-scale physical laws and phenomena (such as doping and disorder).
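To make this concrete, here is a minimal sketch of the kind of graph such models consume: atoms become nodes, and pairs of atoms within a distance cutoff become edges. The helper name, cutoff value, and toy coordinates are illustrative only, and periodic boundary conditions (needed for real crystals) are ignored for brevity.

```python
import numpy as np

def build_graph(positions, atomic_numbers, cutoff=4.0):
    """Toy atomic graph: nodes are atoms; edges connect pairs closer
    than `cutoff` (angstroms). Periodic boundaries are ignored."""
    positions = np.asarray(positions, dtype=float)
    senders, receivers, distances = [], [], []
    for i in range(len(positions)):
        for j in range(len(positions)):
            if i != j:
                d = np.linalg.norm(positions[i] - positions[j])
                if d < cutoff:
                    senders.append(i)
                    receivers.append(j)
                    distances.append(d)
    return {
        "node_features": np.asarray(atomic_numbers),     # later mapped to learned embeddings
        "edge_index": np.asarray([senders, receivers]),  # shape (2, num_edges)
        "edge_features": np.asarray(distances),          # often expanded in a Gaussian basis
    }

# A water-like toy molecule: one oxygen, two hydrogens.
graph = build_graph(
    positions=[[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]],
    atomic_numbers=[8, 1, 1],
)
print(graph["edge_index"].shape)  # (2, 6): all three atoms fall within the cutoff
```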

Figure 1, Architecture and components of DenseGNN.

Since 2018, a series of GNN models (such as CGCNN, SchNet, MEGNet, iCGCNN, ALIGNN, and coGN) have been proposed, steadily improving predictive performance through different combinations of graph representation strategies, convolution components, and many-body interaction modules.

Figure 2, Representation of local chemical environment in crystal structure.

However, applying GNNs in chemistry and materials science still faces several challenges. First, nested graph networks (such as ALIGNN and coNGN) rely on complex graph representation strategies that make training expensive, and their advantage shows up only on certain crystal datasets containing symmetry information. Second, there is an imbalance between research in materials and chemistry and current model development, so extending existing GNN models to broader application domains (molecules, crystalline materials, and catalysis) remains difficult. Finally, most message-passing GNNs suffer from over-smoothing, which limits how many graph convolution layers can be stacked and thereby caps model performance.
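The over-smoothing problem mentioned above can be demonstrated in a few lines: repeatedly applying mean aggregation over neighbors (the skeleton of many message-passing layers, with the learned weights stripped out) drives all node features toward a common value, so stacking more layers eventually erases the very distinctions the network needs. The ring graph and random features below are made up purely for illustration.

```python
import numpy as np

# A small ring graph: each node is connected to its two neighbors.
n = 8
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i - 1) % n] = adj[i, (i + 1) % n] = 1.0
adj += np.eye(n)                                  # add self-loops
norm_adj = adj / adj.sum(axis=1, keepdims=True)   # row-normalized mean aggregation

h = np.random.default_rng(0).normal(size=(n, 4))  # random initial node features

for depth in (1, 2, 4, 8, 16, 32, 64):
    smoothed = np.linalg.matrix_power(norm_adj, depth) @ h
    # The feature spread across nodes shrinks toward 0 as depth grows:
    print(depth, round(float(smoothed.std(axis=0).mean()), 4))
```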

Figure 3, Comparison of test MAE results on MatBench datasets.

To address these issues, Hongwei Du and colleagues from the team of Hong Wang and Jian Hui at Shanghai Jiao Tong University proposed the DenseGNN model, which combines a Dense Connection Network (DCN), a hierarchical node-edge-graph residual network (HRN), and a Local Structure Order Parameters Embedding (LOPE) strategy. DenseGNN effectively overcomes the over-smoothing problem and supports building deep GNNs of 60 layers without performance degradation.
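While the exact architecture should be taken from the original publication, the core dense-connectivity idea can be sketched in a few lines of PyTorch: every block receives the concatenation of the input and all earlier block outputs, giving features and gradients short paths through even very deep stacks. Neighbor aggregation is omitted here to isolate the connectivity pattern; all names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class DenseStack(nn.Module):
    """DenseNet-style connectivity over node features: block k sees
    the concatenation of the input and the outputs of blocks 1..k-1."""
    def __init__(self, in_dim, growth=64, num_blocks=8):
        super().__init__()
        self.blocks = nn.ModuleList()
        dim = in_dim
        for _ in range(num_blocks):
            self.blocks.append(nn.Sequential(nn.Linear(dim, growth), nn.SiLU()))
            dim += growth  # the next block also sees this block's output

    def forward(self, h):
        feats = [h]
        for block in self.blocks:
            feats.append(block(torch.cat(feats, dim=-1)))
        return torch.cat(feats, dim=-1)

h = torch.randn(10, 32)         # 10 nodes with 32-dim features
print(DenseStack(32)(h).shape)  # torch.Size([10, 544]) = 32 + 8 * 64
```

Because each block's input grows only by `growth` channels, adding depth increases parameters roughly linearly rather than quadratically, which is part of what makes very deep stacks practical.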

Figure 4, Comparison of test MAE results on JARVIS-DFT datasets.

Through the DCN and the HRN residual-connection strategy, DenseGNN updates edge-, node-, and graph-level features simultaneously during message passing, achieving more direct and dense information propagation, reducing information loss, and improving network performance and generalization. Experiments show that DenseGNN outperforms the recent coGN, ALIGNN, and M3GNet models on multiple benchmark datasets across materials and molecular domains, and exhibits higher learning efficiency on small datasets.
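In the same spirit, the sketch below shows one message-passing block that updates edge, node, and graph-level features in turn, each with a residual connection. It is a generic, single-graph rendition of the idea (batching and the authors' exact update functions are omitted); all module names and dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

def mlp(i, o):
    return nn.Sequential(nn.Linear(i, o), nn.SiLU(), nn.Linear(o, o))

class NodeEdgeGraphBlock(nn.Module):
    """Updates edge (e), node (h), and graph (u) features with
    residual connections; assumes a single graph for simplicity."""
    def __init__(self, dim):
        super().__init__()
        self.edge_fn = mlp(3 * dim, dim)    # sender node + receiver node + edge
        self.node_fn = mlp(2 * dim, dim)    # node + aggregated incoming edges
        self.graph_fn = mlp(2 * dim, dim)   # graph state + pooled nodes

    def forward(self, h, e, u, edge_index):
        src, dst = edge_index
        # 1) Edge update from its endpoint nodes (+ residual).
        e = e + self.edge_fn(torch.cat([h[src], h[dst], e], dim=-1))
        # 2) Node update from the mean of incoming edge messages (+ residual).
        agg = torch.zeros_like(h).index_add_(0, dst, e)
        deg = torch.zeros(h.size(0), 1).index_add_(0, dst, torch.ones(dst.size(0), 1))
        h = h + self.node_fn(torch.cat([h, agg / deg.clamp(min=1)], dim=-1))
        # 3) Graph-level update from mean-pooled node features (+ residual).
        u = u + self.graph_fn(torch.cat([u, h.mean(dim=0, keepdim=True)], dim=-1))
        return h, e, u

h, e, u = torch.randn(5, 64), torch.randn(8, 64), torch.zeros(1, 64)
edge_index = torch.randint(0, 5, (2, 8))
h, e, u = NodeEdgeGraphBlock(64)(h, e, u, edge_index)
```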

Figure 5, Test MAE changes after fusing DCN and LOPE strategies.

In addition, by introducing LOPE and optimizing the atom embeddings, DenseGNN achieves optimal edge connectivity, substantially shortening the training and inference time of large GNNs while maintaining predictive accuracy. Notably, DenseGNN's ability to distinguish crystal structures approaches that of the standard X-ray diffraction (XRD) method, providing a powerful tool for materials discovery.
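The actual LOPE strategy embeds atoms with local structure order parameters; as a rough, hypothetical stand-in, the sketch below augments atom features with two simple geometric descriptors (coordination number and mean neighbor distance) and builds a sparse k-nearest-neighbor graph instead of a dense radius-cutoff one, illustrating how richer node features can compensate for fewer edges. Neither the descriptors nor the k value reflect the paper's actual choices.

```python
import numpy as np

def knn_edges(positions, k=4):
    """Connect each atom to its k nearest neighbors: a sparse
    alternative to dense radius-cutoff graphs."""
    positions = np.asarray(positions, dtype=float)
    d = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nbrs = np.argsort(d, axis=1)[:, :k]
    src = np.repeat(np.arange(len(positions)), k)
    return np.stack([src, nbrs.ravel()])

def local_descriptors(positions, cutoff=3.0):
    """Two toy local-environment descriptors per atom (stand-ins for
    true order parameters): coordination number within `cutoff` and
    mean neighbor distance."""
    positions = np.asarray(positions, dtype=float)
    d = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    mask = d < cutoff
    coord = mask.sum(axis=1)
    mean_d = np.where(mask, d, 0.0).sum(axis=1) / np.maximum(coord, 1)
    return np.stack([coord.astype(float), mean_d], axis=1)  # append to atom embeddings
```

Concentrating structural information in the node features this way is what lets the graph stay sparse: fewer edges mean fewer messages per layer, consistent with the training and inference savings described above.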

Figure 6, Comparison of edge connections among models on Matbench datasets.

The main contributions of this work are: 1) overcoming the key bottlenecks of GNNs in materials property prediction by proposing a new GNN architecture based on DCN; 2) applying the DCN and LOPE strategies to GNNs from the computing, crystalline-materials, and molecular domains, achieving significant performance gains on multiple materials and molecular datasets; and 3) substantially improving training and inference efficiency through optimized atom embeddings and optimal edge connections, offering a practical route to large-scale materials screening (see the figures for details). The paper was recently published in npj Computational Materials 10: 292 (2024).

Figure 7, Extrapolation performance test comparison between DenseGNN and reference models.

Editorial Summary


This paper introduces DenseGNN, an innovative graph neural network model that addresses key challenges faced by GNNs in predicting material and molecular properties. These challenges include high training and computational costs due to current graph construction methods, performance degradation when models are transferred from fields such as computer science and molecular science to materials science, and the over-smoothing problem that limits the depth of GNN models. Hongwei Du, Jian Hui, and colleagues from the research team led by Hong Wang at Shanghai Jiao Tong University developed DenseGNN by incorporating Dense Connection Network (DCN), Hierarchical Node-Edge-Graph Residual Network (HRN), and Local Structure Order Parameters Embedding (LOPE) strategies, effectively addressing these problems. DenseGNN significantly reduces the computational cost of training large GNNs by optimizing atomic representations and edge connections while maintaining accuracy. Additionally, the universal DCN and LOPE components not only enhance DenseGNN's performance in material property prediction but also significantly improve the transfer of GNN models from other fields to materials science. Most importantly, DenseGNN solves the over-smoothing problem, enabling the construction of deep GNNs with over 60 layers, which is difficult for traditional GNNs. Test results on multiple datasets, such as JARVIS-DFT, Materials Project, and QM9, demonstrate significant progress in material property prediction, showcasing the model's broad applicability and scalability. This achievement not only advances materials science research but also provides a powerful tool for the discovery and design of new materials.

This paper was recently published in npj Computational Materials 10: 292 (2024). https://doi.org/10.1038/s41524-024-01444-x

Original Abstract

DenseGNN: universal and scalable deeper graph neural networks for high-performance property prediction in crystals and molecules

Hongwei Du, Jiamin Wang, Jian Hui, Lanting Zhang & Hong Wang

Abstract Modern generative models based on deep learning have made it possible to design millions of hypothetical materials. To screen these candidate materials and identify promising new materials, we need fast and accurate models to predict material properties. Graph neural networks (GNNs) have become a current research focus due to their ability to act directly on the graph representation of molecules and materials, enabling comprehensive capture of important information and showing excellent performance in predicting material properties. Nevertheless, GNNs still face several key problems in practical applications: First, although existing nested graph network strategies add critical structural information such as bond angles, they significantly increase the number of trainable parameters in the model, resulting in an increase in training costs; Second, extending GNN models to broader domains such as molecules, crystalline materials, and catalysis, as well as adapting to small datasets, remains a challenge; Finally, the scalability of GNN models is limited by the over-smoothing problem. To address these issues, we propose the DenseGNN model, which combines Dense Connectivity Network (DCN), hierarchical node-edge-graph residual networks (HRN), and Local Structure Order Parameters Embedding (LOPE) strategies to create a universal, scalable, and efficient GNN model. We have achieved state-of-the-art (SOTA) performance on several datasets, including JARVIS-DFT, Materials Project, QM9, Lipop, FreeSolv, ESOL, and OC22, demonstrating the generality and scalability of our approach. By merging DCN and LOPE strategies into GNN models in computing, crystal materials, and molecules, we have improved the performance of models such as GIN, SchNet, and HamNet on materials datasets such as Matbench. The LOPE strategy optimizes the embedding representation of atoms and allows our model to train efficiently with a minimal level of edge connections. This substantially reduces computational costs and shortens the time required to train large GNNs while maintaining accuracy. Our technique not only supports building deeper GNNs and avoids the performance penalties experienced by other models, but is also applicable to a variety of applications that require large deep learning models. Furthermore, our study demonstrates that by using structural embeddings from pre-trained models, our model not only outperforms other GNNs in distinguishing crystal structures but also approaches the standard X-ray diffraction (XRD) method.

