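In polymer research, forward screening and reverse design are key steps in moving polymers from laboratory research to market application. Polymer development, however, faces a major obstacle: the lack of large-scale polymer datasets. As a result, using materials informatics to design polymers that meet specific requirements from small datasets has become a research focus. Conventional screening methods have made progress, but they can only search within a finite candidate pool, so their success depends on how diverse that pool is. Enumerating every possible polymer structure through human imagination is clearly impractical, which makes "on-demand reverse design" an important direction in the polymer field.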
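To address this challenge, the team of Zhao-Yan Sun (孙昭艳) at the Changchun Institute of Applied Chemistry proposed PolyTAO, a polymer generative model. The model is pretrained on a large database of nearly one million polymer structure-property pairs using a Transformer-Assisted Oriented pretraining approach, which makes on-demand reverse design of polymers possible. In top-1 generation mode, PolyTAO reaches 99.27% chemical validity over roughly 200,000 generated polymers, the highest success rate reported among polymer generative models. Beyond producing chemically valid polymers at scale, PolyTAO also hits its property targets: across 15 predefined properties, the average R² between the properties of the generated polymers and their expected values is 0.96, indicating that the model has learned polymer structure-property relationships well and can generate polymers that match the requested properties.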
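The two headline metrics above can be illustrated with a minimal sketch. It assumes that chemical validity is the fraction of generated structures that pass a validity check, and that R² is the usual coefficient of determination computed with the requested (expected) property values as the reference. The function names and the toy numbers are hypothetical and are not taken from the paper or its code.

```python
from typing import Sequence


def validity_rate(is_valid: Sequence[bool]) -> float:
    """Fraction of generated structures flagged as chemically valid.

    Assumption: each flag comes from an external validity check
    (for example, whether a cheminformatics toolkit can parse the structure).
    """
    if not is_valid:
        raise ValueError("need at least one generated structure")
    return sum(1 for flag in is_valid if flag) / len(is_valid)


def r_squared(expected: Sequence[float], obtained: Sequence[float]) -> float:
    """Coefficient of determination of obtained values against expected targets.

    expected: property values requested from the generator.
    obtained: property values of the polymers that were actually generated.
    """
    if len(expected) != len(obtained) or len(expected) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_expected = sum(expected) / len(expected)
    # Residual sum of squares: how far generated properties miss their targets.
    ss_res = sum((e - o) ** 2 for e, o in zip(expected, obtained))
    # Total sum of squares: spread of the targets around their mean.
    ss_tot = sum((e - mean_expected) ** 2 for e in expected)
    if ss_tot == 0:
        raise ValueError("expected values must not all be identical")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy, made-up numbers purely for illustration.
    flags = [True, True, False, True]
    targets = [1.0, 2.0, 3.0]
    generated = [1.1, 1.9, 3.2]
    print(f"validity = {validity_rate(flags):.2%}")
    print(f"R^2 = {r_squared(targets, generated):.3f}")
```

An R² close to 1 under this definition means the generated polymers' properties track the requested values closely; averaging the per-property scores over the 15 properties would give a single summary number comparable in spirit to the reported 0.96.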
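Figure 1. Generation performance on the 15 predefined properties
Figure 2. The model covers nearly all chemical elements commonly found in polymers (data for specific missing elements can be added as later tasks require)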
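To test how well PolyTAO adapts to practical use, the team fine-tuned it on three publicly available small polymer datasets under both semi-template and template-free generation paradigms. The results show that PolyTAO and its fine-tuned versions can generate polymers with target properties not only in the semi-template setting but also in the more challenging template-free setting, demonstrating the model's flexibility and broad application potential.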
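Figure 3. PolyTAO generates polymers with a specified atomization energy in semi-template mode
Figure 4. PolyTAO generates polymers with a specified band gap in template-free mode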
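PolyTAO offers a new direction for the on-demand reverse design of polymers, overcoming the limits that small datasets and candidate-pool diversity impose on conventional methods. It advances polymer generative modeling and offers useful lessons for reverse design methods across materials science.
As the model is extended to other domains, PolyTAO could become an important tool for designing and discovering more materials and help accelerate their development and application. The paper was recently published in npj Computational Materials 10: 273 (2024); its English title and abstract follow.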
On-demand reverse design of polymers with PolyTAO
Haoke Qiu & Zhao-Yan Sun
The forward screening and reverse design of drug molecules, inorganic molecules, and polymers with enhanced properties are vital for accelerating the transition from laboratory research to market application. Specifically, due to the scarcity of large-scale datasets, the discovery of polymers via materials informatics is particularly challenging. Nonetheless, scientists have developed various machine learning models for polymer structure-property relationships using only small polymer datasets, thereby advancing the forward screening process of polymers. However, the success of this approach ultimately depends on the diversity of the candidate pool, and exhaustively enumerating all possible polymer structures through human imagination is impractical. Consequently, achieving on-demand reverse design of polymers is essential. In this work, we curate an immense polymer dataset containing nearly one million polymeric structure-property pairs based on expert knowledge. Leveraging this dataset, we propose a Transformer-Assisted Oriented pretrained model for on-demand polymer generation (PolyTAO). This model generates polymers with 99.27% chemical validity in top-1 generation mode (approximately 200k generated polymers), representing the highest reported success rate among polymer generative models, and this was achieved on the largest test set. Importantly, the average R² between the properties of the generated polymers and their expected values across 15 predefined properties is 0.96, which underscores PolyTAO’s powerful on-demand polymer generation capabilities. To further evaluate the pretrained model’s performance in generating polymers with additional user-defined properties for downstream tasks, we conduct fine-tuning experiments on three publicly available small polymer datasets using both semi-template and template-free generation paradigms. Through these extensive experiments, we demonstrate that our pretrained model and its fine-tuned versions are capable of achieving the on-demand reverse design of polymers with specified properties, whether in a semi-template generation or the more challenging template-free generation scenarios, showcasing its potential as a unified pretrained foundation model for polymer generation.