Paper title
Inorganic Crystal Structure Prototype Database based on Unsupervised Learning of Local Atomic Environments
Authors
Abstract
Identifying structure prototypes among the vast number of known inorganic crystal structures is an important task for materials science research and the design of new materials. Existing databases of inorganic crystal structure prototypes were mostly constructed by classifying materials according to their crystallographic space group. Here, we adopt a different strategy: we classify materials by their local atomic environments (LAEs) using unsupervised machine learning. Specifically, we apply hierarchical clustering to all experimentally known inorganic crystal structures to identify structure prototypes. The clustering criterion is the LAE, represented by two state-of-the-art structure fingerprints: improved bond-orientational order parameters and the smooth overlap of atomic positions. This yields an LAE-based Inorganic Crystal Structure Prototype Database (LAE-ICSPD) containing 15,613 structure prototypes with defined stoichiometries. In addition, we have developed the Structure Prototype Generator Infrastructure (SPGI) package, a toolkit for generating structure prototypes. The SPGI toolkit and LAE-ICSPD support global investigation of inorganic materials and can accelerate data-driven materials discovery.
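The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the SPGI implementation or its API. The fingerprint vectors are synthetic stand-ins for real bond-orientational or SOAP fingerprints, and the function name `cluster_prototypes`, the Euclidean metric, the average linkage, and the distance threshold are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform


def cluster_prototypes(fingerprints, threshold):
    """Group fingerprint vectors and pick one representative per group.

    Hypothetical helper: metric, linkage, and threshold are assumptions.
    """
    # Condensed vector of pairwise Euclidean distances between fingerprints.
    condensed = pdist(fingerprints, metric="euclidean")
    # Agglomerative (bottom-up) hierarchical clustering with average linkage.
    tree = linkage(condensed, method="average")
    # Cut the dendrogram so that merges above `threshold` are not applied.
    labels = fcluster(tree, t=threshold, criterion="distance")
    dist = squareform(condensed)
    prototypes = {}
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        # Medoid: the member with the smallest total distance to the others.
        sub = dist[np.ix_(members, members)]
        prototypes[int(label)] = int(members[sub.sum(axis=1).argmin()])
    return labels, prototypes


# Toy data: three hypothetical structure types, four noisy copies of each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
fingerprints = np.repeat(centers, 4, axis=0) + rng.normal(scale=0.01, size=(12, 3))

labels, prototypes = cluster_prototypes(fingerprints, threshold=0.3)
print(len(prototypes), "prototypes:", prototypes)
```

Taking the medoid of each cluster as its prototype is one simple choice. The abstract does not state how representatives are selected, so this selection rule is an assumption as well.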