Paper Title
MetaRF: Differentiable Random Forest for Reaction Yield Prediction with a Few Trails
Paper Authors
Paper Abstract
Artificial intelligence has profoundly transformed medicinal chemistry through many impressive applications, but the success of these applications depends on a large number of training samples with high-quality annotations, which seriously limits the wider use of data-driven methods. In this paper, we focus on the reaction yield prediction problem, which helps chemists select high-yield reactions in a new chemical space with only a few experimental trials. To address this challenge, we first propose MetaRF, an attention-based differentiable random forest model designed specifically for few-shot yield prediction. In MetaRF, the attention weights of the random forest are optimized automatically by a meta-learning framework and can be quickly adapted to predict the performance of new reagents given only a few additional samples. To further improve few-shot learning performance, we introduce a dimension-reduction-based sampling method that determines which valuable samples should be tested experimentally and then learned from. Our methodology is evaluated on three different datasets and achieves satisfactory performance on few-shot prediction. On high-throughput experimentation (HTE) datasets, the average yield of the top 10 high-yield reactions selected by our method is relatively close to that of ideal yield selection.
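The abstract describes a forest whose tree outputs are combined through attention weights that can be adapted from a few new samples. The following is a minimal sketch, not the authors' implementation, under these assumptions: each tree's yield prediction for a reaction is already available, the attention weights are a softmax over one learnable logit per tree, and few-shot adaptation is plain gradient descent on the mean squared error over the support samples. The helper names `attention_forest_predict` and `adapt_logits` are hypothetical. The meta-learning outer loop, which would learn a good starting point for the logits across many reaction tasks, and the dimension-reduction-based sampling step are both omitted.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turns per-tree logits into attention weights."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_forest_predict(tree_preds: Sequence[float], logits: Sequence[float]) -> float:
    """Attention-weighted average of per-tree yield predictions (hypothetical helper)."""
    weights = softmax(logits)
    return sum(w * p for w, p in zip(weights, tree_preds))


def adapt_logits(
    support_tree_preds: Sequence[Sequence[float]],
    support_yields: Sequence[float],
    logits: Sequence[float],
    lr: float = 0.01,
    steps: int = 200,
) -> List[float]:
    """Few-shot adaptation: gradient descent on the support-set MSE w.r.t. the logits.

    Assumption for this sketch: only the attention logits are updated;
    the trees themselves stay fixed.
    """
    z = list(logits)
    n = len(support_yields)
    for _ in range(steps):
        w = softmax(z)
        grad = [0.0] * len(z)
        for preds, y in zip(support_tree_preds, support_yields):
            y_hat = sum(wk * pk for wk, pk in zip(w, preds))
            err = y_hat - y
            # For a softmax-weighted average: d(y_hat)/d(z_k) = w_k * (p_k - y_hat)
            for k in range(len(z)):
                grad[k] += 2.0 * err * w[k] * (preds[k] - y_hat) / n
        z = [zk - lr * gk for zk, gk in zip(z, grad)]
    return z


if __name__ == "__main__":
    # Toy support set: rows are reactions, columns are fixed per-tree predictions.
    support = [[10.0, 20.0, 30.0], [12.0, 22.0, 32.0]]
    yields = [30.0, 32.0]
    # Assumed starting point: a meta-learned initialization would go here;
    # uniform logits are used purely for illustration.
    start = [0.0, 0.0, 0.0]
    adapted = adapt_logits(support, yields, start)
    print("weights before:", softmax(start))
    print("weights after: ", softmax(adapted))
    print("prediction for a new reaction:", attention_forest_predict([11.0, 21.0, 31.0], adapted))
```

Because the derivative of the weighted average with respect to logit k is w_k * (p_k - y_hat), each update shifts attention toward the trees whose predictions pull y_hat toward the observed yields. In this sketch only the attention logits are trained, so a handful of labelled reactions is enough to re-weight an existing forest without refitting any tree.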