Improving machine learning-derived photometric redshifts and physical property estimates using unlabelled observations

Paper title

Improving machine learning-derived photometric redshifts and physical property estimates using unlabelled observations

Paper authors

Humphrey, A., Cunha, P. A. C., Paulino-Afonso, A., Amarantidis, S., Carvajal, R., Gomes, J. M., Matute, I., Papaderos, P.

Paper abstract

In the era of huge astronomical surveys, machine learning offers promising solutions for the efficient estimation of galaxy properties. The traditional, 'supervised' paradigm for the application of machine learning involves training a model on labelled data, and using this model to predict the labels of previously unlabelled data. The semi-supervised 'pseudo-labelling' technique offers an alternative paradigm, allowing the model training algorithm to learn from both labelled data and as-yet unlabelled data. We test the pseudo-labelling method on the problems of estimating redshift, stellar mass, and star formation rate, using COSMOS2015 broad-band photometry and one of several publicly available machine learning algorithms, and we obtain significant improvements compared to purely supervised learning. We find that the gradient-boosting tree methods CatBoost, XGBoost, and LightGBM benefit the most, with reductions of up to ~15% in metrics of absolute error. We also find similar improvements in the photometric redshift catastrophic outlier fraction. We argue that the pseudo-labelling technique will be useful for the estimation of redshift and physical properties of galaxies in upcoming large imaging surveys such as Euclid and LSST, which will provide photometric data for billions of sources.
