Paper title
Predicting and Mapping of Soil Organic Carbon Using Machine Learning Algorithms in Northern Iran
Paper authors
Paper abstract
Estimating soil organic carbon (SOC) content is essential to understanding the chemical, physical, and biological functions of soil. This study proposes six machine learning algorithms for advancing SOC prediction models: support vector machines, artificial neural networks, regression tree, random forest, extreme gradient boosting, and a conventional deep neural network (DNN). The models are trained with 1879 composite surface soil samples, using 105 auxiliary variables as predictors. A genetic algorithm is used as the feature selection approach to identify effective variables. The results indicate that precipitation is the most important predictor, driving 15 percent of SOC spatial variability, followed by the normalized difference vegetation index (NDVI), the day temperature index of the Moderate Resolution Imaging Spectroradiometer (MODIS), multiresolution valley bottom flatness, and land use, in that order. Based on 10-fold cross-validation, the DNN model was the superior algorithm, with the lowest prediction error and uncertainty. In terms of accuracy, the DNN yielded a mean absolute error of 59 percent, a root mean squared error of 75 percent, a coefficient of determination of 0.65, and a Lin's concordance correlation coefficient of 0.83. SOC content was highest in the udic soil moisture regime class, with a mean value of 4 percent, followed by the aquic and xeric classes. Soils in dense forestlands had the highest SOC contents, whereas soils of younger geological age and soils on alluvial fans had lower SOC. The proposed DNN is a promising algorithm for handling large numbers of auxiliary variables at the province scale. Owing to its flexible structure and its ability to extract more information from the auxiliary data surrounding the sampled observations, it predicted the SOC baseline map with high accuracy and minimal uncertainty.
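The four accuracy measures named in the abstract (mean absolute error, root mean squared error, coefficient of determination, and Lin's concordance correlation coefficient) can be illustrated with a minimal sketch. This is not the authors' code: the function name `regression_metrics` and the sample values are hypothetical, the coefficient of determination is computed here as 1 - SS_res / SS_tot (the paper may instead use the squared correlation), and the abstract reports its errors as percentages, whereas this sketch returns them in the units of the input.

```python
import math


def regression_metrics(observed, predicted):
    """Return MAE, RMSE, R^2 and Lin's CCC for paired observed/predicted values.

    Hypothetical helper for illustration only; it assumes the observed
    values are not all identical (otherwise R^2 is undefined).
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")

    errors = [p - o for o, p in zip(observed, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Divide-by-n variances and covariance, as in Lin's original definition.
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n

    # Assumed convention: R^2 = 1 - SS_res / SS_tot.
    r2 = 1.0 - mse / var_o
    # Lin's CCC penalizes both scatter and systematic shift from the 1:1 line.
    ccc = 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)
    return {"mae": mae, "rmse": rmse, "r2": r2, "ccc": ccc}


if __name__ == "__main__":
    # Made-up SOC values (percent), not data from the study.
    obs = [1.2, 2.5, 3.1, 4.0, 0.8]
    pred = [1.0, 2.9, 2.7, 3.6, 1.1]
    for name, value in regression_metrics(obs, pred).items():
        print(f"{name}: {value:.3f}")
```

In a 10-fold cross-validation setting such as the one described, a function like this would typically be applied to the held-out predictions of each fold and the results averaged; that aggregation step is likewise an assumption about common practice, not a detail stated in the abstract.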