Title
A Comparison of Statistical and Machine Learning Algorithms for Predicting Rents in the San Francisco Bay Area
Authors
Abstract
Urban transportation and land use models have used theory and statistical modeling methods to develop model systems that are useful in planning applications. Machine learning methods have been considered too 'black box', lacking interpretability, and their use within the land use and transportation modeling literature has been limited. We present a use case in which predictive accuracy is of primary importance and compare random forest regression with multiple regression estimated by ordinary least squares (OLS) for predicting rents per square foot in the San Francisco Bay Area, using a large volume of rental listings scraped from the Craigslist website. We find that both models produce useful predictions using almost exclusively local accessibility variables, though the predictive accuracy of the random forest model is substantially higher.
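The comparison described in the abstract can be illustrated with a minimal, self-contained sketch in plain Python. It is not the authors' code and uses no Craigslist data. The listings are synthetic, and the two accessibility features (`jobs_access`, `amenity_access`), the threshold-shaped rent function, and the forest settings (`n_trees`, `max_depth`, `min_leaf`, `n_try`) are hypothetical assumptions chosen only to show how the held-out accuracy (R²) of OLS multiple regression and random forest regression can be compared. A real analysis would more likely use library implementations such as scikit-learn's `LinearRegression` and `RandomForestRegressor`.

```python
import random

random.seed(0)  # fixed seed so the synthetic example is reproducible


def make_listings(n):
    """Hypothetical listings: two accessibility scores in [0, 1] and a rent per
    sq ft with a threshold (non-linear) response to the first score."""
    rows = []
    for _ in range(n):
        jobs_access = random.random()     # hypothetical, e.g. jobs reachable nearby
        amenity_access = random.random()  # hypothetical, e.g. nearby amenities
        rent = 2.0 + 1.5 * amenity_access + (2.0 if jobs_access > 0.6 else 0.0)
        rows.append(([jobs_access, amenity_access], rent + random.gauss(0, 0.2)))
    return rows


def fit_ols(rows):
    """Multiple regression by OLS: solve the normal equations (X'X) b = X'y."""
    X = [[1.0] + x for x, _ in rows]  # prepend an intercept column
    y = [t for _, t in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return lambda x: beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, depth, max_depth, min_leaf, n_try):
    """Regression tree: leaves are mean rents, internal nodes are
    (feature, threshold, left, right). Each split considers only a random
    subset of n_try features, as in a random forest."""
    ys = [t for _, t in rows]
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return sum(ys) / len(ys)
    best = None
    for f in random.sample(range(len(rows[0][0])), n_try):
        values = sorted({x[f] for x, _ in rows})
        for thr in values[::max(1, len(values) // 20)]:  # ~20 candidate thresholds
            left = [t for x, t in rows if x[f] <= thr]
            right = [t for x, t in rows if x[f] > thr]
            if len(left) >= min_leaf and len(right) >= min_leaf:
                score = sse(left) + sse(right)
                if best is None or score < best[0]:
                    best = (score, f, thr)
    if best is None:
        return sum(ys) / len(ys)
    _, f, thr = best
    args = (depth + 1, max_depth, min_leaf, n_try)
    return (f, thr,
            build_tree([r for r in rows if r[0][f] <= thr], *args),
            build_tree([r for r in rows if r[0][f] > thr], *args))


def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf's mean rent."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


def fit_forest(rows, n_trees=25, max_depth=6, min_leaf=5, n_try=1):
    """Random forest: average of trees grown on bootstrap resamples."""
    trees = []
    for _ in range(n_trees):
        boot = [random.choice(rows) for _ in rows]  # sample with replacement
        trees.append(build_tree(boot, 0, max_depth, min_leaf, n_try))
    return lambda x: sum(predict_tree(t, x) for t in trees) / len(trees)


def r_squared(model, rows):
    """Out-of-sample R^2 = 1 - SS_res / SS_tot on held-out listings."""
    ys = [t for _, t in rows]
    ss_res = sum((t - model(x)) ** 2 for x, t in rows)
    return 1.0 - ss_res / sse(ys)


train, test = make_listings(600), make_listings(300)
ols_r2 = r_squared(fit_ols(train), test)
rf_r2 = r_squared(fit_forest(train), test)
print(f"OLS held-out R^2: {ols_r2:.3f}")
print(f"RF  held-out R^2: {rf_r2:.3f}")
```

Because the synthetic rent responds to `jobs_access` through a threshold, the linear OLS specification cannot represent it exactly, while the averaged trees can approximate it. Under these assumptions that is one plausible reason a random forest can score higher on held-out listings; the printed numbers depend entirely on the synthetic setup and say nothing about the paper's actual results.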