Predicting Subjective Features of Questions of QA Websites using BERT

Paper Title

Predicting Subjective Features of Questions of QA Websites using BERT

Authors

Issa Annamoradnejad, Mohammadamin Fazli, Jafar Habibi

Abstract

Community Question-Answering websites, such as StackOverflow and Quora, expect users to follow specific guidelines in order to maintain content quality. These systems mainly rely on community reports for assessing content, which has serious problems such as the slow handling of violations, the loss of normal and experienced users' time, the low quality of some reports, and discouraging feedback to new users. Therefore, with the overall goal of providing solutions for automating moderation actions in Q&A websites, we aim to provide a model to predict 20 quality or subjective aspects of questions in QA websites. To this end, we used data gathered by the CrowdSource team at Google Research in 2019 and fine-tuned a pre-trained BERT model on our problem. Evaluated by Mean-Squared-Error (MSE), the model achieved a value of 0.046 after 2 epochs of training, and further epochs did not improve it substantially. The results confirm that simple fine-tuning can yield accurate models in little time and with less data.
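
The MSE figure quoted in the abstract is an average of squared errors across questions and target aspects. The abstract does not say how the average is taken, so the following is only a minimal sketch that assumes the error is averaged uniformly over every (question, aspect) pair. The function name `mean_squared_error` and the toy values are hypothetical and are not taken from the paper.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference over all questions and all target aspects.

    y_true, y_pred: lists of rows, one row per question, one value per aspect.
    Assumption: uniform averaging over every (question, aspect) pair.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of rows")
    total = 0.0
    count = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("each row must have the same number of targets")
        for t, p in zip(true_row, pred_row):
            total += (t - p) ** 2
            count += 1
    if count == 0:
        raise ValueError("no values to evaluate")
    return total / count


# Hypothetical toy data: 2 questions, 3 aspects each (the paper uses 20 aspects).
y_true = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
y_pred = [[0.5, 0.0, 0.5], [0.0, 0.5, 0.5]]
mse = mean_squared_error(y_true, y_pred)
```

In this toy case two of the six predictions are off by 0.5, so the summed squared error is 0.5 and the score is 0.5 / 6.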
