Domain-Guided Task Decomposition with Self-Training for Detecting Personal Events in Social Media

Paper Title

Domain-Guided Task Decomposition with Self-Training for Detecting Personal Events in Social Media

Authors

Payam Karisani, Joyce C. Ho, Eugene Agichtein

Abstract

Mining social media content for tasks such as detecting personal experiences or events suffers from lexical sparsity, insufficient training data, and inventive lexicons. To reduce the burden of creating extensive labeled data and to improve classification performance, we propose to perform these tasks in two steps: (1) decomposing the task into domain-specific sub-tasks by identifying key concepts, thus utilizing human domain understanding; and (2) combining the results of learners for each key concept using co-training to reduce the requirements for labeled training data. We empirically show the effectiveness and generality of our approach, Co-Decomp, using three representative social media mining tasks, namely Personal Health Mention detection, Crisis Report detection, and Adverse Drug Reaction monitoring. The experiments show that our model is able to outperform state-of-the-art text classification models, including those using the recently introduced BERT model, when small amounts of training data are available.
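The second step in the abstract, co-training, can be illustrated with a generic two-view sketch. This is not the authors' Co-Decomp implementation. It assumes each post has already been split into two hypothetical token views (here, words near a key concept and the rest of the post), and the `TinyNB` learner, the `co_train` and `predict` helpers, the confidence `threshold`, and the toy data are all illustrative assumptions.

```python
import math
from collections import Counter


class TinyNB:
    """Minimal binary naive Bayes over token lists (illustrative only)."""

    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.n_docs = Counter(labels)
        for tokens, y in zip(docs, labels):
            self.counts[y].update(tokens)
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def prob_pos(self, tokens):
        # Log-odds of class 1 versus class 0 with add-one smoothing.
        v = len(self.vocab) + 1
        total = {y: sum(self.counts[y].values()) for y in (0, 1)}
        score = math.log((self.n_docs[1] + 1) / (self.n_docs[0] + 1))
        for t in tokens:
            score += math.log((self.counts[1][t] + 1) / (total[1] + v))
            score -= math.log((self.counts[0][t] + 1) / (total[0] + v))
        return 1.0 / (1.0 + math.exp(-score))


def fit_view(train, view):
    """Fit a learner on one view (0 or 1) of ((view_a, view_b), label) pairs."""
    return TinyNB().fit([x[view] for x, _ in train], [y for _, y in train])


def co_train(labeled, unlabeled, rounds=3, threshold=0.6):
    """Generic co-training: each learner pseudo-labels data for the other."""
    train_a, train_b = list(labeled), list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        clf_a, clf_b = fit_view(train_a, 0), fit_view(train_b, 1)
        remaining = []
        for x in pool:
            pa, pb = clf_a.prob_pos(x[0]), clf_b.prob_pos(x[1])
            used = False
            if max(pa, 1 - pa) >= threshold:  # A is confident: teach B
                train_b.append((x, int(pa >= 0.5)))
                used = True
            if max(pb, 1 - pb) >= threshold:  # B is confident: teach A
                train_a.append((x, int(pb >= 0.5)))
                used = True
            if not used:
                remaining.append(x)
        pool = remaining
    return fit_view(train_a, 0), fit_view(train_b, 1)


def predict(clf_a, clf_b, x):
    """Average the two view-specific probabilities of the positive class."""
    return (clf_a.prob_pos(x[0]) + clf_b.prob_pos(x[1])) / 2


# Toy data (made up): label 1 = personal health mention, 0 = not personal.
labeled = [
    ((["have", "flu"], ["staying", "in", "bed"]), 1),
    ((["flu", "season"], ["news", "report"]), 0),
]
unlabeled = [
    (["have", "flu"], ["resting", "in", "bed"]),
    (["flu", "season"], ["health", "report"]),
]
clf_a, clf_b = co_train(labeled, unlabeled)
```

Calling `predict(clf_a, clf_b, (["have", "flu"], ["in", "bed", "resting"]))` then returns a score above 0.5 on this toy data, while a post with the views `["flu", "season"]` and `["news", "report"]` scores below 0.5. The design point is that each learner sees only its own view, so confident predictions from one view supply extra training labels to the other.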
