Title


Should I visit this place? Inclusion and Exclusion Phrase Mining from Reviews

Authors

Omkar Gurjar, Manish Gupta

Abstract


Although several automatic itinerary generation services have made travel planning easy, travellers often find themselves in unique situations where they cannot make the most of their trip. Visitors differ in many factors, such as having a disability, following a particular dietary preference, or travelling with a toddler. While most tourist spots are universal, others may not be inclusive for all. In this paper, we focus on the problem of mining inclusion and exclusion phrases associated with 11 such factors from reviews of a tourist spot. While existing work on tourism data mining mainly focuses on structured extraction of trip-related information, personalized sentiment analysis, and automatic itinerary generation, to the best of our knowledge this is the first work on inclusion/exclusion phrase mining from tourism reviews. Using a dataset of 2000 reviews related to 1000 tourist spots, our broad-level classifier achieves a binary overlap F1 of ~80 for classifying a phrase as inclusion and ~82 for classifying it as exclusion. Further, our inclusion/exclusion classifiers achieve an F1 of ~98 for 11-class inclusion classification and ~97 for 11-class exclusion classification. We believe that our work can significantly improve the quality of automatic itinerary generation services.
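The abstract reports phrase-level results as "binary overlap F1", but it does not define the metric. A minimal sketch follows. It assumes the common definition from opinion-expression extraction, in which a predicted span counts as correct if it shares at least one token with any gold span, and a gold span counts as recalled if any predicted span overlaps it. Spans are assumed to be half-open token intervals [start, end). The helper names `overlaps` and `binary_overlap_f1` are hypothetical, and the authors' exact scoring may differ.

```python
from typing import List, Tuple

# Hypothetical span representation: half-open token interval [start, end).
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """Return True if the two half-open intervals share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def binary_overlap_f1(predicted: List[Span], gold: List[Span]) -> float:
    """Binary overlap F1 (assumed definition, hypothetical helper).

    Precision: fraction of predicted spans overlapping some gold span.
    Recall: fraction of gold spans overlapping some predicted span.
    Returns 0.0 if either list is empty.
    """
    if not predicted or not gold:
        return 0.0
    precision = sum(any(overlaps(p, g) for g in gold) for p in predicted) / len(predicted)
    recall = sum(any(overlaps(g, p) for p in predicted) for g in gold) / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # One of two predictions overlaps a gold span, and one of two gold spans
    # is recalled, so precision = recall = 0.5.
    print(binary_overlap_f1([(0, 3), (10, 12)], [(1, 4), (20, 25)]))  # 0.5
```

Overlap scoring is lenient. A prediction that covers only part of a gold phrase still counts as a hit, so scores under this metric are usually higher than under exact-match span F1. The abstract's figures (~80, ~82) appear to be this value scaled by 100.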
