Domain Representative Keywords Selection: A Probabilistic Approach

Paper Title

Domain Representative Keywords Selection: A Probabilistic Approach

Authors

Pritom Saha Akash, Jie Huang, Kevin Chen-Chuan Chang, Yunyao Li, Lucian Popa, ChengXiang Zhai

Abstract

We propose a probabilistic approach for selecting a subset of target domain representative keywords from a candidate set, in contrast with a context domain. This task is crucial for many downstream tasks in natural language processing. To contrast the target domain with the context domain, we adapt the two-component mixture model concept to generate a distribution over candidate keywords. This distribution gives more weight to keywords that are distinctive of the target domain than to keywords it shares with the context domain. To support the representativeness of the selected keywords with respect to the target domain, we introduce an optimization algorithm that selects the subset from the generated candidate distribution. We show that the optimization algorithm can be implemented efficiently with a near-optimal approximation guarantee. Finally, extensive experiments on multiple domains demonstrate that our approach outperforms other baselines on the tasks of keyword summary generation and trending keyword selection.
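
The abstract does not give the model's equations, so the following is only a minimal sketch of the general two-component mixture idea, not the authors' implementation. It assumes each keyword occurrence in the target domain is generated either by a fixed context-domain model (with weight `lam`) or by an unknown target-specific model, which is estimated with EM. The function name `distinctive_distribution`, the parameter `lam`, the add-one smoothing, and the toy counts are all hypothetical choices made for illustration.

```python
def distinctive_distribution(target_counts, context_counts, lam=0.5, iters=50):
    """Estimate a target-specific keyword distribution with EM.

    Assumed (hypothetical) model: each target-domain occurrence of keyword w
    comes from the context model with probability lam, or from the unknown
    target-specific model theta with probability 1 - lam.
    """
    vocab = list(target_counts)
    # Context model over the candidate vocabulary, with add-one smoothing
    # so that keywords unseen in the context domain keep nonzero probability.
    ctx_total = sum(context_counts.get(w, 0) for w in vocab)
    p_ctx = {w: (context_counts.get(w, 0) + 1) / (ctx_total + len(vocab))
             for w in vocab}

    # Initialise theta with the maximum-likelihood estimate on the target domain.
    tgt_total = sum(target_counts.values())
    theta = {w: target_counts[w] / tgt_total for w in vocab}

    for _ in range(iters):
        # E-step: posterior probability that an occurrence of w was generated
        # by the target-specific component rather than the context component.
        z = {w: (1 - lam) * theta[w] / ((1 - lam) * theta[w] + lam * p_ctx[w])
             for w in vocab}
        # M-step: re-estimate theta from the counts attributed to that component.
        weighted = {w: target_counts[w] * z[w] for w in vocab}
        norm = sum(weighted.values())
        theta = {w: weighted[w] / norm for w in vocab}
    return theta


# Toy counts (made up): "neural" is rare in the context domain,
# while "model" and "data" are common there.
target = {"neural": 30, "model": 40, "data": 30}
context = {"neural": 10, "model": 400, "data": 300}
theta = distinctive_distribution(target, context)
```

In this toy setting the keyword that is rare in the context domain ends up with more probability mass than its raw relative frequency in the target domain, which is the "distinctive over common" behaviour the abstract describes. The subset-selection step is not sketched here because the abstract does not specify its objective.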
