Paper title
Domain Representative Keywords Selection: A Probabilistic Approach
Paper authors
Paper abstract
We propose a probabilistic approach to select a subset of \textit{target domain representative keywords} from a candidate set, in contrast with a context domain. Such a task is crucial for many downstream tasks in natural language processing. To contrast the target domain with the context domain, we adapt the \textit{two-component mixture model} concept to generate a distribution over candidate keywords. This distribution assigns more importance to the \textit{distinctive} keywords of the target domain than to the keywords it shares with the context domain. To support the \textit{representativeness} of the selected keywords with respect to the target domain, we introduce an \textit{optimization algorithm} for selecting the subset from the generated candidate distribution. We show that the optimization algorithm can be implemented efficiently with a near-optimal approximation guarantee. Finally, extensive experiments on multiple domains demonstrate the superiority of our approach over other baselines on the tasks of keyword summary generation and trending keywords selection.
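As an illustration of the two-component mixture idea mentioned in the abstract, the sketch below fits a generic mixture with EM. It is a minimal example under stated assumptions, not the paper's actual model. Each target-domain token is assumed to come either from a fixed context distribution (with probability `lam`) or from an unknown target-specific distribution that EM estimates. The function name `contrastive_keyword_distribution`, the parameter `lam`, the add-one smoothing, and the toy tokens are all hypothetical choices made for this example.

```python
from collections import Counter


def contrastive_keyword_distribution(target_tokens, context_tokens, lam=0.5, iters=50):
    """Estimate a target-specific keyword distribution with EM.

    Assumed generative story (hypothetical, for illustration only): each
    target token is drawn from the context distribution with probability
    `lam`, and from the unknown target-specific distribution otherwise.
    """
    if not target_tokens:
        raise ValueError("target_tokens must not be empty")
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")

    tgt = Counter(target_tokens)
    ctx = Counter(context_tokens)
    vocab = list(tgt)

    # Fixed context component, with add-one smoothing so that words unseen
    # in the context domain still get a non-zero probability.
    joint_vocab_size = len(set(tgt) | set(ctx))
    ctx_total = sum(ctx.values())
    p_ctx = {w: (ctx[w] + 1) / (ctx_total + joint_vocab_size) for w in vocab}

    # Initialise the target component with relative frequencies.
    tgt_total = sum(tgt.values())
    p_tgt = {w: tgt[w] / tgt_total for w in vocab}

    for _ in range(iters):
        # E-step: posterior probability that an occurrence of w was
        # generated by the target-specific component.
        post = {
            w: (1 - lam) * p_tgt[w] / ((1 - lam) * p_tgt[w] + lam * p_ctx[w])
            for w in vocab
        }
        # M-step: re-estimate the target component from the expected counts.
        expected = {w: tgt[w] * post[w] for w in vocab}
        norm = sum(expected.values())
        p_tgt = {w: expected[w] / norm for w in vocab}

    return p_tgt


if __name__ == "__main__":
    target = ["neural", "network", "the", "the", "model", "neural"]
    context = ["the", "the", "the", "of", "model", "the", "of"]
    dist = contrastive_keyword_distribution(target, context)
    for word, prob in sorted(dist.items(), key=lambda kv: -kv[1]):
        print(f"{word}\t{prob:.3f}")
```

In the toy data, "neural" and "the" are equally frequent in the target tokens, but "the" is also frequent in the context tokens, so the fitted distribution ranks "neural" above "the". This is the contrast effect the abstract describes: distinctive keywords gain weight relative to keywords shared with the context domain.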