Commonsense Knowledge Salience Evaluation with a Benchmark Dataset in E-commerce

Paper Title

Commonsense Knowledge Salience Evaluation with a Benchmark Dataset in E-commerce

Authors

Yincen Qu, Ningyu Zhang, Hui Chen, Zelin Dai, Zezhong Xu, Chengming Wang, Xiaoyu Wang, Qiang Chen, Huajun Chen

Abstract

In e-commerce, the salience of commonsense knowledge (CSK) is beneficial for a wide range of applications such as product search and recommendation. For example, when users search for "running" on an e-commerce platform, they would like to find products highly related to running, such as "running shoes" rather than "shoes". Nevertheless, many existing CSK collections rank statements solely by confidence scores, and there is no information about which ones are salient from a human perspective. In this work, we define the task of supervised salience evaluation, where, given a CSK triple, the model is required to learn whether the triple is salient or not. In addition to formulating the new task, we also release a new Benchmark dataset of Salience Evaluation in E-commerce (BSEE) and hope to promote related research on commonsense knowledge salience evaluation. We conduct experiments on the dataset with several representative baseline models. The experimental results show that salience evaluation is a challenging task on which models perform poorly on our evaluation set. We further propose a simple but effective approach, PMI-tuning, which shows promise for solving this novel problem. Code is available at https://github.com/OpenBGBenchmark/OpenBG-CSK.

Paper Title