Paper title
Wolfies at SemEval-2022 Task 8: Feature extraction pipeline with transformers for Multi-lingual news article similarity
Paper authors
Abstract
This work addresses the task of measuring the similarity between a pair of news articles. The dataset provides seven different objective similarity metrics for each pair, and the news articles span multiple languages. As a baseline, we computed cosine similarity on top of a pre-trained embedding model; a feed-forward neural network was then trained on top of it to improve the results. We also built a separate feature extraction pipeline for each similarity metric. Using feature extraction and the feed-forward neural network, we observed a significant improvement over the baseline results.
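The cosine-similarity baseline mentioned in the abstract can be illustrated with a minimal sketch. It assumes each article has already been mapped to a fixed-length embedding vector by some pre-trained model; the abstract does not name the model, so the three-dimensional vectors below are toy values, and the function and variable names are hypothetical rather than taken from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embedding vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: treat an all-zero embedding as having no similarity.
        return 0.0
    return dot / (norm_u * norm_v)


# Toy embeddings standing in for the output of a pre-trained model (assumed values).
article_a = [0.20, 0.70, 0.10]
article_b = [0.25, 0.60, 0.05]  # points in nearly the same direction as article_a
article_c = [0.90, 0.00, 0.40]  # points in a clearly different direction

score_ab = cosine_similarity(article_a, article_b)
score_ac = cosine_similarity(article_a, article_c)
print(f"a vs b: {score_ab:.3f}, a vs c: {score_ac:.3f}")
```

In a setup like the one the abstract describes, a score of this kind would serve as the baseline prediction and could also be passed, together with the per-metric extracted features, as input to the feed-forward network.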