Paper title
"FIJO": a French Insurance Soft Skill Detection Dataset
Authors
Abstract
Understanding the evolution of job requirements is becoming increasingly important for workers, companies and public organizations that need to follow the fast transformation of the employment market. Fortunately, recent natural language processing (NLP) approaches make it possible to develop methods that automatically extract information from job ads and recognize skills more precisely. However, these efficient approaches need a large amount of annotated data from the studied domain, which is difficult to access, mainly because of intellectual property. This article proposes a new public dataset, FIJO, containing insurance job offers with many soft skill annotations. To show the potential of this dataset, we detail some of its characteristics and limitations. Then, we present the results of skill detection algorithms using a named entity recognition approach and show that transformer-based models achieve good token-wise performance on this dataset. Lastly, we analyze some errors made by our best model to highlight the difficulties that may arise when applying NLP approaches.
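Skill detection framed as named entity recognition is usually scored on per-token labels. The sketch below shows how token-wise scores can differ from exact span matching. It assumes a BIO tagging scheme with a single hypothetical `SKILL` label; FIJO's actual label set and evaluation code are not described in this abstract. The helper names `token_wise_scores` and `bio_spans` and the example sentence are illustrative only.

```python
def token_wise_scores(gold, pred, outside="O"):
    """Token-wise precision, recall and F1 over non-outside labels.

    A token is a true positive when its predicted label equals its gold label
    and that label is not the outside tag.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != outside)
    pred_pos = sum(1 for p in pred if p != outside)
    gold_pos = sum(1 for g in gold if g != outside)
    precision = tp / pred_pos if pred_pos else 0.0
    recall = tp / gold_pos if gold_pos else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def bio_spans(labels):
    """Convert BIO labels into (start, end_exclusive, type) spans.

    An I- tag with no open span of the same type starts a new span (lenient decoding).
    """
    spans, start, kind = [], None, None
    for i, lab in enumerate(labels):
        if lab.startswith("B-") or (lab.startswith("I-") and (start is None or lab[2:] != kind)):
            if start is not None:
                spans.append((start, i, kind))
            start, kind = i, lab[2:]
        elif lab.startswith("I-"):
            continue  # continuation of the current span
        else:  # outside tag closes any open span
            if start is not None:
                spans.append((start, i, kind))
            start, kind = None, None
    if start is not None:
        spans.append((start, len(labels), kind))
    return spans


# Hypothetical example: the model misses the last token of a soft skill mention.
tokens = ["Vous", "avez", "le", "sens", "du", "service", "client", "."]
gold = ["O", "O", "O", "B-SKILL", "I-SKILL", "I-SKILL", "I-SKILL", "O"]
pred = ["O", "O", "O", "B-SKILL", "I-SKILL", "I-SKILL", "O", "O"]

precision, recall, f1 = token_wise_scores(gold, pred)
exact_matches = set(bio_spans(gold)) & set(bio_spans(pred))
print(precision, recall, round(f1, 3), len(exact_matches))
```

In this example the token-wise scores stay high (precision 1.0, recall 0.75), yet no predicted span exactly matches the gold span. This is one reason why good token-wise performance does not by itself guarantee that whole skill mentions are recovered.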