Paper title
A practical method for occupational skills detection in Vietnamese job listings
Paper authors
Abstract
The Vietnamese labor market has been developing in an imbalanced way. The number of university graduates is growing, but so is the unemployment rate. This situation is often caused by a lack of accurate and timely labor market information, which leads to skill mismatches between the supply of workers and actual market demand. In building a data monitoring and analytics platform for the labor market, one of the main challenges is automatically detecting occupational skills in labor-related data such as resumes and job listings. Traditional approaches rely on existing taxonomies and/or large annotated datasets to build Named Entity Recognition (NER) models. These are expensive and require extensive manual effort. In this paper, we propose a practical methodology for skill detection in Vietnamese job listings. Rather than treating the task as an NER task, we treat it as a ranking problem. We propose a pipeline in which phrases are first extracted and then ranked by their semantic similarity to their contexts. A final classification step then detects the skill phrases. We collected three datasets and conducted extensive experiments. The results show that our methodology outperforms an NER model when data are scarce.
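The three stages named in the abstract (phrase extraction, similarity ranking, final classification) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Every name here (`candidate_phrases`, `embed`, `rank_phrases`, `detect_skills`, the toy `STOPWORDS` list, the `threshold` value) is hypothetical. Character-trigram counts stand in for a learned semantic embedding, a score threshold stands in for the trained classifier, and whitespace tokenization is assumed, whereas real Vietnamese text would likely need a word segmenter first.

```python
import math
from collections import Counter

# Toy stopword list (assumption; a real system would use a Vietnamese list).
STOPWORDS = {"and", "in", "with", "of", "the", "a", "for"}


def candidate_phrases(tokens, max_len=3):
    """Return contiguous n-grams that do not start or end with a stopword."""
    phrases = []
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            phrases.append(" ".join(gram))
    return phrases


def embed(text):
    """Toy stand-in for a semantic embedding: character-trigram counts."""
    padded = f" {text} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    norm = norm_u * norm_v
    return dot / norm if norm else 0.0


def rank_phrases(sentence, max_len=3):
    """Score each candidate phrase against its sentence, highest first."""
    tokens = sentence.lower().split()
    context = embed(" ".join(tokens))
    scored = [(cosine(embed(p), context), p)
              for p in candidate_phrases(tokens, max_len)]
    return sorted(scored, reverse=True)


def detect_skills(sentence, threshold=0.5):
    """Placeholder for the final classifier: keep phrases above a threshold."""
    return [p for score, p in rank_phrases(sentence) if score >= threshold]
```

Calling `rank_phrases("experience with python and sql")` returns `(score, phrase)` pairs sorted from most to least similar to the sentence. Swapping `embed` for a real sentence encoder and `detect_skills` for a trained classifier would bring the sketch closer to the pipeline the abstract describes.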