Paper Title
Incorporating Hierarchy into Text Encoder: a Contrastive Learning Approach for Hierarchical Text Classification
Authors
Abstract
Hierarchical text classification is a challenging subtask of multi-label classification due to its complex label hierarchy. Existing methods encode the text and the label hierarchy separately and mix their representations for classification, so the hierarchy remains unchanged for all input texts. Instead of modeling them separately, in this work we propose Hierarchy-guided Contrastive Learning (HGCLR) to directly embed the hierarchy into a text encoder. During training, HGCLR constructs positive samples for the input text under the guidance of the label hierarchy. By pulling together the input text and its positive sample, the text encoder learns to generate hierarchy-aware text representations independently. Therefore, after training, the HGCLR-enhanced text encoder can dispense with the redundant hierarchy. Extensive experiments on three benchmark datasets verify the effectiveness of HGCLR.
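The idea of "pulling together the input text and its positive sample" can be illustrated with a generic in-batch contrastive objective. The sketch below is only an illustration under stated assumptions: the abstract does not specify the loss, so an InfoNCE-style loss with cosine similarity is assumed here, and the names `cosine`, `contrastive_loss`, and `temperature` are hypothetical. The toy vectors stand in for encoder outputs. How HGCLR actually builds positive samples from the label hierarchy is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchors, positives, temperature=0.1):
    """Assumed InfoNCE-style loss (not taken from the abstract).

    anchors[i] is the representation of input text i and positives[i]
    is the representation of its positive sample. The other positives
    in the batch act as negatives for anchor i.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the matching positive.
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy stand-ins for encoder outputs (hypothetical values).
texts = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [aligned[1], aligned[0]]

# The loss is lower when each text sits close to its own positive.
print(contrastive_loss(texts, aligned) < contrastive_loss(texts, swapped))
```

Minimizing such a loss moves each text representation toward its positive and away from the positives of other texts in the batch. This is one common way to realize the "pulling together" described in the abstract.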