Paper title
Guided Semi-Supervised Non-negative Matrix Factorization on Legal Documents
Paper authors
Paper abstract
Classification and topic modeling are popular techniques in machine learning that extract information from large-scale datasets. By incorporating a priori information such as labels or important features, methods have been developed to perform classification and topic modeling tasks; however, most methods that can perform both do not allow for guidance of the topics or features. In this paper, we propose a method, namely Guided Semi-Supervised Non-negative Matrix Factorization (GSSNMF), that performs both classification and topic modeling by incorporating supervision from both pre-assigned document class labels and user-designed seed words. We test the performance of this method through its application to legal documents provided by the California Innocence Project, a nonprofit that works to free innocent convicted persons and reform the justice system. The results show that our proposed method improves both classification accuracy and topic coherence in comparison to past methods like Semi-Supervised Non-negative Matrix Factorization (SSNMF) and Guided Non-negative Matrix Factorization (Guided NMF).
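To make the idea of combining label supervision with seed-word guidance concrete, the following is a minimal sketch, not the authors' implementation. It assumes one plausible objective, ||X - WH||^2 + lam * ||Y - CH||^2 + mu * ||S - WB||^2, where X is a word-by-document matrix, Y holds one-hot class labels, and S holds seed-word indicator columns. The function name `gssnmf_sketch`, the matrices C and B, the weights `lam` and `mu`, and the multiplicative updates are all assumptions chosen for illustration. The abstract does not specify the objective or the optimizer, and the masking of unlabeled documents that a semi-supervised setting would need is omitted here.

```python
import numpy as np

def gssnmf_sketch(X, Y, S, k, lam=1.0, mu=1.0, iters=200, seed=0):
    """Illustrative joint factorization (assumed form, not the paper's code).

    X: (words x docs) non-negative data matrix
    Y: (classes x docs) one-hot label matrix
    S: (words x seeds) seed-word indicator matrix
    """
    rng = np.random.default_rng(seed)
    eps = 1e-10  # avoids division by zero in the multiplicative updates
    m, n = X.shape
    W = rng.random((m, k))            # word-topic dictionary
    H = rng.random((k, n))            # topic-document coefficients
    C = rng.random((Y.shape[0], k))   # maps topics to class scores
    B = rng.random((k, S.shape[1]))   # maps topics to seed columns

    def loss():
        # np.linalg.norm on a 2-D array defaults to the Frobenius norm
        return (np.linalg.norm(X - W @ H) ** 2
                + lam * np.linalg.norm(Y - C @ H) ** 2
                + mu * np.linalg.norm(S - W @ B) ** 2)

    losses = [loss()]
    for _ in range(iters):
        # Lee-Seung style updates; each keeps its factor non-negative
        W *= (X @ H.T + mu * S @ B.T) / (W @ (H @ H.T + mu * B @ B.T) + eps)
        H *= (W.T @ X + lam * C.T @ Y) / ((W.T @ W + lam * C.T @ C) @ H + eps)
        C *= (Y @ H.T) / (C @ (H @ H.T) + eps)
        B *= (W.T @ S) / ((W.T @ W) @ B + eps)
        losses.append(loss())
    return W, H, C, B, losses
```

Under these assumptions, the columns of W can be read as topics, and a predicted class for document j can be taken as the index of the largest entry in column j of C @ H.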