Paper Title
Towards Dark Jargon Interpretation in Underground Forums
Paper Authors
Paper Abstract
Dark jargons are benign-looking words that have hidden, sinister meanings and are used by participants in underground forums for illicit behavior. For example, the dark term "rat" is often used in lieu of "Remote Access Trojan". In this work, we present a novel method for automatically identifying and interpreting dark jargons. We formalize the problem as a mapping from dark words to "clean" words with no hidden meaning. Our method makes use of interpretable representations of dark and clean words in the form of probability distributions over a shared vocabulary. In our experiments, we show our method to be effective at dark jargon identification, as it outperforms another related method on simulated data. Using manual evaluation, we show that our method is able to detect dark jargons in a real-world underground forum dataset.
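
To make the idea of mapping dark words to clean words through probability distributions over a shared vocabulary more concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not say how the distributions are estimated or compared, so the context-window counting, the additive smoothing, and the use of KL divergence are all assumptions made for illustration. The function names (`context_distribution`, `kl_divergence`, `interpret`) and the toy corpora are hypothetical.

```python
import math
from collections import Counter


def context_distribution(word, corpus, vocab, window=2, alpha=0.01):
    """Smoothed distribution over `vocab` of words seen near `word`.

    Assumption: a word is represented by the words that co-occur with it
    inside a fixed window; `alpha` is additive smoothing so that no
    probability is zero.
    """
    vocab_set = set(vocab)
    counts = Counter()
    for sentence in corpus:
        for i, token in enumerate(sentence):
            if token != word:
                continue
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i and sentence[j] in vocab_set:
                    counts[sentence[j]] += 1
    total = sum(counts.values()) + alpha * len(vocab)
    return {v: (counts[v] + alpha) / total for v in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same keys."""
    return sum(p[v] * math.log(p[v] / q[v]) for v in p)


def interpret(dark_word, dark_corpus, clean_corpus, candidates, vocab):
    """Return the clean candidate whose context distribution is closest
    (lowest KL divergence) to that of the dark word. Assumed criterion."""
    p = context_distribution(dark_word, dark_corpus, vocab)
    scores = {
        c: kl_divergence(p, context_distribution(c, clean_corpus, vocab))
        for c in candidates
    }
    return min(scores, key=scores.get)


# Toy data, invented purely for illustration.
dark_corpus = [
    ["install", "rat", "on", "victim", "machine"],
    ["rat", "gives", "remote", "access"],
]
clean_corpus = [
    ["install", "trojan", "on", "victim", "machine"],
    ["trojan", "gives", "remote", "access"],
    ["the", "mouse", "ate", "cheese"],
    ["mouse", "ate", "the", "cheese"],
]
# Shared vocabulary: every token seen in either corpus.
vocab = sorted({t for s in dark_corpus + clean_corpus for t in s})

best = interpret("rat", dark_corpus, clean_corpus, ["trojan", "mouse"], vocab)
rat_dist = context_distribution("rat", dark_corpus, vocab)
```

In this toy setup, "rat" in the dark corpus appears in the same contexts as "trojan" in the clean corpus, so `best` is the candidate "trojan" rather than "mouse". Because each word is represented as an explicit distribution over vocabulary words, one can inspect the highest-probability entries of `rat_dist` to see which context words drive the match, which is one way to read the interpretability the abstract mentions.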