Paper Title
Probabilistic Debiasing of Scene Graphs
Paper Authors
Paper Abstract
The quality of scene graphs generated by state-of-the-art (SOTA) models is compromised by the long-tailed distribution of relationships and of their parent object pairs. Training of scene graph models is dominated by the majority relationships of the majority pairs, so the object-conditional distributions of relationships in the minority pairs are not preserved once training converges. Consequently, the biased model performs well on relationships that are frequent in the marginal distribution of relationships, such as 'on' and 'wearing', and performs poorly on less frequent relationships such as 'eating' or 'hanging from'. In this work, we propose a within-triplet Bayesian network (BN) that incorporates virtual evidence, in order to preserve the object-conditional distribution of the relationship label and to eradicate the bias created by the marginal probability of the relationships. The insufficient number of relationships in the minority classes poses a significant problem for learning the within-triplet Bayesian network. We address this insufficiency with embedding-based augmentation of triplets, in which we borrow samples for the minority triplet classes from their neighboring triplets in the semantic space. We perform experiments on two different datasets and achieve a significant improvement in the mean recall of the relationships. We also achieve a better balance between recall and mean recall than the SOTA de-biasing techniques for scene graph models.
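
The core idea of combining an object-conditional distribution with uncertain model predictions can be illustrated with a small sketch. This is not the authors' implementation: it assumes that the object-conditional prior P(relation | subject, object) is estimated from smoothed triplet counts, and that the biased model's scores are treated as virtual-evidence weights that multiply the prior before renormalization. The function names, the smoothing constant `alpha`, and the toy triplets are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def conditional_prior(triplets, relations, alpha=1.0):
    """Return a function giving P(relation | subject, object).

    Estimated from (subject, relation, object) counts with add-alpha
    smoothing. The smoothing scheme is an assumption of this sketch.
    """
    counts = defaultdict(Counter)
    for subj, rel, obj in triplets:
        counts[(subj, obj)][rel] += 1

    def prior(subj, obj):
        c = counts[(subj, obj)]
        total = sum(c.values()) + alpha * len(relations)
        return {r: (c[r] + alpha) / total for r in relations}

    return prior


def posterior_with_virtual_evidence(prior, evidence):
    """Multiply the prior by virtual-evidence weights and renormalize."""
    unnormalized = {r: prior[r] * evidence[r] for r in prior}
    z = sum(unnormalized.values())
    return {r: v / z for r, v in unnormalized.items()}


# Toy data (hypothetical): 'on' dominates overall, but 'eating' dominates
# for the (man, pizza) pair.
relations = ["on", "eating"]
triplets = (
    [("man", "on", "chair")] * 8
    + [("man", "eating", "pizza")] * 3
    + [("man", "on", "pizza")] * 1
)

prior = conditional_prior(triplets, relations)
# Scores from a biased model that leans toward the frequent relation 'on'.
biased_scores = {"on": 0.6, "eating": 0.4}
post = posterior_with_virtual_evidence(prior("man", "pizza"), biased_scores)
print(post)
```

In this toy case the biased scores alone would pick 'on', while the posterior favors 'eating' because the pair-specific prior outweighs the marginal preference. With very few minority triplets the count-based prior becomes unreliable, which is the insufficiency that the abstract's embedding-based augmentation is meant to address.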