Paper Title
Bandits for Online Calibration: An Application to Content Moderation on Social Media Platforms
Paper Authors
Paper Abstract
We describe the current content moderation strategy employed by Meta to remove policy-violating content from its platforms. Meta relies on both handcrafted and learned risk models to flag potentially violating content for human review. Our approach aggregates these risk models into a single ranking score, calibrating them to prioritize more reliable risk models. A key challenge is that violation trends change over time, affecting which risk models are most reliable. Our system additionally handles production challenges such as changing risk models and novel risk models. We use a contextual bandit to update the calibration in response to such trends. Our approach increases Meta's top-line metric for measuring the effectiveness of its content moderation strategy by 13%.
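The abstract describes aggregating several risk models into a single ranking score and using a contextual bandit to keep the calibration current as violation trends shift. The sketch below is a minimal, hypothetical illustration of that idea, not the paper's actual system: a LinUCB-style contextual bandit in which each arm is a risk model, the context is a feature vector for the content item, and the reward is whether human review confirmed a violation. All class names, model names, feature dimensions, and parameters are assumptions introduced for this example.

```python
"""Illustrative sketch only (assumed design, not Meta's production system):
a LinUCB-style contextual bandit that learns, from human-review outcomes,
how much weight each risk model should receive in the aggregated score."""
import numpy as np


class LinUCBCalibrator:
    def __init__(self, model_names, context_dim, alpha=1.0):
        self.model_names = list(model_names)
        self.alpha = alpha  # exploration strength
        # Per-arm ridge-regression statistics: A = X^T X + I, b = X^T r.
        self.A = {m: np.eye(context_dim) for m in self.model_names}
        self.b = {m: np.zeros(context_dim) for m in self.model_names}

    def _ucb(self, model, x):
        """Upper confidence bound on this model's expected reliability in context x."""
        A_inv = np.linalg.inv(self.A[model])
        theta = A_inv @ self.b[model]  # estimated reliability coefficients
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def choose_model(self, x):
        """Pick which risk model's flag to prioritize for human review."""
        return max(self.model_names, key=lambda m: self._ucb(m, x))

    def update(self, model, x, reward):
        """Reward = 1.0 if the reviewer confirmed the violation, else 0.0."""
        self.A[model] += np.outer(x, x)
        self.b[model] += reward * x

    def ranking_score(self, raw_scores, x):
        """Aggregate raw model scores, weighting each model by its estimated
        reliability in this context (the UCB estimate without the bonus)."""
        weights = {m: max(0.0, float(np.linalg.inv(self.A[m]) @ self.b[m] @ x))
                   for m in self.model_names}
        total = sum(weights.values()) or 1.0
        return sum(weights[m] * raw_scores.get(m, 0.0) for m in self.model_names) / total


if __name__ == "__main__":
    # Toy simulation: one (hypothetical) risk model is more reliable than the other,
    # and the bandit learns this from simulated reviewer decisions.
    rng = np.random.default_rng(0)
    cal = LinUCBCalibrator(["hate_speech_v2", "spam_heuristic"], context_dim=3)
    for _ in range(500):
        x = rng.random(3)
        arm = cal.choose_model(x)
        confirmed = rng.random() < (0.8 if arm == "hate_speech_v2" else 0.3)
        cal.update(arm, x, 1.0 if confirmed else 0.0)
    print(cal.ranking_score({"hate_speech_v2": 0.9, "spam_heuristic": 0.4}, rng.random(3)))
```

In this toy version the learned per-model coefficients double as calibration weights when aggregating raw scores; the production challenges the abstract mentions, such as risk models that change or are newly introduced, are not modeled here.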