Paper Title
Towards Better Semantic Understanding of Mobile Interfaces
Paper Authors
Abstract
Improving the accessibility and automation capabilities of mobile devices can have a significant positive impact on the daily lives of countless users. To stimulate research in this direction, we release a human-annotated dataset with approximately 500k unique annotations aimed at increasing the understanding of the functionality of UI elements. This dataset augments images and view hierarchies from RICO, a large dataset of mobile UIs, with annotations for icons based on their shapes and semantics, and with associations between different elements and their corresponding text labels, resulting in a significant increase in the number of UI elements and the categories assigned to them. We also release models using image-only and multimodal inputs; we experiment with various architectures and study the benefits of using multimodal inputs on the new dataset. Our models demonstrate strong performance on an evaluation set of unseen apps, indicating their generalizability to newer screens. These models, combined with the new dataset, can enable innovative functionalities such as referring to UI elements by their labels, improved coverage, and better semantics for icons, which would go a long way in making UIs more usable for everyone.