Paper Title
Automated Metadata Harmonization Using Entity Resolution & Contextual Embedding
Paper Authors
Paper Abstract
ML data curation typically involves heterogeneous, federated source systems with varied schema structures, so a curation process is needed to standardize metadata from the different schemas into an interoperable schema. This manual process of metadata harmonization and cataloging slows the ML-Ops lifecycle. We demonstrate automation of this step using entity resolution methods together with Cognitive Database's Db2Vec embedding approach, which captures hidden inter-column and intra-column relationships to detect metadata similarity and then predict the mapping of metadata columns from source schemas to any standardized schema. Apart from matching schemas, we demonstrate that the approach can also infer the correct ontological structure of the target data model.
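
The core idea of the abstract, scoring how similar a source column is to each column of a standardized schema and predicting the best match, can be illustrated with a minimal Python sketch. This is not the paper's method: character-trigram cosine similarity stands in for the Db2Vec embeddings and entity resolution steps, and the function names (`ngram_vector`, `cosine`, `match_columns`), the column names, and the 0.3 threshold are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt


def ngram_vector(name: str, n: int = 3) -> Counter:
    """Represent a column name as a bag of character n-grams.

    Assumption: this is a crude stand-in for a learned embedding.
    """
    text = f"  {name.lower().replace('_', ' ')} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_columns(source_cols, target_cols, threshold=0.3):
    """Map each source column to its most similar target column.

    Columns whose best score falls below the (hypothetical) threshold
    are mapped to None, leaving them for manual review.
    """
    target_vecs = {t: ngram_vector(t) for t in target_cols}
    mapping = {}
    for col in source_cols:
        vec = ngram_vector(col)
        best, score = max(
            ((t, cosine(vec, tv)) for t, tv in target_vecs.items()),
            key=lambda pair: pair[1],
        )
        mapping[col] = best if score >= threshold else None
    return mapping


if __name__ == "__main__":
    source = ["cust_name", "CustomerEmail", "zip"]
    target = ["customer_name", "customer_email", "postal_code"]
    print(match_columns(source, target))
```

Name-based similarity of this kind cannot link `zip` to `postal_code`, because the two names share no characters. That is the sort of gap an embedding trained on column contents and relationships, as the abstract describes, is meant to close.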