相关高斯来源的计算相似性查询

论文标题

Computing Similarity Queries for Correlated Gaussian Sources

论文作者

Wu, Hanwei, Wang, Qiwen, Flierl, Markus

论文摘要

在许多当前的数据处理系统中，目标通常不是数据的再现，而是根据查询产生的数据来计算一些答案。相似性标识任务是识别数据库中类似于给定指标的给定查询项目的项目。在ARXIV中研究了相似性识别的压缩问题：1307.6609 [CS.IT]。与经典的压缩问题不同，重点不是重建原始数据。相反，压缩率取决于答案的理想可靠性。具体而言，信息度量识别率是在所有方案中都可以达到的最低率的特征，这些计划可以保证相对于给定相似性阈值可靠的答案。在本文中，我们提出了一个基于组件的模型，用于计算相关性查询。相关信号首先是由KLT变换解压缩的。然后，通过针对每个组件的独特的D-加入系统处理去相关的信号。我们表明，当应用最佳速率相似性分配时，配备了KLT的基于组件的模型可以完美地表示多元高斯相似性查询。因此，我们可以根据基于组件的模型得出多元高斯信号的识别率。然后，我们将结果扩展到带有内存的Gener Gaussian源。我们还研究了配备了实用综合\ nt系统的模型。我们使用TC-$ \三角$方案，该方案使用覆盖签名和三角形质量决策规则的类型作为组件系统。我们提出了一种迭代方法，以数值近似于TC-$ \三角形$方案的最低可实现率。我们表明，配备TC-$ \ Triangle $方案的基于组件的模型可以比TC-$ \ Triangle $方案获得更好的性能。

Among many current data processing systems, the objectives are often not the reproduction of data, but to compute some answers based on the data resulting from queries. The similarity identification task is to identify the items in a database that are similar to a given query item for a given metric. The problem of compression for similarity identification has been studied in arXiv:1307.6609 [cs.IT]. Unlike classical compression problems, the focus is not on reconstructing the original data. Instead, the compression rate is determined by the desired reliability of the answers. Specifically, the information measure identification rate characterizes the minimum rate that can be achieved among all schemes which guarantee reliable answers with respect to a given similarity threshold. In this paper, we propose a component-based model for computing correlated similarity queries. The correlated signals are first decorrelated by the KLT transform. Then, the decorrelated signal is processed by a distinct D-admissible system for each component. We show that the component-based model equipped with KLT can perfectly represent the multivariate Gaussian similarity queries when optimal rate-similarity allocation applies. Hence, we can derive the identification rate of the multivariate Gaussian signals based on the component-based model. We then extend the result to general Gaussian sources with memory. We also study the models equipped with practical compone\nt systems. We use TC-$\triangle$ schemes that use type covering signatures and triangle-inequality decision rules as our component systems. We propose an iterative method to numerically approximate the minimum achievable rate of the TC-$\triangle$ scheme. We show that our component-based model equipped with TC-$\triangle$ schemes can achieve better performance than the TC-$\triangle$ scheme unaided on handling the multivariate Gaussian sources.

下载PDF全文

下载文献需遵守相关版权规定

论文标题