Paper Title
Efficient Statistics for Sparse Graphical Models from Truncated Samples
Paper Authors
Paper Abstract
In this paper, we study high-dimensional estimation from truncated samples. We focus on two fundamental and classical problems: (i) inference of sparse Gaussian graphical models and (ii) support recovery of sparse linear models. (i) For Gaussian graphical models, suppose $d$-dimensional samples ${\bf x}$ are generated from a Gaussian $\mathcal{N}(\mu, \Sigma)$ and observed only if they belong to a subset $S \subseteq \mathbb{R}^d$. We show that $\mu$ and $\Sigma$ can be estimated with error $\epsilon$ in the Frobenius norm, using $\tilde{O}\left(\frac{\textrm{nz}(\Sigma^{-1})}{\epsilon^2}\right)$ samples from a truncated $\mathcal{N}(\mu, \Sigma)$ and having access to a membership oracle for $S$. The set $S$ is assumed to have non-trivial measure under the unknown distribution but is otherwise arbitrary. (ii) For sparse linear regression, suppose samples $({\bf x}, y)$ are generated where $y = {\bf x}^\top \Omega^* + \mathcal{N}(0,1)$ and $({\bf x}, y)$ is observed only if $y$ belongs to a truncation set $S \subseteq \mathbb{R}$. We consider the case that $\Omega^*$ is sparse with a support set of size $k$. Our main result establishes precise conditions on the problem dimension $d$, the support size $k$, the number of observations $n$, and properties of the samples and the truncation that are sufficient to recover the support of $\Omega^*$. Specifically, we show that under some mild assumptions, only $O(k^2 \log d)$ samples are needed to estimate $\Omega^*$ in the $\ell_\infty$-norm up to a bounded error. For both problems, our estimator minimizes the sum of the finite population negative log-likelihood function and an $\ell_1$-regularization term.
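To make the second setting concrete, the following is a minimal illustrative sketch (not the paper's algorithm or experiments): it rejection-samples truncated linear-regression pairs for an assumed one-sided truncation set $S = [a, \infty)$, then minimizes the truncated negative log-likelihood plus an $\ell_1$ penalty by proximal gradient descent. All dimensions, constants, and the helper `nll_grad` are hypothetical choices for illustration.

```python
import math

import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes (not from the paper): sparse omega* with
# k = 2 nonzero coordinates, truncation set S = [a, inf).
d, k, n, a = 10, 2, 2000, 0.0
omega_star = np.zeros(d)
omega_star[0], omega_star[1] = 1.0, -1.5

# Rejection-sample truncated pairs: (x, y) is observed only if y is in S.
X_rows, Y_vals = [], []
while len(Y_vals) < n:
    x = rng.standard_normal(d)
    y = x @ omega_star + rng.standard_normal()
    if y >= a:  # membership oracle for S
        X_rows.append(x)
        Y_vals.append(y)
X, Y = np.array(X_rows), np.array(Y_vals)

SQRT2, SQRT2PI = math.sqrt(2.0), math.sqrt(2.0 * math.pi)
phi = lambda z: np.exp(-z * z / 2.0) / SQRT2PI                 # N(0,1) pdf
Phi = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / SQRT2)))  # N(0,1) cdf

def nll_grad(omega):
    """Gradient of the average truncated negative log-likelihood.

    Per sample the NLL is (y - mu)^2 / 2 + log P(y >= a | mu) + const,
    with mu = x . omega, so its mu-gradient is
    -(y - mu) + phi(a - mu) / (1 - Phi(a - mu)).
    """
    mu = X @ omega
    z = np.clip(a - mu, -30.0, 8.0)      # keep 1 - Phi(z) away from 0
    hazard = phi(z) / (1.0 - Phi(z))
    return X.T @ (-(Y - mu) + hazard) / len(Y)

# l1-regularized fit via proximal gradient descent (ISTA): the
# soft-thresholding step is the proximal map of lam * ||omega||_1.
lam, step = 0.05, 0.1
omega = np.zeros(d)
for _ in range(500):
    w = omega - step * nll_grad(omega)
    omega = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)

support = {i for i in range(d) if abs(omega[i]) > 0.1}
print(support)  # recovered support of omega*
```

The hazard term in `nll_grad` is what distinguishes the truncated likelihood from ordinary least squares: it corrects for the selection bias introduced by only observing samples with $y \in S$, while the soft-thresholding step supplies the sparsity-inducing $\ell_1$ regularization.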