Paper Title
Efficient Statistics for Sparse Graphical Models from Truncated Samples
Paper Authors
Paper Abstract
In this paper, we study high-dimensional estimation from truncated samples. We focus on two fundamental and classical problems: (i) inference of sparse Gaussian graphical models and (ii) support recovery of sparse linear models. (i) For Gaussian graphical models, suppose $d$-dimensional samples ${\bf x}$ are generated from a Gaussian $\mathcal{N}(\mu, \Sigma)$ and observed only if they belong to a subset $S \subseteq \mathbb{R}^d$. We show that $\mu$ and $\Sigma$ can be estimated with error $\epsilon$ in the Frobenius norm, using $\tilde{O}\left(\frac{\textrm{nz}(\Sigma^{-1})}{\epsilon^2}\right)$ samples from a truncated $\mathcal{N}(\mu, \Sigma)$ and having access to a membership oracle for $S$. The set $S$ is assumed to have non-trivial measure under the unknown distribution but is otherwise arbitrary. (ii) For sparse linear regression, suppose samples $({\bf x}, y)$ are generated where $y = {\bf x}^\top \Omega^* + \mathcal{N}(0,1)$ and $({\bf x}, y)$ is observed only if $y$ belongs to a truncation set $S \subseteq \mathbb{R}$. We consider the case that $\Omega^*$ is sparse with a support set of size $k$. Our main result establishes precise conditions on the problem dimension $d$, the support size $k$, the number of observations $n$, and properties of the samples and the truncation that are sufficient to recover the support of $\Omega^*$. Specifically, we show that under some mild assumptions, only $O(k^2 \log d)$ samples are needed to estimate $\Omega^*$ in the $\ell_\infty$-norm up to a bounded error. For both problems, our estimator minimizes the sum of the finite population negative log-likelihood function and an $\ell_1$-regularization term.
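To make the second setting concrete, the following is a minimal illustrative sketch (not the paper's algorithm or experiments): it rejection-samples truncated linear-regression pairs for an assumed one-sided truncation set $S = [a, \infty)$, then minimizes the truncated negative log-likelihood plus an $\ell_1$ penalty by proximal gradient descent. All dimensions, constants, and the helper `nll_grad` are hypothetical choices for illustration.

```python
import math

import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes (not from the paper): sparse omega* with
# k = 2 nonzero coordinates, truncation set S = [a, inf).
d, k, n, a = 10, 2, 2000, 0.0
omega_star = np.zeros(d)
omega_star[0], omega_star[1] = 1.0, -1.5

# Rejection-sample truncated pairs: (x, y) is observed only if y is in S.
X_rows, Y_vals = [], []
while len(Y_vals) < n:
    x = rng.standard_normal(d)
    y = x @ omega_star + rng.standard_normal()
    if y >= a:  # membership oracle for S
        X_rows.append(x)
        Y_vals.append(y)
X, Y = np.array(X_rows), np.array(Y_vals)

SQRT2, SQRT2PI = math.sqrt(2.0), math.sqrt(2.0 * math.pi)
phi = lambda z: np.exp(-z * z / 2.0) / SQRT2PI                 # N(0,1) pdf
Phi = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / SQRT2)))  # N(0,1) cdf

def nll_grad(omega):
    """Gradient of the average truncated negative log-likelihood.

    Per sample the NLL is (y - mu)^2 / 2 + log P(y >= a | mu) + const,
    with mu = x . omega, so its mu-gradient is
    -(y - mu) + phi(a - mu) / (1 - Phi(a - mu)).
    """
    mu = X @ omega
    z = np.clip(a - mu, -30.0, 8.0)      # keep 1 - Phi(z) away from 0
    hazard = phi(z) / (1.0 - Phi(z))
    return X.T @ (-(Y - mu) + hazard) / len(Y)

# l1-regularized fit via proximal gradient descent (ISTA): the
# soft-thresholding step is the proximal map of lam * ||omega||_1.
lam, step = 0.05, 0.1
omega = np.zeros(d)
for _ in range(500):
    w = omega - step * nll_grad(omega)
    omega = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)

support = {i for i in range(d) if abs(omega[i]) > 0.1}
print(support)  # recovered support of omega*
```

The hazard term in `nll_grad` is what distinguishes the truncated likelihood from ordinary least squares: it corrects for the selection bias introduced by only observing samples with $y \in S$, while the soft-thresholding step supplies the sparsity-inducing $\ell_1$ regularization.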