Indeed, maybe you are familiar with methods of this type, such as GLASSO, which infers a sparse multivariate Gaussian from node features.
A central problem here is regularization: how do we avoid overfitting, i.e. avoid detecting edges that are not really there?
GLASSO (and a bunch of other ML methods) uses L1 regularization for this, which is frustrating, since it requires the penalty strength, and thus effectively the desired network sparsity, as an input.
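For reference, the standard graphical lasso objective looks like this, with $S$ the empirical covariance of the node features and $\Theta$ the precision matrix; an edge $(i,j)$ is present iff $\Theta_{ij} \neq 0$, and the penalty strength $\lambda$ is the input that effectively sets how many edges survive (some implementations penalize only the off-diagonal entries):

```latex
\hat{\Theta} = \operatorname*{arg\,max}_{\Theta \succ 0} \;
  \log\det\Theta \;-\; \operatorname{tr}(S\,\Theta) \;-\; \lambda \,\lVert \Theta \rVert_1
```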
But we want that as an output!
So people resort to cross-validation for this.
And the results are not good.
L1's shrinkage bias and cross-validation do not mix well, forcing an unnecessary trade-off between promoting sparsity and reducing bias. The upshot: your reconstructed network ends up with a bunch of fake edges. 👎
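A minimal pure-Python sketch of that trade-off, using a simplified one-weight-at-a-time picture rather than a full GLASSO fit: for a single weight with unpenalized estimate w, the L1-penalized estimate is the soft-thresholding operator sign(w)·max(|w| − λ, 0). The four "raw" estimates below are made-up numbers (two real edges, two spurious ones), not data from any reconstruction.

```python
def soft_threshold(w: float, lam: float) -> float:
    """Solution of min_x 0.5*(x - w)**2 + lam*|x|: the L1 estimate of one weight."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0


def reconstruct(raw: dict[str, float], lam: float) -> dict[str, float]:
    """Apply the L1 penalty to every weight and keep only the surviving edges."""
    shrunk = {edge: soft_threshold(w, lam) for edge, w in raw.items()}
    return {edge: w for edge, w in shrunk.items() if w != 0.0}


# Hypothetical unpenalized estimates: two real edges and two spurious ones.
raw = {"real_a": 1.0, "real_b": 0.5, "noise_c": 0.25, "noise_d": -0.125}

for lam in (0.0625, 0.375):
    print(lam, reconstruct(raw, lam))
```

With λ = 0.0625 all four edges survive, including both spurious ones. Raising λ to 0.375 removes them, but by then the real edge `real_b` has been shrunk from 0.5 to 0.125: the same parameter controls both sparsity and bias, so you cannot get one without paying for the other.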
We fix this by doing regularization the right way™: we measure model complexity via its description length, i.e. we select the model that minimizes the total amount of information required to encode both the data and the model parameters.
Our priors are based on quantization, not shrinkage!
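To make "description length" and "quantization instead of shrinkage" concrete, here is a toy two-part code (our own illustrative example, not the encoding used in this work): decide whether the mean μ of a sample is exactly zero or not. Assumptions, all hypothetical: Gaussian data with known σ = 1, a 1-bit flag to name the model, and μ restricted to a grid of width `delta` on [−`bound`, `bound`].

```python
import math
from typing import Sequence


def data_bits(xs: Sequence[float], mu: float, sigma: float = 1.0) -> float:
    """Bits to encode xs under N(mu, sigma^2), i.e. -log2 of the likelihood.

    The constant cost of the data's own finite precision is dropped, since it
    is identical for every candidate model and cancels in comparisons.
    """
    n = len(xs)
    sq = sum((x - mu) ** 2 for x in xs)
    nats = 0.5 * n * math.log(2 * math.pi * sigma**2) + sq / (2 * sigma**2)
    return nats / math.log(2)


def two_part_dl(xs: Sequence[float], delta: float = 0.1, bound: float = 10.0):
    """Return (DL of the 'mu = 0' model, DL of the 'mu != 0' model, quantized mu)."""
    # Model 0: mu is exactly zero. Parameter cost: 1 bit to name the model.
    dl_zero = 1.0 + data_bits(xs, 0.0)
    # Model 1: mu lives on a grid of width delta in [-bound, bound]; naming one
    # grid point costs log2(#levels) bits on top of the 1 bit for the model.
    levels = round(2 * bound / delta) + 1
    mean = sum(xs) / len(xs)
    # Quantize: round to the nearest grid point (no shrinkage toward zero).
    mu_q = min(bound, max(-bound, delta * round(mean / delta)))
    dl_nonzero = 1.0 + math.log2(levels) + data_bits(xs, mu_q)
    return dl_zero, dl_nonzero, mu_q


def has_effect(xs: Sequence[float]) -> bool:
    """Keep the parameter only if doing so shortens the total description."""
    dl_zero, dl_nonzero, _ = two_part_dl(xs)
    return dl_nonzero < dl_zero


print(has_effect([2.0, 2.5, 1.5, 2.0, 3.0, 1.0, 2.0, 2.0]))
print(has_effect([0.5, -0.5, 0.25, -0.25]))
```

For the sample centred at 2, the roughly 7.65 extra bits needed to name a grid point are repaid many times over by the shorter data code, so the parameter is kept at its rounded value rather than a shrunken one. For the sample centred at 0, those bits buy nothing and the zero model wins. Sparsity comes from the cost of specifying a parameter, not from biasing its value, and no penalty strength has to be supplied.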
The results are systematically more accurate than with L1, and also faster to obtain, since we need to fit only once and do not need cross-validation at all.
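As a rough count, assuming a standard K-fold cross-validation over m candidate penalty values followed by one refit on the full data at the selected value (the K = 5, m = 20 figures are only an example):

```latex
N_{\mathrm{CV}} = K\,m + 1
\quad \bigl(\text{e.g. } K = 5,\ m = 20 \;\Rightarrow\; N_{\mathrm{CV}} = 101\bigr),
\qquad
N_{\mathrm{MDL}} = 1 .
```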
Our approach is also nonparametric: we learn the weight distribution, instead of imposing it a priori.
We showcase this by reconstructing two large microbial interaction networks, from the Human and Earth Microbiome Projects. These involve more than 10⁵ species!
Have you seen reconstructed networks of this magnitude before? I haven't.