New on the arxiv: “Network reconstruction via the minimum description length... - networkscience

tiago, 1 month ago

New on the arxiv: “Network reconstruction via the minimum description length principle"

https://arxiv.org/abs/2405.010151

Short explainer thread: 1/N

#network #networkscience @networkscience

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ daniel, neurovagrant, stevensanderson, janeadams

Image

Image alternative text

janeadams, 1 month ago

@tiago @networkscience I'm getting a 404 on that link!

Correct link: https://arxiv.org/abs/2405.01015

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

tiago, 1 month ago

@janeadams @networkscience

Fixed, thanks!

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

tiago, 1 month ago

We need to reconstruct networks when the edges cannot be obtained directly, only their outcomes via a time series or the static states of the nodes.

Think of epidemic contact tracing, fMRI brain scans, or species abundances in an ecological system.

This is still an open problem, with many heuristic approaches out there, but few principled ones that work well.

As we had argued before, this problem is better framed as one of statistical inference of generative models:

https://nature.com/articles/s41467-022-34267-9

2/N

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

tiago, 1 month ago

Indeed, maybe you are familiar with methods of this type, such GLASSO, which infer a sparse multivariate Gaussian from node features.

A central problem here is regularization: how do we prevent overfitting, by not detecting edges which are not really there?

GLASSO (and a bunch of other ML methods) use L1 regularization for this, which is frustrating, since it requires the desired network sparsity as an input.

But we want that as an output!

So people resort to cross-validation for this.

And the results are not good

The shrinkage bias caused by L1 and cross-validation do not mix well, forcing an unnecessary trade-off between promoting sparsity and reducing bias. In the end, your reconstructed network ends up with a bunch of fake edges. 👎

Besides being slow and annoying to use...

3/N

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

tiago, 1 month ago

We fix this by doing regularization the right way™: We evaluate the model complexity via its description length, i.e. by minimizing the amount of information required to encode both the data and the model parameters.

Our priors are based on quantization, not shrinkage!

The results are systematically more accurate and faster than L1, since we need to fit only once, and do not need cross validation at all.

Our approach is also nonparametric: we learn the weight distribution, instead of imposing it a priori.

4/N

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

tiago, 1 month ago

Our MDL approach can be combined with a new algorithm that performs network reconstruction in subquatratic time, and can be implemented in parallel:

https://arxiv.org/abs/2401.01404

This means we can tackle really large networks! 5/N

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

tiago, 1 month ago

We showcase this with the reconstruction of two large microbial interaction networks, from the human and earth microbiome projects. These involve more than 10⁵ species!

Have you seen reconstructed networks of this magnitude before? I haven't.

6/N

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ franco_vazza

Add comment