tiago,
@tiago@social.skewed.de avatar

New on the arxiv: “Network reconstruction via the minimum description length principle"

https://arxiv.org/abs/2405.010151

Short explainer thread: 1/N

@networkscience

janeadams,
@janeadams@vis.social avatar

@tiago @networkscience I'm getting a 404 on that link!

Correct link: https://arxiv.org/abs/2405.01015

tiago,
@tiago@social.skewed.de avatar

@janeadams @networkscience

Fixed, thanks!

tiago,
@tiago@social.skewed.de avatar

We need to reconstruct networks when the edges cannot be obtained directly, only their outcomes via a time series or the static states of the nodes.

Think of epidemic contact tracing, fMRI brain scans, or species abundances in an ecological system.

This is still an open problem, with many heuristic approaches out there, but few principled ones that work well.

As we had argued before, this problem is better framed as one of statistical inference of generative models:

https://nature.com/articles/s41467-022-34267-9

2/N

tiago,
@tiago@social.skewed.de avatar

Indeed, maybe you are familiar with methods of this type, such GLASSO, which infer a sparse multivariate Gaussian from node features.

A central problem here is regularization: how do we prevent overfitting, by not detecting edges which are not really there?

GLASSO (and a bunch of other ML methods) use L1 regularization for this, which is frustrating, since it requires the desired network sparsity as an input.

But we want that as an output!

So people resort to cross-validation for this.

And the results are not good

The shrinkage bias caused by L1 and cross-validation do not mix well, forcing an unnecessary trade-off between promoting sparsity and reducing bias. In the end, your reconstructed network ends up with a bunch of fake edges. 👎

Besides being slow and annoying to use...

3/N

tiago,
@tiago@social.skewed.de avatar

We fix this by doing regularization the right way™: We evaluate the model complexity via its description length, i.e. by minimizing the amount of information required to encode both the data and the model parameters.

Our priors are based on quantization, not shrinkage!

The results are systematically more accurate and faster than L1, since we need to fit only once, and do not need cross validation at all.

Our approach is also nonparametric: we learn the weight distribution, instead of imposing it a priori.

4/N

tiago,
@tiago@social.skewed.de avatar

Our MDL approach can be combined with a new algorithm that performs network reconstruction in subquatratic time, and can be implemented in parallel:

https://arxiv.org/abs/2401.01404

This means we can tackle really large networks! 5/N

tiago,
@tiago@social.skewed.de avatar

We showcase this with the reconstruction of two large microbial interaction networks, from the human and earth microbiome projects. These involve more than 10⁵ species!

Have you seen reconstructed networks of this magnitude before? I haven't.

6/N

  • All
  • Subscribed
  • Moderated
  • Favorites
  • networkscience@a.gup.pe
  • DreamBathrooms
  • ngwrru68w68
  • tester
  • magazineikmin
  • thenastyranch
  • rosin
  • khanakhh
  • InstantRegret
  • Youngstown
  • slotface
  • Durango
  • kavyap
  • mdbf
  • tacticalgear
  • JUstTest
  • osvaldo12
  • normalnudes
  • cubers
  • cisconetworking
  • everett
  • GTA5RPClips
  • ethstaker
  • Leos
  • provamag3
  • anitta
  • modclub
  • megavids
  • lostlight
  • All magazines