skarthik,
@skarthik@neuromatch.social avatar

A somewhat contrived question. I want to know if we can generalize or extend the idea of model complexity used in machine learning, statistical learning theory, and deep nets. Alternatively, what are its limits? (Indirectly, I want to tease apart what either “model” or “complexity” means in this worldview. They do not seem to be “models” in the sense one typically uses in science, nor does their measure of complexity seem to line up with that of any scientific model.)

In ML, we typically use the number of free parameters (weights, biases, etc.) as a proxy for model complexity, then penalize over-parameterization with some information criterion (AIC, BIC, etc.), or regularize (with dropout, etc.), to arrive at bias-variance tradeoff curves. Then there is double descent (Belkin et al.) for deep nets, which seems to buck the trend of standard ML (kernel) methods.
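As a concrete toy version of that tradeoff (my own sketch, not part of the original post): polynomial regression with numpy, with degree standing in for complexity. Train error can only fall as degree grows (the model families are nested), while test error typically rises once the fit starts chasing noise. The target function, noise level, and degrees below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: noisy samples of a smooth function.
f = lambda x: np.sin(3 * x)
x_train = np.sort(rng.uniform(-1, 1, 30))
x_test = np.sort(rng.uniform(-1, 1, 200))
y_train = f(x_train) + 0.3 * rng.standard_normal(x_train.size)
y_test = f(x_test) + 0.3 * rng.standard_normal(x_test.size)

def errors(degree):
    """Train/test MSE of the least-squares polynomial of a given degree."""
    coeffs = np.polyfit(x_train, y_train, degree)
    mse = lambda x, y: float(np.mean((np.polyval(coeffs, x) - y) ** 2))
    return mse(x_train, y_train), mse(x_test, y_test)

# "Complexity" on the x-axis is just the polynomial degree here.
for d in [1, 3, 5, 9]:
    tr, te = errors(d)
    print(f"degree={d}  train MSE={tr:.3f}  test MSE={te:.3f}")
```

This is of course exactly the framing skarthik is questioning: degree is a convenient complexity axis for this family, but nothing forces a physics model into such a one-dimensional knob.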

My question: if we are to turn this approach to scientific models, how can we evaluate their complexity in the same machine learning framework (yes, I know this sounds bizarre)?

For example, take any simple physics model, as explanatory as it is, it is also predictive. If I were to treat it as a purely predictive model, and try to plot the bias-variance curve, what would constitute its complexity (x-axis), and where/when does its error start going up (y-axis, so the model breaks)? If we can’t do this ML-ification of a simple physics model, why not?

Corollary: Take some nonlinear dynamical system that models a physical phenomenon: how do you define its model complexity? (Are the three parameters of a Lorenz attractor the model complexity of the system in ML parlance?)
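For reference, the Lorenz system is part of what makes this question sharp: three parameters and a few lines of code, yet sensitive dependence on initial conditions makes long-horizon prediction hopeless. A minimal sketch (forward Euler with a small fixed step, chosen for illustration only, not for numerical accuracy):

```python
import numpy as np

def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz equations.
    The three parameters (sigma, rho, beta) are the entire 'model'."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return np.array([x + dt * dx, y + dt * dy, z + dt * dz])

# Two trajectories differing by 1e-8 in one coordinate diverge visibly:
# a tiny parameter count with very complicated predictive behaviour.
a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-8, 0.0, 0.0])
for _ in range(2000):
    a, b = lorenz_step(a), lorenz_step(b)
print("separation after 2000 steps:", np.linalg.norm(a - b))
```

Counting "3 parameters" here says nothing about the error curve of the model as a predictor, which is arguably the crux of the question.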

johncarlosbaez,
@johncarlosbaez@mathstodon.xyz avatar

@skarthik - "Solomonoff induction" says more complex hypotheses have a lower prior probability of being correct, where complexity is measured using algorithmic information theory. There are beautiful results about it, but I haven't seen much practical application yet.

A good intro:

http://www.scholarpedia.org/article/Algorithmic_probability

bifouba,
@bifouba@kolektiva.social avatar

@johncarlosbaez @skarthik That takes me back down memory lane: some twenty years ago I wrote a master's thesis on Kolmogorov complexity (a very closely related issue). If I may lean out of the window a bit (I did not end up pursuing a career as a mathematician and may well end up being corrected by John Carlos): the "simplicity" is a bit of a red herring, because ultimately it's about the fact that there are only countably many computable models, and given any enumeration of all (computable, and therefore all relevant) models, we can set up an induction scheme that offers certain nice guaranteed properties. But the actual order doesn't really matter (up to a constant but arbitrarily large difference in search time). So "simpler" ultimately just means "earlier in my list of all models"; it's not an intrinsic property. What's simple in my world (specifically, with respect to my universal Turing machine) need not be simple to you, and vice versa.

The restriction to computable models is interesting, because one thing it does is make the notion of "number of parameters" pretty much irrelevant. If we're talking about parameters that are allowed to take on irrational values, then we're deluding ourselves, because then we can never actually work out what the model says (to infinite precision). If we're limiting ourselves to rational values, then we can plunk the information they contain into a single rational number.

Anyhow, in terms of practical applications, the underlying ideas pop up in the so-called "Minimum Description Length" and "Minimum Message Length" principles, so those might be useful to search for.
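A toy version of the two-part MDL idea mentioned above (my own sketch, not from the thread): score each candidate model by bits-for-the-model plus bits-for-the-data-given-the-model, and pick the minimum. The polynomial family, the Gaussian residual code, and the standard (k/2)·log2(n) parameter cost are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
x = np.linspace(-1, 1, n)
y = 2 * x**2 - x + 0.1 * rng.standard_normal(n)  # true model: degree 2

def description_length(degree):
    """Crude two-part MDL score in bits: L(model) + L(data | model)."""
    k = degree + 1
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    sigma2 = max(float(np.mean(resid**2)), 1e-12)
    # L(data | model): Gaussian code length for the residuals, in bits.
    data_bits = 0.5 * n * np.log2(2 * np.pi * np.e * sigma2)
    # L(model): (k/2) * log2(n) bits, the usual MDL parameter cost.
    model_bits = 0.5 * k * np.log2(n)
    return data_bits + model_bits

best = min(range(1, 10), key=description_length)
print("MDL picks degree", best)
```

Note how this quietly makes bifouba's point: the scores depend on the (arbitrary) coding scheme, but the selection machinery works for any consistent choice.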

johncarlosbaez,
@johncarlosbaez@mathstodon.xyz avatar

@bifouba @skarthik

"What's simple in my world (specifically, with respect to my universal Turing machine) need not be simple to you and vice versa."

That's true to some extent, but the big theorem is that if K(X) is the Kolmogorov complexity of a string X using my universal Turing machine and K'(X) is its Kolmogorov complexity using yours, there's a constant C such that

|K(X) - K'(X)| < C

for all X. This is the "invariance theorem" described here:

https://en.wikipedia.org/wiki/Kolmogorov_complexity#Invariance_theorem
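A crude, computable stand-in for this (my own illustration, not from the thread): real compressors give upper bounds on Kolmogorov complexity, and while different compressors — playing the role of different machines — disagree on exact lengths, they agree on gross distinctions like structured versus incompressible data.

```python
import bz2
import os
import zlib

# Highly regular data vs. (essentially) incompressible data.
structured = b"ab" * 5000
random_ish = os.urandom(10000)

# Two different compressors = two different "reference machines".
for name, comp in [("zlib", zlib.compress), ("bz2", bz2.compress)]:
    print(name,
          "structured:", len(comp(structured)),
          "random:", len(comp(random_ish)))
```

The exact byte counts differ between zlib and bz2, but both rank the inputs the same way — a (very loose) empirical echo of the invariance theorem's bounded disagreement.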

bifouba,
@bifouba@kolektiva.social avatar

@johncarlosbaez @skarthik Of course (and for anyone reading this to learn more about the topic: the intuition is that my machine can simulate your machine with a simulation program of finite length), but that's a statement about pairs of UTMs, not pairs of individual models, and - correct me if I'm wrong - it's not enough to allow us to establish that a given model A is somehow intrinsically/naturally/canonically simpler than a given model B. The induction results hold for any old enumeration of all computable models, no? Indeed, one of my favourite “applications" of this general approach is an algorithm that is guaranteed to correctly identify any real number as being rational or irrational after having seen a finite segment (although of course we have no way of knowing whether we've already seen enough or not). This uses countability pure and simple without any reference to complexity.

johncarlosbaez,
@johncarlosbaez@mathstodon.xyz avatar

@bifouba -

"but that's a statement about pairs of UTMs, not pairs of individual models, and - correct me if I'm wrong - it's not enough to allow us to establish that a given model A is somehow intrinsically/naturally/canonically simpler than a given model B. "

No, you can't expect any way to do that, since what counts as "simpler" depends to some extent on your framework. But the invariance theorem I mentioned implies there's some number C such that, regardless of our choices of Turing machine, if I say model A has complexity less than model B, then so do you, whenever for me the complexity of model B exceeds the complexity of model A by more than 2C.

@skarthik

bifouba,
@bifouba@kolektiva.social avatar

@johncarlosbaez @skarthik Sure, but then we're in agreement that (a) complexity is not an intrinsic property of a given model and (b) that anyhow such an intrinsic notion of complexity is not required to make any of this work.

That's all I was trying to point out: imposing (an arbitrary) measure of complexity allows you to enumerate models in order of complexity and THAT enumeration is what allows the universal induction to work. And if you arrive at such an enumeration by other means, or falls from the sky, it doesn't actually matter whether you relate it to complexity or not.
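bifouba's point — that the enumeration, not any intrinsic "simplicity", is what drives the guarantee — can be illustrated with a toy identification-in-the-limit scheme (my own sketch; the hypothesis class and the ordering are arbitrary choices):

```python
# Predict with the earliest hypothesis in a fixed enumeration that is
# consistent with all data seen so far. Any enumeration of the (countable)
# class gives the same in-the-limit guarantee, just with a different
# constant delay before settling on the truth.

def first_consistent(hypotheses, data):
    """Index of the earliest hypothesis agreeing with every example."""
    for i, h in enumerate(hypotheses):
        if all(h(x) == y for x, y in data):
            return i
    return None

# Toy class: "x is a multiple of c", enumerated as c = 1, 2, 3, ...
hypotheses = [lambda x, c=c: x % c == 0 for c in range(1, 50)]
target = lambda x: x % 3 == 0  # the unknown truth

data = [(x, target(x)) for x in range(10)]
print("settles on c =", first_consistent(hypotheses, data) + 1)
# → settles on c = 3
```

Relabeling the list (say, enumerating in any other computable order) changes only how long the learner dallies with wrong hypotheses, not whether it converges — which is why calling earlier entries "simpler" is optional.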

johncarlosbaez, (edited )
@johncarlosbaez@mathstodon.xyz avatar

@bifouba @skarthik

"Sure, but then we're in agreement that (a) complexity is not an intrinsic property of a given model."

I wouldn't go that far: I don't think of it as a yes-or-no issue. I've shown that we can all agree on complexity up to some disagreement that's at most some constant C. This constant is not huge, either: it's probably at most a few kilobytes.

Suppose we were talking about lengths of things rather than complexity. Suppose the best we could do when it comes to measuring lengths of things was to measure them up to some error of at most a millimeter - different methods gave slightly different answers. Then we could take a hard-assed attitude and say "length is not an intrinsic property", or we could take a more pragmatic attitude and say "we can't agree on lengths exactly, but we can agree on them up to an accuracy of a millimeter, and that's good enough for many purposes".

bifouba,
@bifouba@kolektiva.social avatar

@johncarlosbaez @skarthik I think we're talking slightly at cross-purposes. Your metaphor applies if we are talking about the irreducible complexity of random strings. Then yes, it's meaningful in the sense you explain to say that a random string of length 100 is intrinsically more complex than a random string of length 10. But random strings tend not to make good computer programs. So if we are talking about the intrinsic complexity of models, i.e., strings that represent Turing machines to my UTM, that intuition doesn't carry over as smoothly. If we are engaged in competing modeling exercises and we each claim to be applying Occam's razor, but have different notions of what 'simple' means, Kolmogorov complexity is not going to resolve that disagreement for us; but at the same time, we both enjoy the same asymptotic benefits.

m8ta,
@m8ta@fediscience.org avatar

@jonny @skarthik

imho: Scientific models are a bit different than ML models -- they are closer to generator programs than to classifiers or function approximators.

If theories are closer to programs, they are better defined by Kolmogorov complexity; classifiers and function approximators are better defined (as mentioned) by VC dimension or Rademacher complexity...

Theories also call each other recursively, like Maxwell's equations; another reason to consider them 'programs'.
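For the function-approximator side of that distinction, empirical Rademacher complexity is at least directly computable for simple classes. A sketch (my own, not from the thread) for norm-bounded linear predictors, where the supremum over the class has a closed form, so a Monte Carlo average over random sign vectors suffices:

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_rademacher(X, B=1.0, trials=200):
    """Monte Carlo estimate of the empirical Rademacher complexity of
    the class {x -> w.x : ||w|| <= B} on the sample X.

    For this class the supremum over w has the closed form
    (B / n) * || sum_i sigma_i x_i ||, so no optimization is needed."""
    n = X.shape[0]
    sups = []
    for _ in range(trials):
        sigma = rng.choice([-1.0, 1.0], size=n)  # random sign labels
        sups.append(B / n * np.linalg.norm(sigma @ X))
    return float(np.mean(sups))

X = rng.standard_normal((100, 5))
print("estimated complexity:", empirical_rademacher(X))
```

The estimate shrinks roughly like 1/sqrt(n) as the sample grows — a data-dependent complexity notion, which is exactly what makes it hard to transplant onto a physics "theory" that isn't defined relative to a sample.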

skarthik,
@skarthik@neuromatch.social avatar

@jonny

Sure... VC, Rademacher, whatever floats your boat.

I give you a physics model, you give me its complexity. So far, I don't see anything like that meaningfully possible.
