A curious math problem I came up with: given a target, what's the fewest digits... - Mathematics

oblomov, 26 days ago (edited 26 days ago)

A curious math problem I came up with: given a target, what's the fewest digits an integer must have (in a given base) to contain all integers from 0 to the target, as substrings?

http://wok.oblomov.eu/mathesis/number-substrings/

@mathematics @math @math

e.g. for a target of 19 a candidate representative would be 1011213141516171819 in base 10, that has 19 digits. Can it be done in less, or is $\sigma_10(19) = 19$?
Can we find a general rule? Any properties of this function?

#math #maths #numberTheory #combinatorics

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

Image

Image alternative text

mrdk, 26 days ago

@oblomov @mathematics @math @math No solution, but the problem is related to de Bruijn sequences (https://en.wikipedia.org/wiki/De_Bruijn_sequence), for which there exists a lot of literature.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ oblomov

oblomov, 26 days ago

@mrdk @mathematics @math @math
oh, interesting. It's definitely related, although we allow different substrings to start at the same place, and this has a huge impact on the lengths (also it's not cyclic in our case, but that probably makes things worse).

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

oblomov, 26 days ago

@mrdk @mathematics @math @math also this might explain why @mau saw some relation to Gray codes in the binary case.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

mau, 25 days ago

@oblomov well, a Gray code codes all n-bit sequences from 000...0 to 111...1. It's a bit overkill (we don't need the sequence with all 0) but probably the overhead is just 1.

Cc: @mrdk @mathematics @math @math

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

wqferr, 26 days ago

@oblomov Sadly I can't respond with anything of value, though this looks a lot like a generalized superpermutation problem. Maybe that's a start to more research?

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ oblomov

oblomov, 26 days ago

@wqferr Oh excellent point. Added the combinatorics tag to the original post, might help get more relevant eyes on it.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

wqferr, 26 days ago

@oblomov On further thought... Does this simplify to a superpermutation problem in base 2?

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

oblomov, 26 days ago

@wqferr I don't think it's exactly superpermutations we're looking at, even just in base 2, because there are going to be repetitions. For example, for a target of 7 in base 2 you need to have 0, 1, 10, 11, 100, 101, 110, 111. It's definitely related, but there's something “more” (or different) about it.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

wqferr, 26 days ago

@oblomov The main difference seems to be we want to find all the ends of substrings, not all substrings. If you add leading 0s to the numbers you listed in base 2, I believe we do get back to a superperm.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

oblomov, 26 days ago

@wqferr that's an interesting approach. It would at least help find some upper bounds on sigma_2.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

TootSweet, 26 days ago

I’ve mulled a similar problem.

Inspired by the following Fox Trot comic:

Fox Trot comic in which Jason runs a website containing downloadable files what’s claimed to be randomly-generated data that just happens by pure coincidence to be identical to various copyrighted works such as a script to an upcoming Star Wars movie and the full retail version of Windows 98

Some irrational numbers have the property that they contain every finite substring. (For instance, the full retail version of Windows 98 exists somewhere in the binary expansion of pi.) And they can’t really outlaw the number pi. (Or can you?)

So, theoretically if you could find the offset in the binary expansion of pi where the digits happen to be identical to the entirety of some particular copyrighted content, you could share that offset and the length of the original copyrighted work file and a recipient could reproduce the copyrighted content losslessly based just on that offset.

…theoretically.

One of the benefits of this approach is that it’d probably stump the courts at least for a time. If you got a copy of the movie Oppenheimer this way, could you be said to have “copied” it in a way that runs afoul of copyright law in whatever jurisdiction you happen to be in?

It is possible to generate “digits” of pi given an offset without claculating all the digits before, so there’s no real problem there. Some of the big issues with this approach would be:

The offset itself would be a huge number. It would in the vast majority of cases require a file much larger than the copyrighted work just to hold that offset number. So it would be very space inefficient.

If you wanted to share a file this way, I doubt there’s any way to search through pi efficiently for where that particular sequence of bits happens to start. So even if the size of the offset wasn’t an issue, it would probably be infeasible to find the offset.

So, maybe pi isn’t the best number. What if we could come up with a better number?

At this point, it’s probably useful to stop talking about binary expansions of irrational numbers and just talk about algorithms that generate infinite strings of bits with particular properties.

So, is it possible to create any algorithm that generates an infinite string of binary digits with the following properties?:

Its infinite expansion contains every finite substring.

Later bits in the output can be calculated without calculating all previous bits.

It’s efficiently packed. That is, we want to ensure that all strings of bits we might want to find in the output have as small/low offsets as possible. Preferably offsets that can be represented using a number of bits comparable to the length of the string of digits. This is the property where it becomes related to OP’s post.

Where the string of bits of the copyrighted work (as well as significantly-sized substrings of the copyrighted work) isn’t inherently in the representation of the offset.

You could even tweak some things. You could maybe tune it for strings between 1MB and 1TB in size, for instance. Maybe rather than dealing in bits, you could deal in blocks of 1MB. (Basically, rather than working in base 2, work in base 2^(2^20). Maybe you could get some efficiency gains that way.) Maybe find ways to prioritize bit strings that are more like the kind of data people would want to share this way. (Maybe make one version of the algorithm specifically for ascii text. Others for text in various different languages. Several for video data. Others for video games. Etc. And these would be optimized such that they gave you smaller offsets for the particular kind of data you might want to share via them.)

So, is this possible? I don’t know for sure. If I had to wager a guess, I’d bet not. But it’s interesting to think about.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

oblomov, 26 days ago

@TootSweet this reminds me of https://github.com/philipl/pifs, the filesystem based on the normality of π

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

Add comment