Ok, this one is for the discrete #math gurus out there.... - Mathematics

azonenberg, 13 days ago

Ok, this one is for the discrete #math gurus out there.

Let N = CRC32(X)

Given N, is it possible to efficiently calculate CRC32(concat(X, Y)) where Y is a known sized, but very long, sequence of 0xFF bytes?

Obviously you can just seed the CRC with N and iterate, feeding 0xFF in each cycle, but is there any kind of shortcut you can take if you know the input is always a 1 bit?

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

Image

Image alternative text

destevez, 12 days ago

@azonenberg nice question! My intuition tells me that there are no major shortcuts. Thinking of the CRC hardware implementation, you initialize the register with N and keep pushing many 1's. This is very similar to an LFSR and also to multiplication on a cyclic group. The intuition is that the register states evolve in a random-like way and it is quite difficult to predict how the register will look like after many steps without doing the calculation for those many steps.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

destevez, 12 days ago

@azonenberg this is not a proof whatsoever, and maybe there's something I'm missing and there is a good shortcut. I'll try to think about this in more formal terms to see if there is something that can be proved.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 12 days ago

@destevez See other replies, it's definitely doable

https://ioc.exchange/@jix@mathstodon.xyz/112398438452087900

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

destevez, 12 days ago

@azonenberg I now see where my intuition failed. The problem is pretty much the same as the following: given an element g in a finite group and a positive integer n, compute g^n (see that answer about matrix multiplication). This can always be done if O(log n) multiplications, even without precomputing anything. Just compute g^2, g^4 = g^2*g^2, ..., g^2^(floor(log_2 n)), and then multiply those powers corresponding to the 1's in the binary expansion of n.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

destevez, 12 days ago

@azonenberg what is hard is solving the discrete logarithm problem (some cryptography is based on this). This is, given g and h, find n such that h = g^n. For this, the best algorithm known is brute force, which is computationally prohibitive. In your case, this would be asking how many 0xff bytes you need to get a particular given CRC at the end.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 12 days ago

@destevez Yep. This is discrete exponentiation, far easier :)

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

8051enthusiast, 13 days ago

@azonenberg i know you already have like 2 solutions, but i got nerd-sniped by this and wanted to play around with normal bases: https://gist.github.com/8051Enthusiast/80c044b3a6f6ad145617eebe11eb9fb1

it only uses bit shifts/rotates/xor and applies constant bit matrices to an int, so except for the 384 bytes storage requirements for the constant matrices, it should be easy to implement
(the matrices are specific to the zlib crc32)

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 13 days ago

@8051enthusiast If you make one with matrices for the Ethernet polynomial I'll give it a try.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

philpem, 13 days ago

@azonenberg In theory at least, a CRC is a linear-feedback shift register. After a certain number of static bits it's going to start repeating itself. My first answer would be a table lookup indexed by [CRC32(X)] and [N] (where N = number of FF bytes).
The problem is that's going to be a big table. It'll probably also have repetitions.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

attie, 13 days ago

@azonenberg Would it not be far preferable to somehow capture and encode the length of X inside itself?

For example, place a uint32 right at the start, or at some predefined place close to the start (e.g: if you have startup code / interrupt vectors at the start)...

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 13 days ago

@attie It gets tricky, I've been thinking of how I'd do it but given the use cases of how the system is being used I'd rather not overcomplicate if I can.

Checksumming all of flash isn't THAT hard.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

whitequark, 13 days ago

@azonenberg cursed solution: synthesize an unrolled combinational CRC32 calculator and then grab the function it optimizes to

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 13 days ago

@whitequark I forgot to specify, the length of Y is known but not constant.

So I might need CRC32(X, 253K of 0xff) one minute and CRC32(X, 252K of 0xFF) the next.

Use case is calculating the expected CRC of an entire flash partition given only the data I'm writing into it and the a priori knowledge that it's blank.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

jix, 13 days ago

@azonenberg @whitequark CRC32-UPDATE(N, 0xff) is an affine function in the input state N when you view N as a 32 element vector over GF(2), i.e. you can represent it as a 32x32 bit matrix for the linear part and a 32 bit vector for the constant offset. Like is common in 3d graphics we can use homogeneous coordinates to represent both together in a 33x33 matrix so we can easily compose things including translations by using only matrix multiplication. You can then use exponentiation by squaring to quickly compute any power of that matrix to get a matrix representation of the state update function for adding any given number of 0xff bytes.

Here is some python code that does that: https://gist.github.com/jix/6fe33ef0900d41f550cd52b33ab88d55

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ philpem

jix, 13 days ago

@azonenberg also, if you squint hard enough this is what @whitequark suggested, but done for every power of 2 length (reusing the optimized function of the preceding power of two and exploiting some known properties of crcs to do the optimization) and then you pick the resulting functions for every bit that's set in the number of 0xff bytes you want to add and compose them

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

sgf, 13 days ago

@jix @azonenberg Thank you for coming here and saying what I was going to say, but more mathematically.

+1 to treating the CRC step as a matrix multiplication and doing matrix exponentiation to fast forward in log time.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

jix, 13 days ago

@azonenberg is this useful to you? I could also write a verilog or C implementation (with a python script to generate a few small tables I'd use for those) if that would help. I wouldn't mind doing that even if you don't end up using that after all as long as you think it'd be worth giving it a try.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 13 days ago

@jix A permissively licensed C implementation would be quite handy if you're willing to throw one together.

I'm using the standard zip/Ethernet CRC32 formulation.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

jix, 12 days ago

@azonenberg Just pushed to https://github.com/jix/fast_fwd_crc

Has an example that can be run with make run that checks against zlib, and is 0BSD licensed.

Ended up using the polynomial quotient ring approach instead of the matrix approach as it's faster, doesn't need any tables, and, after figuring out the right bit order everywhere, simpler to implement.

The high level idea is still the same as for the matrix approach.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 12 days ago

@jix Thanks! Will check it out after work.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

breakin, 13 days ago

@azonenberg @whitequark Some googling gave me https://groups.google.com/g/comp.compression/c/SHyr5bp5rtc/m/PP5-pmv9-9sJ. The second answer from the top (by Dr Mark Adler) has a lot of good information in it that seems applicable!

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

+ philpem

Add comment