Is there a better algorithm for converting big binaries into decimal?

At the moment, I am stuck with using single-precision and double-precision floats. So, the largest exactly representable integer for single is 1<<24, while for double it is 1<<53.
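
Those limits are easy to verify; here's a quick check using only Python's standard library (the helper name is mine, not from the script):

```python
import struct

def to_f32(x: float) -> float:
    # Round-trip a Python float (a C double) through IEEE-754 single precision.
    return struct.unpack("f", struct.pack("f", x))[0]

assert to_f32(2**24) == 2**24
assert to_f32(2**24 + 1) == 2**24        # 16777217 is not representable; it rounds back
assert float(2**53) + 1 == float(2**53)  # same story for doubles at 1<<53
```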

Because of this, I made the following script: gist.github.com/…/71e3eec41e44e2e3d896a10f2a51448…

Allow me to clarify the script above. In the first part, what rep_bin2dec does is return the converted value into the status. So, when I do ${} or variable=${rep_bin2dec\ ???}, I get the status string.

In the second part, rep_bin2dec_base is the basis for getting rep_bin2dec to work, and _rep_bin2dec_base prints the base_10M array as a string.

So, how does rep_bin2dec_base convert a big binary into a big decimal?

  1. If the binary image has a dimension of less than 54, the script uses 0b{}, which lets me convert binary to decimal directly; 0b is a binary literal, much as in Python and C++. From there, it's pretty obvious what to do. So, if the input is below dimension 54, step 1 is pretty much all there is. If not, move on to step 2.
  2. Convert the binary image into an image of base (1<<24) digits representing the value of that image. Note that there are two channels, “[ output_value , y ]”; y here represents the digit position in base (1<<24).
  3. Turn the converted image into a dynamic array image. This allows us to remove unused digits. You can look at steps 2 and 3 as converting a binary string into a dynamic array of base (1<<24) digits. Also, note that start_value is stored; that's the very first digit.
  4. Note that number_of_decimals is the predicted number of characters after the binary-to-decimal conversion. Then, multi-threading gets activated depending on the size of the dynamic array image. decimal_conversion_array_size and result_conversion_array_size define the sizes of the temporary arrays used to convert from base (1<<24) into base 10M. Finally, there's a new image which will use base 10 million for easy printing, and set is used to add the first base (1<<24) digit, which will then be converted to base 10M.
  5. In eval[-2], we now process the base (1<<24) image and convert it into base 10M. There's an implicit loop, so you can picture a “for y” after begin(), and begin() can be seen as the setup code. (A rough sketch of steps 2–5 follows below.)
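
Purely as illustration, here is a minimal Python sketch of steps 2–5 as I understand them: fold the base (1<<24) digits into a base 10M array with a Horner-style multiply-and-add. The function and variable names are mine, not from the gist:

```python
B24 = 1 << 24    # source base
B10M = 10**7     # target base, 10 million

def base24_to_base10m(digits24):
    # digits24: base (1<<24) digits, most significant first.
    out = [0]
    for d in digits24:
        carry = d
        # Multiply the base-10M accumulator by 2^24, then add the new digit.
        for i in range(len(out) - 1, -1, -1):
            v = out[i] * B24 + carry
            out[i] = v % B10M
            carry = v // B10M
        while carry:                     # grow the array when it overflows
            out.insert(0, carry % B10M)
            carry //= B10M
    return out                           # base-10M digits, most significant first

def base10m_to_str(limbs):
    # Printing is now trivial: zero-pad every limb after the first to 7 digits.
    return str(limbs[0]) + "".join(f"{d:07d}" for d in limbs[1:])

print(base10m_to_str(base24_to_base10m([1, 0])))   # 1<<24 -> "16777216"
```

Every intermediate v here stays below 10^7 × 2^24 ≈ 1.7 × 10^14 < 2^53, which is why the same loop still works when the limbs live in double-precision floats. Note that this per-digit folding is quadratic in the number of digits, which would line up with the slowdown at large inputs mentioned further down.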

Some notes: copy() basically allows me to alter an array. In this case, opacity is negative, so it adds the multiplication by the positive opacity. If opacity were between 0 and 1, it would be treated similarly to how the opacity of one layer alters an image. And the multiplication algorithm used to convert between bases is Schönhage–Strassen multiplication, but without the FFT part, i.e. plain long multiplication with carries.

So, here's how that works.


```
   9   9
x  1   9
________
  81  81
9  9
________
1  8  8  1
```

Basically, it’s long multiplication, and you can see the carrying of remainders: the units column is 81, so write 1 and carry 8; the tens column is 81 + 9 + 8 = 98, so write 8 and carry 9; the hundreds column is 9 + 9 = 18, so write 8 and carry 1; the final carry gives the leading 1. So you can see how 99 × 19 results in 1881.
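
Here's that carrying scheme as a short Python sketch (my code, not from the script; digit arrays are least-significant-digit first, with the base as a parameter):

```python
def long_multiply(a, b, base=10):
    # Schoolbook long multiplication on little-endian digit arrays.
    out = [0] * (len(a) + len(b))
    for i, da in enumerate(a):
        carry = 0
        for j, db in enumerate(b):
            v = out[i + j] + da * db + carry   # column sum, like 81 + 9 + 8
            out[i + j] = v % base              # digit to write
            carry = v // base                  # remainder to carry upward
        out[i + len(b)] += carry
    while len(out) > 1 and out[-1] == 0:       # strip leading zeros
        out.pop()
    return out

print(long_multiply([9, 9], [9, 1]))   # 99 * 19 -> [1, 8, 8, 1], i.e. 1881
```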

  6. After the conversion to base 10M, depending on your inputs, it either sets the status value to the decimal representation or preserves the base 10M image for easy printing with _rep_bin2dec_base after alteration.

There are some more details, but I find this really hard to explain.

So, my question is: what are some good algorithms for printing huge binaries as decimal? I know Python is insanely good at that, but I can’t seem to understand how it does it so well. I know that it involves conversion to base 2^30, i.e. 1<<30.
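
For what it's worth, CPython does expose its internal limb size, which is 30 bits on typical 64-bit builds:

```python
import sys
print(sys.int_info.bits_per_digit)   # 30 on typical 64-bit CPython builds
```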

At the moment, I can convert a 90,000-digit binary in 0.35 s, and that's bad compared to what I've seen in Python. It's really bad with 1M binary digits.

palordrolap,

This could be an XY problem, that is, you're trying to solve problem X, rather than the underlying problem Y. Y here being: Why do you need things to be in decimal in the first place?

Arithmetic can be performed on values stored in binary without going through a decimal stage at any point.

Large integer types can be absolutely anything under the hood and only displayed in decimal when a human needs to read them.

Now, if human readability is a necessity, that underlying representation might well be decimal-oriented for simplicity, e.g. using arrays of 32-bit integers but forcing a carry when they reach 10^9. That way barely any conversion is needed when they're printed out. Multiplications could use 64-bit integers as intermediaries.

Where 64-bit integers aren't available, use smaller limbs capped at 10^4 - 1, with 32-bit integers for the multiplication intermediaries.

Or, if the 52 mantissa bits in a 64-bit floating-point value are a more tempting target, restricting the maximum limb to 10^7 - 1 allows the same data type to be used for the multiplication intermediaries.
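
Here is a minimal Python sketch of the first variant above: base 10^9 limbs with forced carries. The names are mine, and Python's unbounded integers stand in for the fixed-width types, so the 32/64-bit limits are only noted in comments:

```python
BASE = 10**9   # limb cap; each limb fits a 32-bit integer

def add(a, b):
    # Limb-wise addition on little-endian arrays, forcing a carry at 10**9.
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        out.append(s % BASE)
        carry = s // BASE
    if carry:
        out.append(carry)
    return out

def to_str(limbs):
    # Printing needs no radix conversion: zero-pad every limb but the top one.
    return str(limbs[-1]) + "".join(f"{d:09d}" for d in reversed(limbs[:-1]))

# (10**18 - 1) + 1 = 10**18; limb products would fit 64 bits,
# since (10**9 - 1)**2 < 2**63.
print(to_str(add([999999999, 999999999], [1])))   # 1000000000000000000
```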

Reptorian,

This could be an XY problem, that is, you’re trying to solve problem X, rather than the underlying problem Y. Y here being: Why do you need things to be in decimal in the first place?

I wouldn’t say it’s needed; this is more of a fun thing for me. The only thing I’m using this for is Tupper’s Self-Referential formula, and my current approach of converting base 1<<24 to base 1e7 works instantly for 106x17 binary digits. When I load an image into that filter that’s bigger than somewhere over 256x256, delays become noticeable because the underlying algorithm isn’t that great, though it may also have to do with the fact that G’MIC is interpreted, and despite its JIT support, this is not the kind of problem it’s meant to solve (it’s domain-specific). On the bright side, this algorithm works with any pair of data types as long as one is a level higher than the other; here I’m using the lowest levels (single and double), and the bigger the data type, the faster it can be.

Ephera,

I don’t know about better algorithms, but you should keep in mind that Python’s radix conversion code is probably implemented in C. That can make a big difference for this kind of number crunching.

I don’t understand your problem well enough to know if you can (or want to) use this here, but you might be able to tap into that C performance with the radix conversion formatting of printf.

Reptorian,

I don’t understand your problem well enough to know if you can (or want to) use this here, but you might be able to tap into that C performance with the radix conversion formatting of printf.

The problem is printing a big binary as decimal. That’s not an easy problem, because 10 is not a power of 2. If we lived in a base-16 world, this would be very easy to solve in O(n).
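
As an illustration (my example, not from the thread): with a power-of-two target base like 16, each output digit comes from a fixed group of bits, so a single linear pass suffices:

```python
def bits_to_hex(bits: str) -> str:
    # Pad on the left to a multiple of 4, then map each 4-bit group
    # straight to one hex digit -- no arithmetic on the whole number.
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    return "".join("0123456789abcdef"[int(bits[i:i + 4], 2)]
                   for i in range(0, len(bits), 4))

print(bits_to_hex("1111111111"))   # 1023 -> "3ff"
```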

Also, I can’t access that, as G’MIC is a language that can’t really communicate with other languages; it’s not meant to share memory.
