samebchase
@samebchase@fantastic.earth

I want to try out https://www.morling.dev/blog/one-billion-row-challenge/ in Raku.

The baseline solution in Java clocks in just under 5 seconds, so what would be a decent timing in Raku for the closest translation of that? The optimized solutions take less than 2 seconds.

What is a good time to target?

samebchase
@samebchase@fantastic.earth

Dear rakoons, what's the best way to memory map a file?

Do I need to write a C program and use NativeCall or whatever?

Even doing the equivalent of wc -l in pure Raku on a 12 GB file takes far too long.

The fast solutions for the 1brc challenge all seem to memory map the file.

lizmat
@lizmat@mastodon.social

@samebchase

Let the OS do the memmapping: it generally knows best nowadays.

The delay is not in reading from disk, but the conversion from raw bytes to Raku internal strings.

Since you're looking only for bytes between ";" and "\n", perhaps reading as blobs could speed things up.
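As a rough illustration of the blob idea, here is a minimal sketch that counts lines by scanning raw bytes, so nothing is ever decoded into a Raku string. The names count-newlines and chunk-size are made up for this example, and the 1 MiB chunk size is an arbitrary assumption, not a tuned value. It makes no claim about speed; it only shows the shape of a byte-level pass.

```raku
# Count "\n" bytes without decoding the data to Str.
# count-newlines and :$chunk-size are hypothetical names for this sketch.
sub count-newlines($path, Int :$chunk-size = 1_048_576) {
    my $fh = $path.IO.open(:bin);
    my Int $count = 0;
    # .read returns an empty (false) Buf at end of file, which ends the loop.
    while $fh.read($chunk-size) -> $buf {
        $count += $buf.list.grep(10).elems;   # 10 is "\n".ord
    }
    $fh.close;
    $count;
}

# Usage (the file name is an assumption):
# say count-newlines('measurements.txt');
```

Because 10 never occurs inside a multi-byte UTF-8 sequence, counting that byte is safe even when station names contain non-ASCII characters.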

samebchase
@samebchase@fantastic.earth

@lizmat On my machine, when I do:

my $i = 0;
for $path.IO.lines() {
    $i++;

    if $i %% 1_000_000 {
        say "Processed $i rows.";
    }
}

which is essentially a pure Raku version of wc -l, it takes 6 seconds for 10M rows.

For 1G rows, that's ~600 seconds, and for 12G rows, that's ~7200 seconds.

This is a lower bound on how long processing the file line by line can take, since we are not yet doing any of the string splitting or any of the other bookkeeping in the hashmap.

I was wondering if I could try an alternative approach...

lizmat
@lizmat@mastodon.social

@samebchase

my atomicint $i;
$path.IO.lines.race(batch => 10000).map({ ++⚛$i });

would be a better benchmark. This will at least parallelize the work (incrementing $i). But that still needs serial decoding.

How fast would this be:

my $h = open $path;
Nil while $h.read;

that would be the lower bound.
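To put a number on that lower bound, the raw read can be wrapped in a small timing helper. This is only a sketch: time-raw-read is a hypothetical name, and it opens the handle with :bin and also tallies the bytes read, which the two-liner above does not do.

```raku
# Time a raw binary read of the whole file: no decoding, no line splitting.
# time-raw-read is a hypothetical helper name.
sub time-raw-read($path) {
    my $h = open $path, :bin;
    my Int $bytes = 0;
    my $start = now;
    while $h.read() -> $buf {
        $bytes += $buf.elems;
    }
    $h.close;
    my $elapsed = now - $start;
    return $bytes, $elapsed;    # (bytes read, elapsed seconds as a Duration)
}

# Usage (the file name is an assumption):
# my ($bytes, $secs) = time-raw-read('measurements.txt');
# say "Read $bytes bytes in $secs seconds.";
```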

You could .list.grep(10, :k) on each Blob, and use that to decode async (10 being "\n".ord).

Is that a plan?
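One possible reading of that plan, as a sketch rather than a tuned solution: read Blobs, cut each one at its last newline byte, and hand the whole-line part to a start block that does the decoding. The names parse-chunk, parse-file-async and chunk-size are invented for this example, and the code assumes every line is well formed as name;value.

```raku
# Hypothetical helper: turn one Blob of whole "name;value\n" lines into Pairs.
sub parse-chunk(Blob $buf) {
    my @pairs;
    my Int $start = 0;
    for $buf.list.grep(10, :k) -> $nl {          # 10 is "\n".ord
        my $line = $buf.subbuf($start, $nl - $start).decode('utf8');
        my ($name, $value) = $line.split(';', 2);
        @pairs.push( ($name => +$value) );
        $start = $nl + 1;
    }
    @pairs;
}

# Hypothetical driver: read serially, decode each chunk on the thread pool.
sub parse-file-async($path, Int :$chunk-size = 1_048_576) {
    my $fh    = $path.IO.open(:bin);
    my $carry = Buf.new;            # bytes of an unfinished trailing line
    my @jobs;
    while $fh.read($chunk-size) -> $chunk {
        my $buf     = $carry ~ $chunk;
        my $last-nl = $buf.list.grep(10, :k).tail;
        # No newline yet: keep accumulating until a full line is available.
        without $last-nl { $carry = $buf; next }
        my $whole = $buf.subbuf(0, $last-nl + 1);
        $carry    = $buf.subbuf($last-nl + 1);
        @jobs.push( start { parse-chunk($whole) } );
    }
    $fh.close;
    # A final line without a trailing newline is still sitting in $carry.
    if $carry.elems {
        my $rest = $carry ~ Buf.new(10);
        @jobs.push( start { parse-chunk($rest) } );
    }
    my @all;
    @all.append(|$_) for await @jobs;   # results come back in file order
    @all;
}
```

Cutting only at newline bytes keeps each chunk a valid UTF-8 sequence, so every chunk can be decoded independently. Collecting all pairs in one array is for illustration only; for the real challenge each job would more likely aggregate min, max, sum and count per station.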

samebchase
@samebchase@fantastic.earth

@lizmat Let me try this out.

For comparison, with single-threaded wc -l implementations:

  1. Go takes 23 seconds.
  2. Common Lisp (SBCL) takes 70 seconds.

CL => 3x Go
Raku => 10x CL

The Nil while read thing that you've given runs in 20 seconds.

The race and atomic increment thing, I killed it after 3 mins.
