@pervognsen@mastodon.social avatar

pervognsen

@pervognsen@mastodon.social

Performance, compilers, hardware, mathematics, computer science.

I've worked in or adjacent to the video game industry for most of my career.


Donzanoid, to random
@Donzanoid@mastodon.social avatar

Are there any state of the art references for fast lexing/parsing on modern machines with SSDs and lots of RAM?

I'm interested in tight inner loops, good BTB use, no data stalls and minimal IO bottlenecks. Is the STB doc still the best reference?

@pervognsen any ideas?

pervognsen,
@pervognsen@mastodon.social avatar

@Donzanoid If you mean Sean's article on lexing with DFA tables, etc., it's definitely not the fastest way to do things in my experience, but you have to look at the whole pipeline holistically. I.e., the goal is rarely to just make lexing fast; it's usually lexing plus parsing (and usually also things after parsing).

pervognsen,
@pervognsen@mastodon.social avatar

@Donzanoid Right, that changes the conversation completely and opens up a lot of fun options. The simdjson bag of tricks is eminently suited for this kind of thing. For "skip lexing" you can often treat a lot of lexical tokens interchangeably so you only need special states for the harder cases like comments or quoted strings. And at that point you might be in the state space size range where something like https://gist.github.com/pervognsen/218ea17743e1442e59bb60d29b1aa725 can be used for the skipping.
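
A rough sketch of what such a table-driven skip state machine could look like (illustrative only, not the code from the linked gist; the states and byte classes are assumptions for a C-like input with // comments and double-quoted strings):

// Illustrative table-driven skip state machine (a sketch, not the code from
// the linked gist). Almost all bytes are interchangeable; the only special
// states exist for // line comments and double-quoted strings, so the whole
// [state][byte-class] transition table stays tiny.
#include <stdint.h>
#include <stddef.h>

enum { S_DEFAULT, S_SLASH, S_COMMENT, S_STRING, S_STRING_ESC, NUM_STATES };
enum { C_OTHER, C_QUOTE, C_BACKSLASH, C_SLASH, C_NEWLINE, NUM_CLASSES };

static uint8_t byte_class[256];                       // defaults to C_OTHER
static uint8_t transition[NUM_STATES][NUM_CLASSES];

static void init_tables(void) {
    byte_class['"']  = C_QUOTE;
    byte_class['\\'] = C_BACKSLASH;
    byte_class['/']  = C_SLASH;
    byte_class['\n'] = C_NEWLINE;

    // Default: a quote starts a string, a slash might start a comment.
    transition[S_DEFAULT][C_OTHER]     = S_DEFAULT;
    transition[S_DEFAULT][C_QUOTE]     = S_STRING;
    transition[S_DEFAULT][C_BACKSLASH] = S_DEFAULT;
    transition[S_DEFAULT][C_SLASH]     = S_SLASH;
    transition[S_DEFAULT][C_NEWLINE]   = S_DEFAULT;

    // Saw one '/': a second one starts a line comment.
    transition[S_SLASH][C_OTHER]     = S_DEFAULT;
    transition[S_SLASH][C_QUOTE]     = S_STRING;
    transition[S_SLASH][C_BACKSLASH] = S_DEFAULT;
    transition[S_SLASH][C_SLASH]     = S_COMMENT;
    transition[S_SLASH][C_NEWLINE]   = S_DEFAULT;

    // Line comment: skip everything up to the newline.
    for (int c = 0; c < NUM_CLASSES; c++) transition[S_COMMENT][c] = S_COMMENT;
    transition[S_COMMENT][C_NEWLINE] = S_DEFAULT;

    // String: a backslash escapes the next byte, an unescaped quote ends it.
    for (int c = 0; c < NUM_CLASSES; c++) transition[S_STRING][c] = S_STRING;
    transition[S_STRING][C_QUOTE]     = S_DEFAULT;
    transition[S_STRING][C_BACKSLASH] = S_STRING_ESC;
    for (int c = 0; c < NUM_CLASSES; c++) transition[S_STRING_ESC][c] = S_STRING;
}

// Branch-free inner loop: one table-indexed load chain per byte. A real skip
// lexer would record positions when it enters/leaves the special states.
static int skip_scan(const char *s, size_t n, int state) {
    for (size_t i = 0; i < n; i++)
        state = transition[state][byte_class[(uint8_t)s[i]]];
    return state;
}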

pervognsen,
@pervognsen@mastodon.social avatar

@Donzanoid Or something more bespoke like what simdjson does, e.g. it has clever tricks for dealing with specific problem cases like quoted string literals. More generally, a good design pattern here is to handle the common cases on the fast path and try to shunt rare (in a frequency sense) cases to an escape branch, so you can streamline the fast path while still having the freedom to write "normal code" on the slow (not actually slow, just slower) path.
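
A hedged illustration of that fast-path/escape-branch shape (the names and the set of "rare" bytes are made up for this example; the point is just that the hot loop asks one cheap question per byte and everything infrequent is handled by ordinary code off to the side):

// Skeleton of the fast-path/escape-branch pattern (names and the "rare"
// byte set are assumptions for this example).
#include <stdint.h>

static uint8_t is_rare[256];  // set for '"', '\\', '/', '\n', ... at startup

typedef struct { const char *p; const char *end; } Cursor;

// Slower path: plain, readable, as branchy as it needs to be. Placeholder
// body; a real version would scan strings, comments, etc. here.
static void handle_rare_byte(Cursor *c) {
    c->p++;
}

static void scan(Cursor *c) {
    while (c->p < c->end) {
        if (!is_rare[(uint8_t)*c->p]) {  // fast path: one predictable test per byte
            c->p++;
            continue;
        }
        handle_rare_byte(c);             // escape branch for the infrequent cases
    }
}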

pervognsen,
@pervognsen@mastodon.social avatar

@Donzanoid It's also worth starting with something even simpler and more traditional. Basically, the classic while (isspace(*s)) s++ kind of skip loop but with a table for the byte classification; also easy to simdify. This is basically going to cost one branch mispredict per "interesting event" if they're sufficiently sparse. If your average source line length is 50 bytes and you only have two "interesting events" per line on average, the mispredict overhead is in the range of <1 cycle/byte.
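
In code, that loop is roughly the following (a sketch; the table contents depend on what counts as "interesting" for your input, and it assumes the buffer is terminated by a byte that is marked interesting, e.g. NUL, so the loop can't run off the end):

// The classic skip loop with a byte-classification table instead of isspace().
#include <stdint.h>

static uint8_t interesting[256];  // 1 for bytes that end the skip, 0 otherwise

static const char *skip_boring(const char *s) {
    while (!interesting[(uint8_t)*s])
        s++;
    return s;  // roughly one (mispredicted) exit branch per interesting event
}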

pervognsen,
@pervognsen@mastodon.social avatar

@Donzanoid I know that's obvious stuff but it's worth considering how far something simple will get you, if only to establish a baseline so you can get an idea of how much of a return on investment there could be for something more complex.

pervognsen,
@pervognsen@mastodon.social avatar

@Donzanoid Oh, another side note on why you might want to simdify that kind of short skip loop. Once you have it simdified plus maybe unrolled it a bit you've essentially clumped/quantized the loop tripcount so that the branch predictor will be much happier. This is a pretty standard simd trick for text processing and very low effort.
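
For example, a rough SSE2 version of the same skip loop (the "interesting" byte set is hardcoded here purely for illustration; a real version would derive it from the same classification table):

// Rough SSE2 sketch: 16 bytes per iteration, so the trip count is quantized
// and the exit branch is much easier on the branch predictor.
#include <emmintrin.h>

static const char *skip_boring_sse2(const char *s, const char *end) {
    const __m128i quote  = _mm_set1_epi8('"');
    const __m128i bslash = _mm_set1_epi8('\\');
    const __m128i nl     = _mm_set1_epi8('\n');
    while (s + 16 <= end) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        __m128i hit = _mm_or_si128(_mm_cmpeq_epi8(v, quote),
                      _mm_or_si128(_mm_cmpeq_epi8(v, bslash),
                                   _mm_cmpeq_epi8(v, nl)));
        int mask = _mm_movemask_epi8(hit);
        if (mask)
            return s + __builtin_ctz(mask);  // first interesting byte in the block
        s += 16;
    }
    while (s < end && *s != '"' && *s != '\\' && *s != '\n')
        s++;  // scalar tail for the last <16 bytes
    return s;
}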

pervognsen,
@pervognsen@mastodon.social avatar

@Donzanoid @dominikg I consider the threshold for a fast "normal" lexer to be around 0.5 GB/s, so if you want multiple orders of magnitude beyond that, you might be a bit too aggressive. I think 5 GB/s (1 byte/cycle) is a more realistic target for a skip lexer depending on the specifics of the problem. At that point you're also pretty close to the DRAM bandwidth limit for a symmetric all-core workload, e.g. my laptop has 64 GB/s DRAM bandwidth, with 8 cores that's 8 GB/s/core.

pervognsen, to random
@pervognsen@mastodon.social avatar

Downside of Denmark: I'd forgotten all ISPs block scihub, libgen.is, etc. Stuff like this is just a subsidy for all the scammy VPN providers.

pervognsen,
@pervognsen@mastodon.social avatar

@Sharlock93 I mean, it's as much of a copyright violation as any conventional piracy site when it comes to legalities. It's just disappointing for something that provides such an obvious public service.

pervognsen,
@pervognsen@mastodon.social avatar

(Cloudflare's free WARP/1.1.1.1 service is good depending on how you feel about Cloudflare as a company but it still gives you a domestic IP so it doesn't help to bypass geoblocking.)

pervognsen,
@pervognsen@mastodon.social avatar

@el0j Nope, they don't seem to be. I tried using Google and Cloudflare's DNS servers (and I always use HTTPS).

pervognsen, (edited )
@pervognsen@mastodon.social avatar

@el0j Nevermind, I just realized I switched browsers recently and it looks like it didn't have DoH enabled (which is weird because I could have sworn both Chrome and Firefox have defaulted to DoH for years now and Vivaldi is Chromium based). It seems to work now. Thanks for the prod!

pervognsen,
@pervognsen@mastodon.social avatar

@madmoose Yeah, I found out my browser had been misconfigured (started using a new one recently) and wasn't using my normal DNS settings for some reason. Everything seems fine now.

pervognsen, to random
@pervognsen@mastodon.social avatar

This looks cool. Basically a polyglot version of go fmt's rewrite rules. https://ast-grep.github.io

pervognsen, (edited )
@pervognsen@mastodon.social avatar

Although I will admit that my past attempts at using purely syntax-based methods for tasks like this always leave me wanting to use types and other forms of (often language-specific) semantic analysis. At least it's a step up from purely text-based methods. All you're really doing is reducing the false positive/negative rate as you step up the ladder; that's also true even with static analysis if we're talking about linting. So syntax-based methods can be a sweet spot for some tasks.

pervognsen, (edited )
@pervognsen@mastodon.social avatar

And even purely text-based methods can be used to "cast a wide net" for more sophisticated methods of analysis. There's a blog post by matklad where he talks about how the JetBrains code analysis (and rust-analyzer) does this for "find usages" and similar kinds of reverse lookups. Instead of maintaining an up-to-date reverse index, you can do a text search for candidate matches of a semantic entity and then use semantic analysis to disqualify false positives.

https://rust-analyzer.github.io/blog/2019/11/13/find-usages.html
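
A pseudocode-level sketch of that two-stage lookup (the types and helper functions below are hypothetical stand-ins for a real code index and semantic analyzer, not any actual rust-analyzer API):

// Two-stage "find usages": text search for candidates, then semantic
// filtering to drop false positives.
#include <stddef.h>

typedef struct Def   Def;    // the definition whose usages we want
typedef struct File  File;
typedef struct Match Match;  // a textual occurrence (file + offset)

// Hypothetical helpers:
File  **candidate_files(Def *def, size_t *n);                // scope/reachability pre-filter
Match **text_matches(File *f, const char *name, size_t *n);  // plain text search
const char *def_name(Def *def);
Def    *resolve_reference(Match *m);                         // semantic resolution

static void find_usages(Def *def, void (*emit)(Match *)) {
    size_t nfiles;
    File **files = candidate_files(def, &nfiles);  // not the whole known universe
    for (size_t i = 0; i < nfiles; i++) {
        size_t nmatches;
        Match **matches = text_matches(files[i], def_name(def), &nmatches);
        for (size_t j = 0; j < nmatches; j++)
            if (resolve_reference(matches[j]) == def)  // disqualify false positives
                emit(matches[j]);
    }
}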

pervognsen,
@pervognsen@mastodon.social avatar

Some obvious limitations of this approach: common identifiers produce way more false positives in the initial grep stage. A saving grace is that a real implementation of this idea can apply some basic scope and reachability filtering so you're not just blindly grepping everything in the known universe. Lexical scopes can be locally analyzed. And it's easy for most sane languages to compute a reasonably precise but still conservative (hence safe) approximation of source file reachability.

unormal, to random
@unormal@mastodon.social avatar

Ho ly shit damnnnn

pervognsen, to random
@pervognsen@mastodon.social avatar

Somehow I doubt it. Headline: "Microsoft is finally getting its native Windows UI platform act together with WinUI 3 and WPF."

pervognsen,
@pervognsen@mastodon.social avatar

@suushikijitsu @rovarma Classic. And here I thought they were just pulling the old Windows NT 3 trick where 3 is the first version.

pervognsen, to random
@pervognsen@mastodon.social avatar

I've never fully worked out how best to articulate my dissatisfaction with the usual way people talk about pluggable allocators in systems programming. Sure, I'd like to have some standard for fallible, pluggable allocation at the lower level of a language's standard library. But the entire mindset of plugging together allocators and data structures is something I find dubious and at best it feels like a poor compromise.

pervognsen,
@pervognsen@mastodon.social avatar

@zeux @artificialmind Yeah, this paper does something like this: it's basically PMAs (packed memory arrays) but with coalesced gaps, and it also stores the counts so you don't have to scan the gaps. https://ir.cwi.nl/pub/28649/28649.pdf
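
A minimal sketch of that idea as described above (not the paper's actual data structure): fixed-size blocks keep their live elements packed at the front with the gap coalesced at the end, and an explicit per-block count means searches never scan over gap slots.

// PMA-style block with a coalesced gap and an explicit count.
#include <stdint.h>

#define BLOCK_CAP 64

typedef struct {
    uint32_t count;             // live elements in this block
    int32_t  elems[BLOCK_CAP];  // [0, count) is data, [count, BLOCK_CAP) is the gap
} Block;

// Sorted insert into one block. Returns 0 when the block is full and would
// need the PMA rebalancing/splitting machinery (omitted here).
static int block_insert(Block *b, int32_t key) {
    if (b->count == BLOCK_CAP)
        return 0;
    uint32_t i = b->count;
    while (i > 0 && b->elems[i - 1] > key) {  // shift within the packed prefix only
        b->elems[i] = b->elems[i - 1];
        i--;
    }
    b->elems[i] = key;
    b->count++;
    return 1;
}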
