armchair_progamer

@armchair_progamer@programming.dev

This profile is from a federated server and may be incomplete. Browse more on the original instance.

Agda Core: The Dream and the Reality (jesper.cx)

This blog post is about Agda Core, a project to build a a formally specified core language for (a reasonable subset of) Agda together with a reference implementation of a type checker for it. If you have been following my work, you might know that I have been working on this project on and off for the past few years, yet the...

The search for easier safe systems programming (blog post + language) (www.sophiajt.com)

For the last year and a half, I and my recently-added collaborator Jane Losare-Lusby have been working in secret on a safe systems language that could be learned about as quickly as one can learn Go. I think we might have something worth exploring....

Notes on Implementing Algebraic Subtyping (blog post + language) (semantic.org)

Algebraic Subtyping is a type system devised by Stephen Dolan in 2016, in his dissertation. It extends Hindley-Milner with subtyping, in a way that preserves decidability and principality. Over the past few years, I have implemented Algebraic Subtyping in my language Pinafore (omitting record types). Pinafore is, as far as I...

Befreak (esolang) (tunes.org)

Befreak is a purely reversible two-dimensional programming language. It was inspired by the Chris Pressey’s Befunge programming language. Like Befunge, all Befreak instructions are written as a single character, and execution can flow north, south, east, and west....

Beatrice: A finally tagless, dependently typed, homoiconic programming language (hirrolot.github.io)

Today I am pleased to announce Beatrice, which is a finally tagless, dependently typed, self-aware functional programming language that I have been working on for quite a while. In this short blog post, I will demonstrate its most prominent features and contrast them to those of mainstream programming languages....

Dataflow Analyses and Compiler Optimizations that Use Them, for Free (blog.regehr.org)

One of my research group’s major goals is to create technologies that enable self-improving compilers. Taking humans out of the compiler-improvement loop will make this process orders of magnitude faster, and also the resulting compilers will tend to be correct by construction. One such technology is superoptimization, where...

Coroutines and effects (article) (without.boats)

For the past few months I’ve been mulling over some things that Russell Johnston made me realize about the relationship between effect systems and coroutines. You can read more of his thoughts on this subject here, but he made me realize that effect systems (like that found in Koka) and coroutines (like Rust’s async...

Narya: A proof assistant for higher-dimensional type theory (GitHub) (github.com)

Narya is eventually intended to be a proof assistant implementing Multi-Modal, Multi-Directional, Higher/Parametric/Displayed Observational Type Theory, but a formal type theory combining all those adjectives has not yet been specified. At the moment, Narya implements a normalization-by-evaluation algorithm and typechecker for...

Writing an EBNF -> Scheme parser generator using AI for machine-unfriendly descriptions, and further enhancements?

Hey. I have made some DSLs before, and for all of them, most of which were in C, I have used Flex and Bison. But this time I wanna use Scheme, Cyclone Scheme to be exact. I can potentially use Flex/Bison this time too, because Cyclone has a strong FFI to C....

armchair_progamer, (edited )

I find writing the parser by hand (recursive descent) to be easiest. Sometimes I use a lexer generator, or if there isn’t a good one (idk for Scheme), write the lexer by hand as well. Define a few helper functions and macros to remove most of the boilerplate (you really benefit from Scheme here), and you almost end up writing the rules out directly.

Yes, you need to manually implement choice and figure out what/when to lookahead. Yes, the final parser won’t be as readable as a BNF specification. But I find writing a hand-rolled parser generator for most grammars, even with the manual choice and lookahead, is surprisingly easy and fast.

The problem with parser generators is that, when they work they work well, but when they don’t work (fail to generate, the generated parser tries to parse the wrong node, the generated parser is very inefficient) it can be really hard to figure out why. A hand-rolled parser is much easier to debug, so when your grammar inevitably has problems, it ends up taking less time in total to go from spec to working hand-rolled vs. spec to working parser-generator-generated.

The hand-rolled rules may look something like (with user-defined macros and functions define-parse, parse, peek, next, and some simple rules like con-id and name-id as individual tokens):


<span style="color:#323232;">; pattern			::= [ con-id ] factor "begin" expr-list "end"
</span><span style="color:#323232;">(define-parse pattern
</span><span style="color:#323232;">  (mk-pattern
</span><span style="color:#323232;">    (if (con-id? (peek)) (next))
</span><span style="color:#323232;">    (parse factor)
</span><span style="color:#323232;">    (do (parse “begin”) (parse expr-list) (parse “end”))))
</span><span style="color:#323232;">
</span><span style="color:#323232;">; factor 			::= name-id 
</span><span style="color:#323232;">; 			 | symbol-literal
</span><span style="color:#323232;">; 			 | long-name-id 
</span><span style="color:#323232;">; 			 | numeric-literal 
</span><span style="color:#323232;">;	 		 | text-literal 
</span><span style="color:#323232;">; 			 | list-literal 
</span><span style="color:#323232;">; 			 | function-lambda 
</span><span style="color:#323232;">; 			 | tacit-arg
</span><span style="color:#323232;">; 			 | '(' expr ')' 
</span><span style="color:#323232;">(define-parse factor
</span><span style="color:#323232;">  (case (peek)
</span><span style="color:#323232;">    [name-id? (if (= “.” (peek2)) (mk-long-name-id …) (next))]
</span><span style="color:#323232;">    [literal? (next)]
</span><span style="color:#323232;">    …))
</span>

Since you’re using Scheme, you can almost certainly optimize the above to reduce even more boilerplate.

Regarding LLMs: if you start to write the parser with the EBNF comments above each rule like above, you can paste the EBNF in and Copilot will generate rules for you. Alternatively, you can feed a couple EBNF/code examples to ChatGPT and ask it to generate more.

In both cases the AI will probably make mistakes on tricky cases, but that’s practically inevitable. An LLM implemented in an error-proof code synthesis engine would be a breakthrough; and while there are techniques like fine-tuning, I suspect they wouldn’t improve the accuracy much, and certainly would amount to more effort in total (in fact most LLM “applications” are just a custom prompt on plain ChatGPT or another baseline model).

  • All
  • Subscribed
  • Moderated
  • Favorites
  • megavids
  • InstantRegret
  • magazineikmin
  • thenastyranch
  • Youngstown
  • mdbf
  • everett
  • slotface
  • khanakhh
  • ethstaker
  • rosin
  • kavyap
  • GTA5RPClips
  • tacticalgear
  • JUstTest
  • DreamBathrooms
  • ngwrru68w68
  • Durango
  • normalnudes
  • cubers
  • modclub
  • cisconetworking
  • tester
  • osvaldo12
  • provamag3
  • anitta
  • Leos
  • lostlight
  • All magazines