I've built a #transpiler in #Rust, compiled it to #WASM and integrated it into a #Vue app! :awesome:
It's called selecuery.✨
It can transpile X++ select statements into query expressions. If you think "X++" is a typo and you have no idea what I'm talking about, don't worry.😄
Have a look at the video below.
This project is dear to my heart! ❤️ I started it in 2019 to learn #RustLang.
I think I've been transpiled during this project as well.🤪
I'm starting to wonder if there's any point in having the lexer and parser as two separate classes.
Other than testing, the lexer is only ever going to be called by the parser, and only once during the process.
It might be better to just have a lexer-parser class that grabs a file, tokenises it, then (if it's happy with the file it's tokenised) immediately turns it into a tree.
Is there a really good reason why they should be separate classes?
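To make the combined idea concrete, here's a minimal Rust sketch where the lexer stays its own function (so tests can still hit it directly) but the parser is the only production caller — all names are invented, and the whitespace-split "lexing" is purely for illustration:

```rust
// Hypothetical sketch: lexing stays separately testable, but in production
// it's only ever invoked from inside the parser's constructor.
#[derive(Debug, PartialEq)]
enum Token {
    Number(i64),
    Plus,
}

// Trivial stand-in lexer: splits on whitespace and classifies each word.
fn lex(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    for word in input.split_whitespace() {
        match word {
            "+" => tokens.push(Token::Plus),
            n => tokens.push(Token::Number(n.parse().expect("expected a number"))),
        }
    }
    tokens
}

struct Parser {
    tokens: Vec<Token>,
}

impl Parser {
    // The parser grabs the input, tokenises it, and keeps the tokens;
    // the lexer never appears anywhere else in production code.
    fn new(input: &str) -> Self {
        Parser { tokens: lex(input) }
    }

    // Stand-in for "immediately turns it into a tree": folds "1 + 2 + 3"
    // into a sum instead of building an actual AST.
    fn parse_sum(&self) -> i64 {
        self.tokens
            .iter()
            .filter_map(|t| match t {
                Token::Number(n) => Some(*n),
                Token::Plus => None,
            })
            .sum()
    }
}
```

The separate `lex` function keeps the unit-testing benefit while the public API only exposes the parser.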
I have an unhealthy addiction to relatively obscure computers that I probably wouldn't actually use very much. Here is the latest one that the little voice in my head is telling me I need to buy so I can get my fix: the HiFive Pro P550 running the RISC-V ISA:
MicroATX form factor
4-core 2.2 GHz
16GB DDR5
Gigabit Ethernet
PCIe expansion slot
NVMe
And it should be able to run Guix OS. The thing is, I don't really hack on operating systems or compilers very often, so I would only be using it as an ordinary end-user with the limited software available for it. I can already do that right now, with far more software available, on any old x86_64 computer.
So logically, I don't actually need an awesome high-powered RISC-V development board for anything. But that doesn't stop me from seriously considering buying one.
I'm trying to work out where the line is between a lexer (tokeniser) and a parser.
How far should a lexer go before it's doing work that belongs to the parser? Should the lexer have some intelligence about what it expects to see next, or what needs to be ignored (e.g. comments)? Or should the lexer just make tokens and leave the rest to the parser?
I'm not building a #compiler as such, but the principles are basically the same for a preprocessor.
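For context, here's a rough Rust sketch of one common split: the lexer eats whitespace and comments and recognises token shapes, but knows nothing about grammar or what should come next. Token names and the `#`-to-end-of-line comment syntax are invented for illustration:

```rust
// Hedged sketch: the lexer handles everything "regular" (whitespace,
// comments, token shapes) and stays stateless about grammar; structure
// is left entirely to the parser.
#[derive(Debug, PartialEq)]
enum Tok {
    Ident(String),
    Number(i64),
    Symbol(char),
}

fn tokenize(src: &str) -> Vec<Tok> {
    let mut out = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '#' {
            // Comments die here: the parser never sees them.
            while let Some(&c2) = chars.peek() {
                chars.next();
                if c2 == '\n' {
                    break;
                }
            }
        } else if c.is_ascii_digit() {
            let mut n = String::new();
            while let Some(&d) = chars.peek() {
                if d.is_ascii_digit() { n.push(d); chars.next(); } else { break; }
            }
            out.push(Tok::Number(n.parse().unwrap()));
        } else if c.is_alphabetic() {
            let mut s = String::new();
            while let Some(&a) = chars.peek() {
                if a.is_alphanumeric() { s.push(a); chars.next(); } else { break; }
            }
            out.push(Tok::Ident(s));
        } else {
            out.push(Tok::Symbol(c));
            chars.next();
        }
    }
    out
}
```

Everything decidable by looking at a few characters of context lives in the lexer; anything that needs to know "what comes next" grammatically is the parser's job.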
Almost all the code generation is table driven. Inc and Dec are among the exceptions that require code: in this case, a loop to generate the INC or DEC instructions.
The only thing left to do is to generate an add or subtract if the offset is too large. For now I'm stabbing at doing this for offsets greater than four. Optimising here is much more complex than it might seem. For example, you can INC any register, whereas ADD requires A.
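To illustrate the trade-off, a hedged sketch of the decision — mnemonics and the register model are a hypothetical Z80-flavoured stand-in, not the actual backend:

```rust
// Hypothetical sketch: emit repeated INC/DEC for small offsets, fall back
// to an add/sub through A for larger ones (cutoff 4, as in the post).
fn gen_adjust(reg: &str, offset: i32) -> Vec<String> {
    let mut code = Vec::new();
    if offset == 0 {
        return code; // nothing to emit
    }
    if offset.abs() <= 4 {
        // INC/DEC work on any register, so no shuffling is needed.
        let op = if offset > 0 { "INC" } else { "DEC" };
        for _ in 0..offset.abs() {
            code.push(format!("{op} {reg}"));
        }
    } else {
        // On this hypothetical chip, add/sub only target A, so the value
        // has to detour through A — which is exactly why the cutoff matters.
        let (op, n) = if offset > 0 { ("ADD", offset) } else { ("SUB", -offset) };
        code.push(format!("LD A,{reg}"));
        code.push(format!("{op} A,{n}"));
        code.push(format!("LD {reg},A"));
    }
    code
}
```

The break-even point in real code depends on cycle counts and whether A is live, which is where it gets more complex than it looks.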
The feeling when you bang your head against the wall for 3 hours and then just try something, but don't really believe in it and suddenly all your unit tests pass! 🎉 :awesome:
This is the beauty of #TestDrivenDevelopment - you can just try and guess until it works.😄 It's such a funny experience!
TIL that Go doesn't have bytes.Equal([]byte,string) or strings.Equal(string,[]byte) because as of 2019 the compiler is smart enough to make string([]byte) into a cast rather than a copy (possibly with allocation) when used in these types of comparisons.
I wish there was some central documentation of non-obvious "magical" optimizations like this.
bytes, internal/bytealg: simplify Equal

The compiler has advanced enough that it is cheaper to convert to strings than to go through the assembly trampolines to call runtime.memequal. Simplify Equal accordingly, and cull dead code from bytealg. While we're here, simplify Equal's documentation.

Fixes #31587
#Quiche #compiler is now alive! (At least Conway's variant of alive.) The initial version was slow - about four seconds per generation. It was multiplying coordinates for each cell read and write.
The second variant uses offsets into each line buffer, and only redraws changed cells. It's now running at three to four generations per second.
Uh, ohhh... I think it's time for me to migrate away from #nom v4.2 😮
Yeah, I know, I've procrastinated on this a lot. This will probably be a lot of work and "slow me down" for a bit. 😪 On the upside, though: I can correct all my mistakes along the way (like having spans).
I'll probably migrate to #chumsky, but #winnow also looks really nice. 🙂
Before Christmas I decided the #Quiche #compiler needed two big refactorings. The first is nearly done: the data tables for operands and primitives.
The OG version had grown confusing due to some poor initial decisions. It also put too much intelligence into the parser regarding the available types for each operator.
The new version allows the parser to scan the table to confirm whether an operator can handle the operand types. It can also 'expand' types to find a match...
Let's say I write a function like this, in C++17 or later:
inline int Calculate(int a, int b) {
    return a + b;
}
I put it in a file called calculate.h and include (and use) it at multiple other places in the code.
Let's assume the function is not inlined at call sites. Due to the inline keyword, the compiler will ensure that Calculate() exists only once. (See https://en.cppreference.com/w/cpp/language/inline)
Question: Will the compiler generate the instructions multiple times, or does it avoid compiling a function body that's already going to be compiled in a different translation unit?
In other words: Do lots of inline functions in header files slow down compilation?
Some compiler routines such as sizeof() need to be able to handle a type name as a parameter, for example sizeof(Integer).
I've added a type called TypeDef to handle this. When the parser hits an identifier which is a type name but not a typecast, it returns a value of type TypeDef.
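Roughly the shape of the idea, as a hedged Rust sketch — the type names, sizes and the toy symbol lookup are invented stand-ins, not the real compiler's:

```rust
// Illustrative sketch: identifiers that name a type produce a TypeDef
// *value*, so sizeof() can accept a type name or an ordinary expression
// through the same parameter path.
#[derive(Debug, PartialEq)]
enum Type {
    Integer,
    Byte,
}

#[derive(Debug, PartialEq)]
enum Value {
    Int(i64),       // a normal expression result
    TypeDef(Type),  // the identifier named a type, not a variable
}

// Tiny stand-in for the parser's identifier handling.
fn parse_ident(name: &str) -> Value {
    match name {
        "Integer" => Value::TypeDef(Type::Integer),
        "Byte" => Value::TypeDef(Type::Byte),
        _ => Value::Int(0), // pretend it's a variable of type Integer
    }
}

// sizeof() just inspects whichever kind of value it got.
fn size_of(v: &Value) -> usize {
    match v {
        Value::TypeDef(Type::Integer) => 2,
        Value::TypeDef(Type::Byte) => 1,
        Value::Int(_) => 2, // size of the expression's (Integer) type
    }
}
```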
This week I added the Peek() and Poke() intrinsics to the #Quiche #compiler. That means I can now write my first non-trivial program.
I spent this morning fixing a few bugs in the parser and code generator, and it's now successfully generating the assembler file.
The assembler is choking on a couple of issues with identifiers, and the output code has a couple of bugs to do with parameter parsing and result processing.
All of the operators are now passed over to the new data tables and primitive search. I'm moving on to intrinsics. These are small, function-like routines that often generate inline code, such as peek, inp and sizeof.
Many of these have quirks, such as accepting multiple types or a typedef. The quirk of Abs is unsigned values: it doesn't affect them. I could raise an error, but it's nicer to fake the data so that no code is generated.
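The Abs trick, sketched in Rust — the instruction names are invented placeholders, not real backend output:

```rust
// Sketch of the "fake the data" idea: Abs of an unsigned value is the
// value itself, so the intrinsic reports the argument unchanged and
// emits nothing, instead of raising an error.
#[derive(Debug, PartialEq)]
enum Ty {
    Signed,
    Unsigned,
}

fn gen_abs(arg_ty: Ty) -> Vec<&'static str> {
    match arg_ty {
        // Real work only for signed values (placeholder mnemonics).
        Ty::Signed => vec!["JP P,skip", "NEG", "skip:"],
        // Unsigned: a no-op — the result *is* the argument.
        Ty::Unsigned => vec![],
    }
}
```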
In their blog post "Speeding up Rust edit-build-run cycle", David Lattimore shows how you can speed up #RustLang compile times by 16x just by changing some default compiler config:
Progress on the #compiler. The Z8 now passes the test suite and the build coverage test. The test suite is pretty basic so there are probably plenty of bugs left. Code density is not great on the Z8 though. Also added register keywords for arguments to the compiler and split I/D to the linker.
The bytecode output for the #1802 also works with a bytecode engine in C, but the 1802 part is a long way off. Might have to stop putting off debugging the 65C816 now and carry on with #6502
The compiler tutorials I've read don't talk about how to deal with classes and inheritance. I assume that a metaclass has to be built for each class. But should I then store those metaclasses for later use, or do I regenerate them when needed? I assume the former.
Also, my parser doesn't currently check for duplicate classes or methods (inside classes). Should it be in the parser, or should it be part of the thing that builds the output?
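One common answer, sketched in Rust (all names invented): build each metaclass once, store it in a name-keyed table, and catch duplicates while building that table — i.e. in semantic analysis, after parsing — so the parser stays a pure syntax checker:

```rust
use std::collections::HashMap;

// Hypothetical metaclass record: one per class, built once and reused.
#[derive(Debug)]
struct MetaClass {
    name: String,
    parent: Option<String>, // inheritance: name of the base class, if any
    methods: Vec<String>,
}

#[derive(Default)]
struct ClassTable {
    classes: HashMap<String, MetaClass>,
}

impl ClassTable {
    // Returns an error string instead of panicking, so a real compiler
    // could attach a source location when reporting it.
    fn define(&mut self, mc: MetaClass) -> Result<(), String> {
        if self.classes.contains_key(&mc.name) {
            return Err(format!("duplicate class '{}'", mc.name));
        }
        let unique: std::collections::HashSet<_> = mc.methods.iter().collect();
        if unique.len() != mc.methods.len() {
            return Err(format!("duplicate method in class '{}'", mc.name));
        }
        self.classes.insert(mc.name.clone(), mc);
        Ok(())
    }
}
```

Storing the metaclasses (rather than regenerating them) also gives the duplicate check for free: the table lookup that rejects a redefinition is the same one later phases use to resolve inheritance.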