Vorpal

@Vorpal@programming.dev


Vorpal,

Due to the recent xz trouble, I presume? Good idea. I was thinking about this on an ecosystem-wide scale (e.g. all of crates.io or all of a Linux distro), which is a much harder problem to solve.

Not sure if the tag logic is needed though. I thought cargo embedded the commit ID in the published package?

Also I’m amazed that the name cargo-goggles was available.

Vorpal,

Yes, obviously there are more ways to hide malicious code.

As for the git commit ID, I didn’t see you using it even when it was available though? But perhaps that could be a weakness: if the commit ID used does not match the tag in the repo, that would be a red flag too. That could be worth checking.

Vorpal,

Hm, that is a fair point. Perhaps it would make sense to produce a table of checks: indicate which checks each dependency fails/passes, and then colour code them with severity.

Some experimentation on real world code is probably needed. I plan to try this tool on my own projects soon (after I manually verify that your crate matches your git code (hah! Bootstrap problem); I already reviewed your code on GitHub and it seemed to do what it claims).

Vorpal,

Please, send an email to lwn@lwn.net to report this issue to them, they usually fix things quickly.

Vorpal,

So there are a couple of options for plugins in Rust (and I haven’t tried any of them):

  • Wasm, supposedly extism.org makes this less painful.
  • libloading + C ABI
  • One of the two stable ABI crates (stabby or abi_stable) + libloading
  • If you want to build them into your code base but not have to update a central list, there are linkme and inventory.
  • An embedded scripting language might also be a (very different) option. Something like mlua, rhai or rune.

I don’t know if any of these suit your needs, but at least you now have some things to investigate further.
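To illustrate the libloading + C ABI option: the key idea is that only plain C-compatible types (and function pointers) cross the boundary. Below is a minimal sketch with made-up names (`PluginVTable`, `plugin_entry`-style export); in a real setup the plugin would be a cdylib and the host would fetch the table at runtime via libloading.

```rust
// A C-compatible function table both sides agree on. In a real plugin the
// plugin crate would export this (e.g. via #[no_mangle] from a cdylib) and
// the host would obtain it with libloading::Library::get.
#[repr(C)]
pub struct PluginVTable {
    /// ABI version so host and plugin can detect mismatches.
    pub abi_version: u32,
    /// The actual entry point; only C-compatible types cross the boundary.
    pub process: unsafe extern "C" fn(i32) -> i32,
}

unsafe extern "C" fn double_it(input: i32) -> i32 {
    input * 2
}

pub static PLUGIN_VTABLE: PluginVTable = PluginVTable {
    abi_version: 1,
    process: double_it,
};

fn main() {
    // Host side: check the ABI version, then call through the table.
    assert_eq!(PLUGIN_VTABLE.abi_version, 1);
    let result = unsafe { (PLUGIN_VTABLE.process)(21) };
    println!("{result}"); // prints 42
}
```

The version field is the important design detail: with no stable Rust ABI, the host can only detect an incompatible plugin by an explicit check like this.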

Vorpal,

Yes, Rust is that much of a pain in this case, since you can only safely pass plain C-compatible types across the plugin boundary.

One reason is that Rust doesn’t have stable layouts for structs and enums: the compiler is free to optimise them, reordering fields to avoid padding, deciding which parts to use as niches for Options, etc. And yes, that changes every now and then as the devs come up with new optimisations. I think it changed most recently last summer.
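A small illustration of the difference: the repr(C) size is guaranteed by the C layout rules, while the default-repr size is whatever the current compiler happens to pick (the exact numbers printed for the default repr may change between compiler versions).

```rust
use std::mem::size_of;

// Default (#[repr(Rust)]) layout is unspecified: the compiler may reorder
// fields to minimise padding, and may change its mind between releases.
struct Unstable {
    a: u8,
    b: u64,
    c: u16,
}

// repr(C) pins the layout to C rules: declaration order, with padding.
#[repr(C)]
struct Stable {
    a: u8,
    b: u64,
    c: u16,
}

fn main() {
    // C layout: 1 byte (+7 padding) + 8 + 2 (+6 trailing padding) = 24.
    assert_eq!(size_of::<Stable>(), 24);
    // The Rust layout is smaller today (fields reordered b, c, a),
    // but nothing guarantees it stays that way.
    println!("repr(Rust) size: {}", size_of::<Unstable>());
    // Niche optimisation: Option<&u8> is pointer-sized because the
    // null pointer value is used to encode None.
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());
}
```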

Vorpal,

Sure, but my point was that such a C ABI is a pain. There are some crates that help:

  • Rust-C++: cxx and autocxx
  • Rust-Rust: stabby or abi_stable

But without those, with just plain bindgen, it is a pain to transfer any types that can’t easily just be repr(C), and there are quite a few such types. Enums with data, for example. Or anything using the built-in collections (HashMap, etc.) or any other complex type you don’t have direct control over yourself.

So my point still stands. FFI with just bindgen/cbindgen is a pain, and the lack of a stable ABI means you need to use FFI between Rust and Rust (when loading dynamically).

In fact FFI is a pain in most languages (apart from C itself where it is business as usual… oh wait that is the same as pain, never mind) since you are limited to the lowest common denominator for types except in a few specific cases.
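To make the enum point concrete, here is a sketch (hypothetical types, not from any real crate) of the manual flattening you end up doing to get an enum with data across a C ABI boundary:

```rust
// The nice Rust-side type cannot cross a plain C ABI boundary directly:
enum Event {
    Quit,
    KeyPress(u32),
}

// ...so one hand-writes a C-compatible mirror: an explicit tag plus the
// payload fields, with the "which fields are valid" rules kept in comments.
#[repr(C)]
struct FfiEvent {
    /// 0 = Quit, 1 = KeyPress
    tag: u32,
    /// Only meaningful when tag == 1.
    key: u32,
}

fn to_ffi(e: &Event) -> FfiEvent {
    match e {
        Event::Quit => FfiEvent { tag: 0, key: 0 },
        Event::KeyPress(k) => FfiEvent { tag: 1, key: *k },
    }
}

fn main() {
    let ffi = to_ffi(&Event::KeyPress(42));
    assert_eq!((ffi.tag, ffi.key), (1, 42));
    assert_eq!(to_ffi(&Event::Quit).tag, 0);
    println!("ok");
}
```

Every invariant that the Rust enum encoded in the type system (only one variant's data exists at a time) becomes a convention the programmer has to uphold by hand on both sides of the boundary.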

Vorpal, (edited )

Sounds interesting! As I don’t know restic, which this is apparently based on, what are the differentiating factors between them? While I’m always on board for a rewrite in Rust in general, I’m curious as to whether there is anything more to it than that.

EDIT: seems this is already answered in the FAQ, my bad.

Vorpal, (edited )

The term you are looking for in general is “reverse engineering”. For software in particular you are looking at disassembly, decompilation and various forms of tracing and debugging.

As for particular software: For .NET there is ILSpy that can help you look into how things work. For native code I have used Ghidra in the past.

Native code is a lot more effort to understand. In both cases things like variable names will be gone. Most function names will be missing (even more so for native code). Type names too. For native code the types themselves will be gone, so you will have to look at what is going on and guess if something is a struct or an array. How big is the struct and what are the fields?

Leftover debug or logging lines are very valuable in figuring out what something is. Often you have to go over a piece of disassembly or decompiled code several times as your understanding of it gradually builds.

C++ code with lots of object orientation tends to be easier to figure out the big picture of than C code, as the classes and inheritance provide a more obvious pattern.

Then there is dynamic tracing (running under some sort of debugger or call tracer to see what the software does). I have not had as much success with this.

Note that I’m absolutely an amateur at reverse engineering. I thought it was interesting enough that I wanted to learn it (and I had a small project where it was useful). But I’m mostly a programmer.

I have done a lot of low level programming (C, C++, even a small amount of assembly, in recent times a lot of Rust), and this knowledge helps when reverse engineering. You need to understand how compilers and linkers lower code to machine code in order to have a fighting chance at reversing that.

Also note that there may be legal complications when doing reverse engineering, especially with regards to how you make use of the things you learned. I’m not a lawyer, this is not legal advice, etc. But check out the legal guidelines of Asahi Linux (who are working on reverse engineering M1 macs to run Linux on them): asahilinux.org/copyright/ (scroll down to “reverse engineering policy”).

Now this covers (at a high level) how to figure things out. How you then patch closed source software I have no idea. Haven’t looked into that, as my interest was in figuring out how hardware and drivers worked to make open source software talk to said hardware.

Vorpal,

By native code I mean machine code. That is indeed usually produced by C or C++, though there are some other options too: notably, Rust and Go both also compile to native machine code rather than some sort of byte code. In contrast, Java, C# and Python all compile to various byte code representations (which are usually much higher level and thus easier to figure out).

You could of course also have hand written assembly code, but that is rare these days outside a few specific critical functions like memcpy or media encoders/decoders.

I basically learnt as I went, googling things I needed to figure out. I was goal oriented in this case: I wanted to figure out how some particular drivers worked on a particular laptop so I could implement the same thing on Linux. I had heard of and used Ghidra briefly before (during a capture the flag security competition at university). I didn’t really want to use it here though, to ensure I could be fully in the clear legally. So I focused on tracing instead.

I did in fact write up what I found. Be warned: it is a bit on the vague side and mostly focuses on the results. I did plan a follow-up blog post with more details on the process, as well as more things I figured out about the laptop, but never got around to it. In particular, I did eventually figure out power monitoring and how to read the fan speed. Here is a link to what I did write, if you are interested: vorpal.se/…/reverse-engineering-acpi-functionalit…

Vorpal,

I have read it; it is a very good book, and the memory ordering and atomics sections are also applicable to C and C++, since all of these languages use the same memory ordering model.

I can strongly recommend it if you want to do any low level concurrency (which I do in my C++ day job). I have recommended it to my colleagues too whenever they had occasion to look at such code.

I do wish there was a bit more on more obscure and advanced patterns though. Things like RCU, seqlocks etc basically get an honorable mention in chapter 10.
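As a taste of the kind of thing the book covers, here is the classic release/acquire message-passing pattern in Rust; the `Ordering` values map directly onto the equivalent C/C++ `memory_order` variants, which is why the material transfers between the languages.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::thread;

static DATA: AtomicU32 = AtomicU32::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn main() {
    let producer = thread::spawn(|| {
        DATA.store(123, Ordering::Relaxed);
        // Release: everything before this store becomes visible to any
        // thread that acquire-loads READY and observes `true`.
        READY.store(true, Ordering::Release);
    });

    // Consumer: spin until the flag is set.
    while !READY.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    // The Acquire load synchronised with the Release store above,
    // so reading DATA here is guaranteed to see 123.
    assert_eq!(DATA.load(Ordering::Relaxed), 123);
    producer.join().unwrap();
    println!("ok");
}
```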

Vorpal, (edited )

Yes, Sweden really screwed up its first attempt at switching to the Gregorian calendar. But there were also multiple countries that switched back and forth a couple of times. Or Switzerland, where each administrative region switched separately.

But I think we in Sweden still “win” for worst screw-up. Also, there is no good way to handle these dates without specific reference to the precise location and which calendar they refer to (timestamps will be ambiguous when switching back to the Julian calendar).

Vorpal,

My guess is that the relevant keyword for the choice of OpenSSL is FIPS. Rustls doesn’t (or at least didn’t) have that certification, which matters if you are dealing with the US government (directly or indirectly). I believe there is an alternative backend (instead of ring) these days that does have FIPS certification though.

Vorpal,

I can second this, I use aconfmgr and love it. Especially useful to manage multiple computers (desktop, laptop, old computer doing other things etc).

Though I’m currently planning to rewrite it, since it doesn’t seem maintained any more, and I want a multi-distro solution (because I also want to use it on my Pis, where I run Raspbian). The rewrite will be in Rust, and I’m currently deciding on what configuration language to use. I’m leaning towards rhai (because it seems easy to integrate from the Rust side, and I don’t get too angry at the language when reading the docs for it). Oh, and one component for it is already written and published: github.com/VorpalBlade/paketkoll is a fast Rust replacement for paccheck (which is used internally by aconfmgr to find files that differ).

Vorpal,

I would go with the Arch-specific aur.archlinux.org/packages/aconfmgr-git instead of Ansible, since it can save the current system state as well. I use it and love it. See another reply on this post for a slightly deeper discussion of it.

Vorpal,

I have only implemented checking all packages at the current point in time (as that is what I need later on). It could be possible to add support for checking a single package.

Thank you for reminding me of pacman -Qkk though, I had forgotten it existed.

I just did a test of pacman -Qk and pacman -Qkk (with no package, so checking all of them) and paketkoll is much faster. Based on the man page:

  • pacman -Qk only checks that files exist. I don’t have that option; I always check file properties at least, but have the option to skip checking the file hash if the mtime and size match (paketkoll --trust-mtime). Even though I check more in this scenario, I’m still about 4x faster.
  • pacman -Qkk checks checksums as well (similar to plain paketkoll). It is unclear to me whether pacman will check the checksum if the mtime and size match.
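The --trust-mtime idea can be sketched roughly like this (a simplification for illustration, not paketkoll’s actual code): only fall back to expensive hashing when the cheap properties differ from what the package database recorded.

```rust
use std::fs;
use std::time::SystemTime;

// What the package database recorded for one file (simplified).
struct DbEntry {
    size: u64,
    mtime: SystemTime,
}

fn needs_hash_check(path: &std::path::Path, entry: &DbEntry) -> std::io::Result<bool> {
    let meta = fs::metadata(path)?;
    // If size and mtime both match, trust that the contents are unchanged
    // and skip computing the hash entirely.
    Ok(meta.len() != entry.size || meta.modified()? != entry.mtime)
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("trust_mtime_sketch.txt");
    fs::write(&path, b"hello")?;
    let meta = fs::metadata(&path)?;
    let entry = DbEntry { size: meta.len(), mtime: meta.modified()? };
    // Unchanged file: the expensive hashing step can be skipped.
    assert!(!needs_hash_check(&path, &entry)?);
    println!("ok");
    Ok(())
}
```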

I can report that paketkoll handily beats pacman in both scenarios (pacman -Qk is slower than paketkoll --trust-mtime, and pacman -Qkk is much slower than plain paketkoll). Below are the output of using the hyperfine benchmarking tool:


$ hyperfine -i -N --warmup=1 "paketkoll --trust-mtime" "paketkoll" "pacman -Qk" "pacman -Qkk"
Benchmark 1: paketkoll --trust-mtime
  Time (mean ± σ):     246.4 ms ±   7.5 ms    [User: 1223.3 ms, System: 1247.7 ms]
  Range (min … max):   238.2 ms … 261.7 ms    11 runs

  Warning: Ignoring non-zero exit code.

Benchmark 2: paketkoll
  Time (mean ± σ):      5.312 s ±  0.387 s    [User: 17.321 s, System: 13.461 s]
  Range (min … max):    4.907 s …  6.058 s    10 runs

  Warning: Ignoring non-zero exit code.

Benchmark 3: pacman -Qk
  Time (mean ± σ):     976.7 ms ±   5.0 ms    [User: 101.9 ms, System: 873.5 ms]
  Range (min … max):   970.3 ms … 984.6 ms    10 runs

Benchmark 4: pacman -Qkk
  Time (mean ± σ):     86.467 s ±  0.160 s    [User: 53.327 s, System: 16.404 s]
  Range (min … max):   86.315 s … 86.819 s    10 runs

  Warning: Ignoring non-zero exit code.

It appears that pacman -Qkk is much slower than even paccheck --file-properties --sha256sum. I don’t know how that is possible!

The above benchmarks were executed on an AMD Ryzen 5600X with 32 GB RAM and a Gen3 NVMe SSD, with pacman -Syu last run the day before. The disk cache was hot between runs for all the tools; only the very first run would be a bit slower (though not to a large extent on an SSD, I imagine it would dominate on a mechanical HDD).

In conclusion:

  • When checking just file properties, paketkoll is 3.96 times faster than pacman checking just whether the files exist
  • When checking checksums, paketkoll is 16.3 times faster than pacman checking file properties. This is impressive on a 6 core/12 thread CPU. pacman must be doing something exceedingly stupid here (might be worth looking into; perhaps it is checking both sha256sum and md5sum, which is totally unnecessary). Compared to paccheck I see a 7x speedup in that scenario, which is more in line with what I would expect.

Vorpal,

It very much is (as I even acknowledge at the end of the github README). 😀

Vorpal,

I went ahead and implemented support for filtering packages (just made a new release: v0.1.3).

I am of course still faster. Here are two examples that show a small package (where it doesn’t really matter that much) and a huge package (where it makes a massive difference). Excuse the strange paths, this is straight from the development tree.

Let’s check pacman itself, and let’s include config files too (not sure if pacman even has that option). Including config files or not doesn’t make a measurable difference though:


$ hyperfine -i -N --warmup 1 "./target/release/paketkoll --config-files=include pacman" "pacman -Qkk pacman"
Benchmark 1: ./target/release/paketkoll --config-files=include pacman
  Time (mean ± σ):      14.0 ms ±   0.2 ms    [User: 21.1 ms, System: 19.0 ms]
  Range (min … max):    13.4 ms …  14.5 ms    216 runs

  Warning: Ignoring non-zero exit code.

Benchmark 2: pacman -Qkk pacman
  Time (mean ± σ):      20.2 ms ±   0.2 ms    [User: 11.2 ms, System: 8.8 ms]
  Range (min … max):    19.9 ms …  21.1 ms    147 runs

Summary
  ./target/release/paketkoll --config-files=include pacman ran
    1.44 ± 0.02 times faster than pacman -Qkk pacman

Let’s check davinci-resolve as well, which is massive (5.89 GB):


$ hyperfine -i -N --warmup 1 "./target/release/paketkoll --config-files=include pacman davinci-resolve" "pacman -Qkk pacman davinci-resolve"
Benchmark 1: ./target/release/paketkoll --config-files=include pacman davinci-resolve
  Time (mean ± σ):     770.8 ms ±   4.3 ms    [User: 2891.2 ms, System: 641.5 ms]
  Range (min … max):   765.8 ms … 778.7 ms    10 runs

  Warning: Ignoring non-zero exit code.

Benchmark 2: pacman -Qkk pacman davinci-resolve
  Time (mean ± σ):     10.589 s ±  0.018 s    [User: 9.371 s, System: 1.207 s]
  Range (min … max):   10.550 s … 10.620 s    10 runs

  Warning: Ignoring non-zero exit code.

Summary
  ./target/release/paketkoll --config-files=include pacman davinci-resolve ran
   13.74 ± 0.08 times faster than pacman -Qkk pacman davinci-resolve

What about some midsized packages (vtk 359 MB, linux 131 MB)?


$ hyperfine -i -N --warmup 1 "./target/release/paketkoll vtk" "pacman -Qkk vtk"
Benchmark 1: ./target/release/paketkoll vtk
  Time (mean ± σ):      46.4 ms ±   0.6 ms    [User: 204.9 ms, System: 93.4 ms]
  Range (min … max):    45.7 ms …  48.8 ms    65 runs

Benchmark 2: pacman -Qkk vtk
  Time (mean ± σ):     702.7 ms ±   4.4 ms    [User: 590.0 ms, System: 109.9 ms]
  Range (min … max):   698.6 ms … 710.6 ms    10 runs

Summary
  ./target/release/paketkoll vtk ran
   15.15 ± 0.23 times faster than pacman -Qkk vtk

$ hyperfine -i -N --warmup 1 "./target/release/paketkoll linux" "pacman -Qkk linux"
Benchmark 1: ./target/release/paketkoll linux
  Time (mean ± σ):      34.9 ms ±   0.3 ms    [User: 95.0 ms, System: 78.2 ms]
  Range (min … max):    34.2 ms …  36.4 ms    84 runs

Benchmark 2: pacman -Qkk linux
  Time (mean ± σ):     313.9 ms ±   0.4 ms    [User: 233.6 ms, System: 79.8 ms]
  Range (min … max):   313.4 ms … 314.5 ms    10 runs

Summary
  ./target/release/paketkoll linux ran
    9.00 ± 0.09 times faster than pacman -Qkk linux

For small sizes where neither tool performs much work, the majority is spent on fixed overheads that both tools have (loading the binary, setting up glibc internals, parsing the command line arguments, etc). For medium sizes paketkoll pulls ahead quite rapidly. And for large sizes pacman is painfully slow.

Just for laughs I decided to check an empty meta-package (base, 0 bytes). Here pacman actually beats paketkoll, slightly. Not a useful scenario, but for full transparency I should include it:


$ hyperfine -i -N --warmup 1 "./target/release/paketkoll base" "pacman -Qkk base"
Benchmark 1: ./target/release/paketkoll base
  Time (mean ± σ):      13.3 ms ±   0.2 ms    [User: 15.3 ms, System: 18.8 ms]
  Range (min … max):    12.8 ms …  14.1 ms    218 runs

Benchmark 2: pacman -Qkk base
  Time (mean ± σ):       8.8 ms ±   0.2 ms    [User: 2.8 ms, System: 5.8 ms]
  Range (min … max):     8.4 ms …  10.0 ms    327 runs

Summary
  pacman -Qkk base ran
    1.52 ± 0.05 times faster than ./target/release/paketkoll base

I always start a thread pool regardless of whether I have work to do (and changing that would slow down the case I actually care about). That is the most likely cause of this slightly larger fixed overhead.

Structuring Projects: when to include a dependency as a library instead of just calling a command line interface.

Hello fellow rustaceans! Recently, there was a thread about how we can grow this community (how can I link to posts across servers?), where I already talked briefly about this topic, saying that I did not know if it is worthy of a full post here, as most things seem to be pretty professional looking links to talks and blogs....

Vorpal,

Another aspect is that calling a CLI command is way slower than a library function (in general). This is most apparent with short-running commands, since the overhead is mostly fixed per command invocation, rather than scaling with the amount of work or data.

As such I would at the very least keep those commands out of any hot/fast paths.
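A rough way to see that fixed overhead (the numbers vary by system; this only demonstrates the order-of-magnitude gap, using the do-nothing `true` command as the external process):

```rust
use std::hint::black_box;
use std::process::Command;
use std::time::Instant;

fn main() {
    // Spawning even a trivial external command pays for fork/exec,
    // dynamic linking and runtime startup on every invocation.
    let start = Instant::now();
    for _ in 0..20 {
        Command::new("true").status().expect("failed to run `true`");
    }
    let spawn_time = start.elapsed();

    // An in-process call is nanoseconds; black_box stops the compiler
    // from optimising the stand-in work away.
    let start = Instant::now();
    let mut acc = 0u64;
    for i in 0..20u64 {
        acc = acc.wrapping_add(black_box(i)); // stand-in for a library call
    }
    let call_time = start.elapsed();

    println!("20 spawns: {spawn_time:?}, 20 in-process calls: {call_time:?}");
    // The gap is typically several orders of magnitude.
    assert!(spawn_time > call_time);
    black_box(acc);
}
```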

Vorpal,

That assembly program the author compares to is waay bloated. This guy managed with 105 bytes: nathanotterness.com/…/tiny_elf_modernized.html (that is with overlapping part of the code into the ELF header and other similar level shenanigans). ;)

All kidding aside, interesting article.

Vorpal,

The example FileDescriptorPollContext doesn’t really work. What if my runtime uses io_uring instead of polling? Those need very different interfaces to be sound. How do you abstract over that?
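To make the mismatch concrete, here is a sketch (hypothetical traits, not from the article) of the two models: a readiness-based API can borrow the buffer just for the duration of the read, while a completion-based API has to take ownership of the buffer for as long as the kernel may write to it.

```rust
/// Readiness model (poll/epoll): wait until readable, then do a short,
/// borrowed read. The buffer only needs to live for the call itself.
#[allow(dead_code)]
trait ReadinessIo {
    fn wait_readable(&mut self, fd: i32);
    fn read(&mut self, fd: i32, buf: &mut [u8]) -> usize;
}

/// Completion model (io_uring): the buffer is handed over by value and only
/// returned on completion. A borrowed `&mut [u8]` here would be unsound,
/// since the caller could drop the buffer while the kernel still writes to it.
trait CompletionIo {
    fn submit_read(&mut self, fd: i32, buf: Vec<u8>) -> u64; // returns an op id
    fn wait_completion(&mut self) -> (u64, Vec<u8>, usize);
}

// A toy in-memory "ring" just to show the ownership handoff.
struct MockRing {
    pending: Option<(u64, Vec<u8>)>,
}

impl CompletionIo for MockRing {
    fn submit_read(&mut self, _fd: i32, buf: Vec<u8>) -> u64 {
        self.pending = Some((1, buf));
        1
    }
    fn wait_completion(&mut self) -> (u64, Vec<u8>, usize) {
        let (id, mut buf) = self.pending.take().expect("nothing submitted");
        buf[..5].copy_from_slice(b"hello"); // pretend the kernel wrote 5 bytes
        (id, buf, 5)
    }
}

fn main() {
    let mut ring = MockRing { pending: None };
    let id = ring.submit_read(0, vec![0u8; 16]);
    let (done, buf, n) = ring.wait_completion();
    assert_eq!((id, n), (done, 5));
    assert_eq!(&buf[..n], b"hello");
    println!("ok");
}
```

The two trait signatures cannot be unified without either giving up the borrowed fast path of the readiness model or making the completion model unsound, which is the core of the objection.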

Vorpal,

Swedish layout. Not ideal for coding (too many things like curly and square brackets are under AltGr, and tilde and backtick are on dead keys).

But switching back and forth as soon as you need to write Swedish (for the letters åäö) is just too much work. And yes, in the Swedish alphabet they are separate letters, not a/a/o with diacritics.

Vorpal,

Two tips that work for me:

  • After cargo add I sometimes have to run the “restart rust-analyzer” command from the VS Code command palette (exact wording may be off; I’m on my phone as of writing this comment). Much faster than cargo build.
  • Consider using sccache to speed up rebuilds. It helps a lot, though uses a bit of disk space. But disk space is cheap nowadays (as long as you aren’t stuck with a laptop with soldered SSD, in which case you know what not to buy next time).
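For reference, the usual way to hook sccache up (assuming it is installed and on PATH) is via Cargo’s rustc wrapper setting:

```toml
# ~/.cargo/config.toml: route all rustc invocations through sccache
[build]
rustc-wrapper = "sccache"
```

After that, plain cargo build picks it up automatically; sccache --show-stats shows the cache hit rate.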