Diffing can be incredibly helpful for managing software! It can be used to retrofit CRDT support, monitor changes being made, guide how the software should be adjusted to match desired output(s), etc. Maybe for our Output Unit we'll have a tool which autoruns & diffs a converter to guide its development, maybe adding automation where we find it worth the effort?
So how'd we implement diffing on our string-centric hardware? Strings specifically? It's not trivial!
If the document's already stored as a CRDT we could just ask which edits are in one CRDT-document vs the other. Maybe postprocessing it to determine when both peers made equivalent edits!
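As a rough sketch of that idea, assuming each CRDT-document can be viewed as a dict from edit ID to a hashable edit payload (that representation and the name `crdt_diff` are my own assumptions, not any real library's API):

```python
def crdt_diff(ours: dict, theirs: dict):
    """Compare two CRDT-documents viewed as {edit_id: payload} dicts."""
    # Edits only one side has seen.
    only_ours = {k: v for k, v in ours.items() if k not in theirs}
    only_theirs = {k: v for k, v in theirs.items() if k not in ours}
    # Postprocessing: payloads both peers produced independently,
    # under different edit IDs.
    equivalent = set(only_ours.values()) & set(only_theirs.values())
    return only_ours, only_theirs, equivalent


base = {("a", 1): ("insert", 0, "h")}
ours = {**base, ("a", 2): ("insert", 1, "i")}
theirs = {**base, ("b", 2): ("insert", 1, "i"), ("b", 3): ("insert", 2, "!")}
mine, yours, same = crdt_diff(ours, theirs)
```

Here both peers typed the same "i" under different IDs, so it shows up in `same` rather than being reported as two unrelated changes.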
As for state snapshots...
The core (Dynamic Programming) algorithm for an optimal text diff involves building a table of the max number of common characters between each pair of prefixes of the 2 strings. Common chars increment this count, differing chars take the better of the neighbouring counts. Then traverse a path back-to-front to output.
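A minimal sketch of that table-then-backtrack approach (the classic longest-common-subsequence diff), working per character. The function name and the `' '`/`'-'`/`'+'` output markers are illustrative choices of mine:

```python
def diff(old: str, new: str) -> list[tuple[str, str]]:
    """Return (op, char) pairs: ' ' for common, '-' removed, '+' added."""
    rows, cols = len(old) + 1, len(new) + 1
    # table[i][j] = max number of common chars between old[:i] and new[:j]
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if old[i - 1] == new[j - 1]:
                # Common char: extend the best result without it.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Differing chars: keep the better neighbouring result.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner, collecting edits in reverse.
    edits = []
    i, j = len(old), len(new)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and old[i - 1] == new[j - 1]:
            edits.append((" ", old[i - 1]))
            i -= 1
            j -= 1
        elif j > 0 and (i == 0 or table[i][j - 1] >= table[i - 1][j]):
            edits.append(("+", new[j - 1]))
            j -= 1
        else:
            edits.append(("-", old[i - 1]))
            i -= 1
    edits.reverse()
    return edits


for op, char in diff("abc", "abd"):
    print(op, char)
```

The table costs time & memory proportional to the product of the 2 lengths, which is why real diff tools tend to work per line & use cleverer variations.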
Yesterday I discussed collaboratively editing text, but there's other things you might want to edit!
Implementation notes: I'd keep the data in RAM in its sorted & compressed form, only decompressing it transiently for processing & modifying. Zipping from columnar format into "edit" rows would be the most involved step.
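To illustrate the zipping step, here's a small sketch which assumes each column is run-length encoded as `(count, value)` pairs. That encoding & the column names are assumptions for the example, a real columnar format would be more elaborate:

```python
from itertools import chain, repeat


def rle_decode(runs):
    """Lazily expand [(count, value), ...] into individual values."""
    return chain.from_iterable(repeat(value, count) for count, value in runs)


def edit_rows(columns):
    """Zip compressed columns into per-edit rows, one row at a time."""
    names = list(columns)
    decoded = (rle_decode(columns[name]) for name in names)
    for values in zip(*decoded):
        yield dict(zip(names, values))


columns = {
    "peer": [(3, "a")],                      # 3 edits, all by peer "a"
    "counter": [(1, 1), (1, 2), (1, 3)],
    "char": [(1, "h"), (1, "i"), (1, "!")],
}
rows = list(edit_rows(columns))
```

Because everything is a generator, only the row currently being processed is ever held decompressed.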
For instance we'd want to edit not only individual files but whole directories of them! And the attributes attached to each of its files!
To implement such a "mapping" CRDT we can define an operation which sets the value at a given key to a given value, possibly to a "tombstone" value to indicate deletions. The problem comes when 2 peers attempt to set the same key to different values!
If we track our understanding of the latest edits our peers are aware of (a "Lamport timestamp"), storing that in our edits' metadata, we can establish a causal ordering to these edits. When that fails we can compare peer IDs.
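A minimal sketch of such a mapping CRDT, simplified to a single Lamport counter per peer with the peer ID as tie-breaker. The class, method names & `TOMBSTONE` marker are all hypothetical:

```python
from dataclasses import dataclass, field

TOMBSTONE = object()  # hypothetical marker value for deleted keys


@dataclass
class MapCRDT:
    peer_id: str
    clock: int = 0  # Lamport timestamp: highest counter seen so far
    entries: dict = field(default_factory=dict)  # key -> (ts, peer, value)

    def set(self, key, value):
        """Make a local edit & return the operation to send to peers."""
        self.clock += 1
        op = (key, self.clock, self.peer_id, value)
        self.apply(op)
        return op

    def delete(self, key):
        return self.set(key, TOMBSTONE)

    def apply(self, op):
        """Apply a local or remote operation; later (ts, peer) wins."""
        key, ts, peer, value = op
        self.clock = max(self.clock, ts)
        current = self.entries.get(key)
        if current is None or (ts, peer) > current[:2]:
            self.entries[key] = (ts, peer, value)

    def get(self, key, default=None):
        entry = self.entries.get(key)
        if entry is None or entry[2] is TOMBSTONE:
            return default
        return entry[2]


a, b = MapCRDT("a"), MapCRDT("b")
op_a = a.set("title", "Draft")   # both edits get timestamp 1...
op_b = b.set("title", "Final")
a.apply(op_b)                    # ...so the higher peer ID wins
b.apply(op_a)
```

Both peers end up agreeing on "Final" regardless of the order in which the operations arrive, which is the property we're after.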
This causal sort is also vital for ordering text insertions!
Talking about text... Lists would be implemented basically the same way but with greater freedom regarding which data you can store in each node.
We may want to incorporate counters, by summing all the nudges up/down collaborators have made.
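One common way to do that is a "PN-counter": remember each collaborator's total nudges up & down separately, then sum them all. A sketch, with hypothetical names:

```python
from collections import defaultdict


class PNCounter:
    def __init__(self, peer_id):
        self.peer_id = peer_id
        self.ups = defaultdict(int)    # peer -> total nudged upwards
        self.downs = defaultdict(int)  # peer -> total nudged downwards

    def nudge(self, amount):
        """Record a local nudge; negative amounts count downwards."""
        if amount >= 0:
            self.ups[self.peer_id] += amount
        else:
            self.downs[self.peer_id] -= amount

    def merge(self, other):
        """Take the larger total per peer, so re-merging is harmless."""
        for peer, total in other.ups.items():
            self.ups[peer] = max(self.ups[peer], total)
        for peer, total in other.downs.items():
            self.downs[peer] = max(self.downs[peer], total)

    def value(self):
        return sum(self.ups.values()) - sum(self.downs.values())


a, b = PNCounter("a"), PNCounter("b")
a.nudge(3)
b.nudge(-1)
a.merge(b)
b.merge(a)
a.merge(b)  # merging twice changes nothing
```

Since each peer's totals only ever grow, taking the max when merging never loses a nudge.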
And once we have collections we'd want to store numbers, constant strings, booleans, etc in them. Combine these datatypes and we should be able to define most other CRDTs you might want.
The catch with building new CRDTs upon these primitives is you need to deal with changes to this "schema", so we'd store any breaking changes in the CRDT. I'd add a "Cambria"-style wrapper around our Automerge implementation which evaluates the appropriate breaking schema changes forwards or backwards, pretending to the caller like the document's in the version it expects.
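To make that concrete, here's a toy sketch where every breaking change is a pair of forward/backward functions & a wrapper replays them in whichever direction the caller needs. `rename`, `MIGRATIONS` & `view` are names I've made up for illustration, not the API of any existing library:

```python
def rename(old, new):
    """A reversible schema change: rename the field `old` to `new`."""
    def forward(doc):
        doc = dict(doc)
        doc[new] = doc.pop(old)
        return doc

    def backward(doc):
        doc = dict(doc)
        doc[old] = doc.pop(new)
        return doc

    return forward, backward


# MIGRATIONS[n] converts between schema version n & version n + 1.
MIGRATIONS = [rename("name", "title"), rename("body", "text")]


def view(doc, stored, wanted):
    """Present a doc stored at version `stored` as version `wanted`."""
    while stored < wanted:
        doc = MIGRATIONS[stored][0](doc)
        stored += 1
    while stored > wanted:
        stored -= 1
        doc = MIGRATIONS[stored][1](doc)
    return doc


old_doc = {"name": "Notes", "body": "Hello"}
new_doc = view(old_doc, stored=0, wanted=2)
```

An old editor asking for version 0 of a newer document gets the same changes run backwards.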
Furthermore I'd probably implement a framework for implementing editors (could include the text editor!) upon CRDTs.
This editor framework would use CRDTs to implement features I'd say should be table stakes like infinite undo, autosave, collaboration, & composing these editors together! Whilst simplifying the data the editor itself needs to output.
I'll hold off on describing an XML CRDT until Ink & Switch has finished writing up what they've just figured out, for me to summarize as part of this hypothetical!
Continuing my study of ELF Utils' commandline tools...
After initializing both I/O & internationalization, as well as parsing commandline flags, ar configures LibELF to a specified format version, parses/validates the commandline flags some more ensuring additional args remain, pops the archive name as a commandline arg, & branches over the subcommand specified by those flags.
This may output some help text via LibArgP. Or it may...
For Insert operations ar opens the ELF file, initializes some memory including a symboltable, for oper_qappend operations populates a hashmap (LibC's search.h one), iterates over the ELF file, followed by extensive validation & cleaning up! For each ELF header (skipping filepaths for performance's sake) it initializes a new archive entry for it, considers whether it's present in the hashmap, & prepends the entry into a linkedlist.
Primarily ar performs "extractions" ("list" & "print" are variations), involving opening the archive & populating a hashmap, fstat()s the file, iterates over ELF entries, possibly atomically overwrites the file with extracted symbol table, & cleans up. For each entry it handles filepaths specially, extracts it into a new ELF symboltable, and/or checks for presence in the hashmap, & possibly formats human output with lots of options.
After parsing commandline flags & configuring supported ELF version elfclassify iterates over remaining args then maybe stdin lines. For each it carefully opens the ELF file, retrieves ELF kind or E/P/S headers to retrieve various properties with optional logging, performs a bunch of checks on those properties possibly outputting results, & aggregates to output as text or exit code.
... elfcmp configures supported ELF version, carefully opens the 2 given ELF files, retrieves their E headers checking equality on it, retrieves S header numbers to compare if they're equal, then P headers, S header index, various properties of all non-empty sections branching upon each's type, & the count of those sections. After which it gathers E & P header properties, allocates regions possibly populating/sorting it, iterates over all P headers to compare them, & cleans up.
I've been toying with the idea of switching to Linux for a while, and it's just getting to that point now, you know? I hear Mint is the best for beginners?
"Note that Recall does not perform content moderation. It will not hide information such as passwords or financial account numbers."
The computer, however, will stop you from recording DRM'd content.
Find it fascinating that when faced with drawing safety and security boundaries, the primary beneficiary is not the owner of the device, or the person using it, but random corporations who control the intellectual property rights.
@sarahjamielewis We could certainly get a simpler & more capable solution to the problem Recall addresses with modern Computer Science by essentially reimplementing the entire OS to use CRDTs, but who's got the time for that?
The next best thing would be filesystems with efficient backups... Like BTRFS!
So here's the thing with #Microsoft's new #Recall feature:
It's not about Microsoft now suddenly spying on you. They can probably already do that if they want in a much easier way without you knowing.
So please be more realistic!
The far more severe #privacy concern in the age of #remote work is when person A shares their screen and person B has Recall enabled, thereby "recalling" the other person's screen without person A knowing.