@adamchainz This sounds very similar to snapshot testing: the output (usually a long string) is saved, and the test fails when that string changes. The snapshot tool then shows you a diff & you decide whether you wanted that change.
This also sounds like how I'd implement a "pip install" for a gist or a Stack Overflow question: both evolve over time & neither is a real package, so they'd need ad hoc tooling support.
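
To make the snapshot comparison concrete, here's a rough sketch of the save / compare / diff / accept loop I have in mind. It uses only the standard library; `check_snapshot` and its `update` flag are made-up names for illustration, not the API of any existing snapshot tool.

```python
import difflib
from pathlib import Path


def check_snapshot(path: Path, actual: str, update: bool = False) -> str:
    """Compare `actual` against the snapshot stored at `path`.

    Hypothetical helper: returns "" when the snapshot matches (or was just
    written), otherwise a unified diff for a human to review.
    """
    # First run, or the reviewer accepted the change: (re)write the snapshot.
    if update or not path.exists():
        path.write_text(actual)
        return ""

    expected = path.read_text()
    if expected == actual:
        return ""

    # Snapshot changed: leave the stored file alone and report the diff.
    diff = difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile="snapshot",
        tofile="actual",
    )
    return "".join(diff)
```

A test would fail whenever the returned diff is non-empty, and re-running with `update=True` is the "yes, I wanted that diff" step.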