Been summing up reports about the sizes of all the scientific data stored using #gitAnnex and it's at least 3-5 petabytes, and growing by at least 2 petabytes per year currently.
In 2022, GitHub consisted of ~4 petabytes of data (excluding replicated data).
This is also in the neighborhood of the total size of the Library of Congress's digital collections, which was 3 petabytes in 2012.
Dunno who will win this race, but I'm surprised to be in it. ;-)
Last week I prototyped a git remote helper in a shell script, and now I'm rewriting that in #haskell as part of #gitAnnex.
I don't do this often, and I wonder if it was a mistake; I probably should have written the prototype in Haskell and then integrated it into git-annex. It's kind of amazing how a lot of complexity is melting away, and also how I'm adding So Many Types and throwing in a lot of robustness improvements.
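For readers who have not met git remote helpers: git starts a program named `git-remote-<name>` and talks to it over stdin/stdout in a line-based protocol. Below is a minimal Python sketch of just the opening handshake (`capabilities` and `list`), not the shell prototype or the Haskell rewrite mentioned above. `EXAMPLE_REFS`, `respond` and `main` are made-up names for illustration, and the actual `fetch`/`push` handling is left out.

```python
import sys

# Hypothetical ref table; a real helper would ask its remote instead.
# "?" is the protocol's way of saying "I don't know this ref's value yet".
EXAMPLE_REFS = {"refs/heads/main": "?"}


def respond(command, refs):
    """Return the reply lines for one remote-helper command."""
    if command == "capabilities":
        # One capability per line, terminated by a blank line.
        return ["fetch", "push", ""]
    if command in ("list", "list for-push"):
        # One "<value> <name>" line per ref, terminated by a blank line.
        lines = [f"{value} {name}" for name, value in sorted(refs.items())]
        return lines + [""]
    # fetch/push are advertised above but deliberately not sketched here.
    raise ValueError(f"unsupported command: {command!r}")


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read commands from git line by line and answer each one."""
    for line in stdin:
        command = line.rstrip("\n")
        if command == "":
            break  # a blank line ends the command stream
        for reply in respond(command, EXAMPLE_REFS):
            stdout.write(reply + "\n")
        stdout.flush()
```

Saved as an executable called `git-remote-example` on `PATH` (with a call to `main()` at the bottom), git would invoke it for URLs of the form `example::...`; anything beyond the handshake would still need to be written.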
People love recommending and raving about #Syncthing - it's a really cool project but honestly despite thinking hard about it, I still have not found any use for it (which is a lil frustrating ngl). I think the issue is I'm already used to syncing stuffs through #Git anyway lol and I kinda prefer it since it's a lil more "explicit" or intentional + you could add notes/context to the commit if needed.
Here at #distribits unconference I quickly demonstrate @willmcgugan's #textual framework for website-like #Python #TUI's (with my :gitannex: #gitAnnex control-center mockup) and also talk a bit about a compression algorithm benchmark that I once did.
This is the result that @mih, Timothy Sanders and I came up with at the #distribits hackathon: a design for #gitAnnex special remotes to support storing git repositories. We improved on git-remote-datalad-annex significantly I think and I hope to implement this as part of #gitAnnex.
When you write software to manage your cat photos and it gets used for brain slicing scans to the tune of 2 petabytes of brain per year. #gitAnnex #distribits
🚀 #annextimelog v0.12 finally closes many usage gaps and is quite fun to use now:
> atl tr work @office since 10 # start work
> atl tr 12 - 14 meeting # record smth
> atl tr boom at 16:00 title="🤯" todo="find out what that was" # one point in time, todo added
> atl stop work # stop working in the afternoon
> atl ls todo # list events marked as todo (the explosion above)
> atl mod boom set todo= # remove todo again
🚀 #annextimelog v0.10.0 marks an important milestone: it can record, delete and now also edit events! 🥳 This makes it usable as an actual time tracker/logbook and, thanks to its flexible git-annex-based metadata system, even as a todo list!
> atl tr for 2h meeting with=matt,mary project=A # track a meeting
> atl mod today meeting set todo # add the todo tag we forgot
> atl ls todo # lists events marked as todo
I brainstormed what a cli time tracker based on :gitannex: #gitAnnex could look like and detail how it would improve on the issues I have with #timewarrior and #hledger #timeclock:
I'm getting quite fed up of Signal. It is secure for many popular definitions of secure, but if your definition includes "data integrity" then it isn't.
After a few weeks of being away, I open Signal on my desktop and it indicates that I have to log in again. After logging in again, none of the messages received during these weeks show up.
It's not the first time that something like this has happened. In the past, I've been forcibly logged out and when logging back in all messages were gone.
@whynothugo Agreed! The inability to easily access or even just export text messages and media (in flow!) bugs me really hard with all messengers I use (maybe except #Matrix for which there surely are tools): #WhatsApp (don't...), #Threema, #SimpleXchat
I'd absolutely love a way to :gitannex: #gitAnnex (certain of my) chats.
Please don’t put your changelog on GitLab’s “release notes” feature.
Someone wanting to read them needs to load a 5MB web app, just to read some markdown text. It’s completely unusable on anything that’s not a high speed connection.
@whynothugo Yes! I put them into the notes of the version tag and made myself a little alias displaying it nicely. I use these notes to auto-generate a changelog in the docs building process.
I like sourcing everything from version control: the actual version (not one manually bumped number in one or more config files, ugh), the release notes (tag notes), assets with :gitannex: #gitAnnex
One downside is that release notes are then effectively unchangeable (except force-push...)
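A rough Python sketch of the "changelog from tag notes" idea, assuming the notes live in the messages of annotated tags (for lightweight tags, `%(contents)` would give the commit message instead). The function names and the Markdown layout are made up for illustration and are not the alias or docs tooling described above.

```python
import subprocess

# Name and message are each followed by a NUL byte so multi-line
# messages can be split apart reliably.
TAG_FORMAT = "%(refname:short)%00%(contents)%00"


def parse_tag_records(raw):
    """Turn `git for-each-ref` output in TAG_FORMAT into (name, message) pairs."""
    fields = raw.split("\0")
    pairs = []
    # Fields alternate name, message; the last field is a trailing newline.
    for i in range(0, len(fields) - 1, 2):
        pairs.append((fields[i].strip(), fields[i + 1].strip()))
    return pairs


def read_tag_notes(repo="."):
    """Ask git for all tags, newest first, together with their messages."""
    result = subprocess.run(
        ["git", "-C", repo, "for-each-ref", "refs/tags",
         "--sort=-creatordate", f"--format={TAG_FORMAT}"],
        capture_output=True, text=True, check=True,
    )
    return parse_tag_records(result.stdout)


def render_changelog(tags):
    """Render one Markdown section per tag."""
    return "\n".join(f"## {name}\n\n{message}\n" for name, message in tags)
```

A docs build could then write `render_changelog(read_tag_notes())` to a file before the documentation is generated.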
How can you access our #PythonForSciComp videos without YouTube? Via this repository, which uses #GitAnnex to distribute the raw videos around. One of those places is a publicly accessible object storage, and with a few commands you can download the processed videos. https://github.com/coderefinery/video-processing/
Problem: yes, git-annex installation is a barrier. This isn't designed as a primary access method but a backup.
If I see this correctly, local file dependencies included in the flake (next to it, in the same git repo) are copied into the nix store by taking the git-tracked content (i.e. what 'git show' spits out) instead of what's actually on-disk.
This is fatal for :gitannex: #gitAnnex-tracked dependencies in the repo, because then you end up with non-nix (or whatever) files in the nix store (e.g. a text file with /annex/objects/bla content).
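One way to at least notice the problem is to scan a copied tree for files that are really just git-annex placeholders. A minimal Python sketch, assuming that unlocked-file pointers start with `/annex/objects/` and that a locked file's symlink target (which is what ends up as file content here) contains `.git/annex/objects/`. The function names and the size cut-off are hypothetical.

```python
from pathlib import Path

POINTER_PREFIX = b"/annex/objects/"      # start of an unlocked-file pointer
SYMLINK_MARKER = b".git/annex/objects/"  # part of a locked file's symlink target


def looks_like_annex_placeholder(content: bytes) -> bool:
    """Guess whether file content is an annex pointer instead of real data."""
    first_line = content.split(b"\n", 1)[0]
    return first_line.startswith(POINTER_PREFIX) or SYMLINK_MARKER in first_line


def find_placeholders(root: Path, max_size: int = 32 * 1024) -> list[Path]:
    """List small regular files under root that look like annex placeholders."""
    hits = []
    for path in sorted(root.rglob("*")):
        # Placeholders are tiny, so skip anything above the (arbitrary) cut-off.
        if path.is_file() and path.stat().st_size <= max_size:
            if looks_like_annex_placeholder(path.read_bytes()):
                hits.append(path)
    return hits
```

Pointing `find_placeholders` at the directory a build actually received would show which dependencies arrived as placeholders; it does not fix the flake behaviour itself.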
Here's the English re-recording of my workshop kickoff talk about @joeyh's :gitannex: Git Annex, an awesome tool to sync, manage and archive files based on git.
The original talk was in German and I got requests to translate it to English, so here it is! 🥳