Hey, #Fediverse
My #mastodon instance stopped recognising updates from my #pixelfed instance that I follow from it about a week ago. Both appear to be otherwise federating okay.
When I open the followed pixelfed account from mastodon, it shows no updates since a week ago.
Where to start looking, please? I did a cleanup and moved to #S3 storage last week but have done many pixelfed posts since then.
#S7 does just what I want, but dare I use it in anything that needs to be maintainable? OO R implementations get replaced so quickly. Is it better to just pretend the only options are #S3 and #S4? #rstats
@sjcowtan They are not replaced quickly; that's why there are so many. But S7 is still below 1.0.0, so not quite production-ready. There might be some breaking changes down the road.
@bram I never looked into it because every once in a while I read horror stories of people who forgot something running there and got a huge bill. I really can't afford to make mistakes like that xD
After some weeks of silence, there was finally some free time for a little blog post again - after discovering that my very small instance took up almost 1.8 TB of #S3 storage:
It's worthwhile to expand on a point to @devnull that I made: "preventing the sending server from seeing the IP" is a mostly* BS justification for local caching of media.
Broadly speaking:
Inconsistency around security policies is a recipe for dramatic, consequential failures.
Users are not notified if this is a feature, and clients and servers can both override it.
This is not to say that there are never circumstances where you want to hide your IP.
It's just that if that's a feature we want in the fediverse, S3 media caching is a very expensive solution to 1% of the problem, with a lot of other hidden costs and considerations to deal with.
If you want that, we need to have a much deeper conversation about what it means to provide that obfuscation and you should probably be using Tor, at a minimum.
(addendum to have a post to reference for reply-guys)
This is not to say that there may not be other reasons to use media caching, but then we should state those justifications (I have arguments to make there as well, but that's a separable discussion, more about "is this the correct tool" than "this is not doing what you want").
We do an annoying amount of post-hoc justification in the fediverse, where the actual reason for something is obscured while another reason is presented.
On multiple occasions I've listened to instance admins speak about high S3 costs. The sheer amount of data absolutely balloons the more activity your server sees, I get it.
What I don't get is whether there's some unknown fedi ethical reason everybody insists on setting up an S3 cache (followed immediately by complaining about it).
Y'all want to know what the rest of the web does? Hosts their own uploaded media, and links out to the rest...
@FenTiger @kevinriggle @devnull @hrefna (if 1000 simultaneous hits takes down your web server, what potato are you running it on, and why are they even touching the database? c'mon man)
In a recent announcement, Pixelfed creator Daniel Supernault (@dansup) shared exciting news for Pixelfed instance administrators. A forthcoming feature is set to empower admins by allowing the storage of imported media from Instagram directly on S3 Storage.
The development is part of a pull request (PR) on GitHub, where Supernault detailed the feature's functionality. Admins will soon be able to opt in to storing Instagram-imported media on the S3 filesystem driver. This marks a significant enhancement for Pixelfed instances, providing seamless integration for media management.
Key Configuration Details:
To enable or disable the feature, admins can set PF_IMPORT_IG_CLOUD_STORAGE to true or false. Notably, this can only be activated if Cloud Storage (PF_ENABLE_CLOUD) is enabled. However, admins have the flexibility to disable this feature and retain Instagram-imported media locally, even with Cloud Storage enabled.
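Based on the flag names given in the announcement, a hedged .env sketch might look like this (the values and comments are illustrative, not taken from Pixelfed's documentation):

```shell
# Cloud storage must be enabled first; the import feature depends on it
PF_ENABLE_CLOUD=true

# Opt in to storing Instagram-imported media on the S3 driver.
# Set to false to keep imported media on local storage even with
# cloud storage enabled, as the announcement describes.
PF_IMPORT_IG_CLOUD_STORAGE=true
```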
Existing local media will be seamlessly migrated without requiring any action from admins. A cron job will automatically handle the migration of both existing and new Instagram media. While the process may take some time for instances with substantial media content, Pixelfed assures administrators that the system is designed to efficiently manage the transition.
Migration Process:
During the migration, Pixelfed has chosen to silently update media URLs to avoid sending unnecessary “Update” activities. This careful approach ensures a smooth experience for users, with local media URLs gracefully redirecting to their corresponding S3 URLs when appropriate.
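The mechanism described above can be sketched as a small Python example: rewrite stored media URLs to their S3 equivalents without federating an "Update" activity, and have the old local URLs answer with a redirect. All function and URL names here are hypothetical, not Pixelfed's actual code.

```python
# Sketch of a "silent" URL migration: paths are rewritten to S3 URLs
# without notifying remote servers, and old local URLs 302-redirect
# to the new location. Names and URL layout are illustrative only.

def migrate_url(local_url: str, bucket_base: str, migrated: dict) -> str:
    """Rewrite a local media URL to its S3 equivalent and record the mapping."""
    path = local_url.split("/storage/", 1)[1]   # e.g. "m/abc.jpg"
    s3_url = f"{bucket_base}/{path}"
    migrated[local_url] = s3_url                # kept for later redirects
    return s3_url

def resolve(url: str, migrated: dict) -> tuple[int, str]:
    """Serve a request: 302 to S3 if the media was migrated, else 200 locally."""
    if url in migrated:
        return 302, migrated[url]
    return 200, url

# Usage
mapping: dict = {}
migrate_url("https://pix.example/storage/m/abc.jpg",
            "https://s3.example/bucket", mapping)
status, target = resolve("https://pix.example/storage/m/abc.jpg", mapping)
```

The design choice the post describes is the interesting part: because no "Update" activity is sent, remote servers keep the old URLs, so the redirect path has to stay in place for them.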
Pixelfed’s commitment to user experience and efficient media management is evident in this upcoming feature. Admins can anticipate enhanced control over media storage, providing a more seamless and scalable solution for Pixelfed instances.
The Pixelfed community eagerly awaits the official release of this feature, anticipating its positive impact on the platform’s media management capabilities.
After some great discussion here yesterday on the topic of hijacked S3 buckets, I wrote up this blog post covering how I've combatted this in the past at multiple organizations.
In the post we explore how S3 bucket takeover occurs and how you can prevent it for buckets you own. Ultimately this is a software supply chain attack and should be addressed as a security issue. #security #aws #s3
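As a minimal sketch of the detection side (this is my illustration, not code from the blog post): a referenced S3 bucket that has been deleted answers requests with a "NoSuchBucket" error, which signals that anyone could re-create that bucket name and serve their own content from it. Fetching the URL is assumed to happen elsewhere; this only classifies the response body.

```python
# Hypothetical takeover check: a deleted-but-still-referenced bucket
# returns an S3 "NoSuchBucket" error body, meaning the name is free
# for an attacker to claim. Marker strings are the S3 error text.

def takeover_risk(response_body: str) -> bool:
    """Return True if the response indicates the bucket no longer exists."""
    markers = ("NoSuchBucket", "The specified bucket does not exist")
    return any(m in response_body for m in markers)
```

In practice you would run a check like this over every bucket URL referenced in your code, DNS CNAMEs, and HTML, and alert on any hit.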
I just came across a great article by Antonia Langfelder on #ApacheTika's tika-pipes module and the /async handler, enabling reading from and writing to #s3.
The point about setting 'OMP_THREAD_LIMIT=1' to limit tesseract is interesting.
@tallison I used to recompile tesseract with configure --disable-openmp, as suggested here: https://github.com/tesseract-ocr/tesseract/issues/943. I recall a benchmark indicating that Tesseract compiled without OpenMP runs faster than the version compiled with OpenMP but with its features disabled.
I have SES receive mail and put it directly into an S3 bucket.
The bucket has notifications for object creates in the report and forensic subfolders, going to an SNS topic/SQS queue that feeds the Lambda that processes them. Then I can batch them.
Then a lifecycle policy on the bucket cleans up old reports.
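The pipeline above can be sketched as a Lambda handler in Python (the prefixes and all names are my assumptions, not the poster's actual code): the function receives an SQS batch whose record bodies are SNS envelopes wrapping S3 "ObjectCreated" notifications, and routes each object key by the subfolder it landed in.

```python
import json

# Illustrative sketch of the SES -> S3 -> SNS/SQS -> Lambda pipeline:
# unwrap the SQS record, then the SNS envelope, then the S3 event,
# and bucket the object keys by subfolder for batch processing.

def route_key(key: str) -> str:
    """Classify an object key by the subfolder it was created under."""
    if key.startswith("report/"):
        return "report"
    if key.startswith("forensic/"):
        return "forensic"
    return "ignore"

def handler(event: dict, context=None) -> dict:
    """Lambda entry point: collect keys per category from an SQS batch."""
    batches = {"report": [], "forensic": [], "ignore": []}
    for sqs_record in event["Records"]:
        sns_msg = json.loads(sqs_record["body"])     # SNS envelope
        s3_event = json.loads(sns_msg["Message"])    # S3 notification inside
        for rec in s3_event.get("Records", []):
            key = rec["s3"]["object"]["key"]
            batches[route_key(key)].append(key)
    return batches
```

The double json.loads is the part that trips people up with this topology: SQS delivers the SNS message as a JSON string inside another JSON string.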
@b3cft I only need it for a handful of domains, so I expect low traffic. Shoving the reports into DynamoDB and building some trivial "show me rows for $host between $date and $date2" front end should be easy. I think!