@raffaele@digipres.club
raffaele

— what I do: digital libraries, metadata, web archiving, digital preservation, IIIF, Readium LCP
— what I love: mountains, bikes, JG Ballard, Colin Wilson

raffaele, to random
@raffaele@digipres.club avatar

The project for one of the most expensive (and useless) pieces of infrastructure in Italy, the bridge that will connect Sicily to the mainland, has faulty PDFs.
https://www.ilpost.it/flashes/documenti-illeggibili-ponte-stretto/ (in Italian)

hello @wtfpdf !

raffaele, to random
@raffaele@digipres.club avatar

Libraries, the new Sodom and Gomorrah

raffaele,
@raffaele@digipres.club avatar

@moira I saw it on X. Here is the video: https://www.youtube.com/watch?v=mQ1-NtWSDV4 I cannot guarantee it's not fake. My toot was meant to be sarcastic, sorry if that was unclear.

raffaele, to random
@raffaele@digipres.club avatar

Software libero and freedom in the digital society
Tuesday, 9 April at 15:00, Richard Stallman is in Bologna
https://balotta.org/event/software-libero-and-freedom-in-the-digital-society-seminario-a-cura-di-richard-stallman

raffaele, to random
@raffaele@digipres.club avatar

"and rehosting is similar to forking– it is often seen as disrespectful or worse. Talk to each other"
https://mastodon.archive.org/@textfiles/112198791321549063

I slightly disagree: allowing rehosting (aka mirroring) of digital cultural heritage is good and should be endorsed.

raffaele, to random
@raffaele@digipres.club avatar

Revisiting the eDonkey p2p network after 15+ years (I have just installed aMule), and I am amazed. In just a few searches, I found rare books and documentaries that I couldn't locate elsewhere.

Are sharing communities the ultimate digital libraries?

raffaele,
@raffaele@digipres.club avatar

@nemobis And most embarrassing of all, the only metadata available for searching is the title of the file.

raffaele, to random
@raffaele@digipres.club avatar

https://cohost.org/arborelia/post/4968198-the-software-heritag

This is a serious discussion about digital archiving and data immutability.
How should we deal with a system where immutability is inherent in the technological design?

(I am not able to speak on the subject of transphobia)

edsu, to random
@edsu@social.coop avatar

It's hard not to be struck by the importance of continued maintenance in the library's recent report about the ransomware attack they suffered:

I left some annotations on the PDF in hypothesis if that's your jam.

raffaele,
@raffaele@digipres.club avatar

@edsu the point I fear most, and I think is very common to many institutions:
"A few key software systems, including the library management system, cannot be brought back in the form that they existed in before the attack, either because they are no longer supported by the vendor and the software is no longer available"

ajsadauskas, (edited ) to tech
@ajsadauskas@aus.social avatar

In an age of LLMs, is it time to reconsider human-edited web directories?

Back in the early-to-mid '90s, one of the main ways of finding anything on the web was to browse through a web directory.

These directories generally had a list of categories on their front page. News/Sport/Entertainment/Arts/Technology/Fashion/etc.

Each of those categories had subcategories, and sub-subcategories that you clicked through until you got to a list of websites. These lists were maintained by actual humans.

Typically, these directories also had a limited web search that would crawl through the pages of websites listed in the directory.

Lycos, Excite, and of course Yahoo all offered web directories of this sort.

(EDIT: I initially also mentioned AltaVista. It did offer a web directory by the late '90s, but this was something it tacked on much later.)

By the late '90s, the standard narrative goes, the web got too big to index websites manually.

Google promised the world its algorithms would weed out the spam automatically.

And for a time, it worked.

But then SEO and SEM became a multi-billion-dollar industry. The spambots proliferated. Google itself began promoting its own content and advertisers above search results.

And now with LLMs, the industrial-scale spamming of the web is likely to grow exponentially.

My question is, if a lot of the web is turning to crap, do we even want to search the entire web anymore?

Do we really want to search every single website on the web?

Or just those that aren't filled with LLM-generated SEO spam?

Or just those that don't feature 200 tracking scripts, and passive-aggressive privacy warnings, and paywalls, and popovers, and newsletters, and increasingly obnoxious banner ads, and dark patterns to prevent you cancelling your "free trial" subscription?

At some point, does it become more desirable to go back to search engines that only crawl pages on human-curated lists of trustworthy, quality websites?

And is it time to begin considering what a modern version of those early web directories might look like?

@degoogle

raffaele,
@raffaele@digipres.club avatar

@ajsadauskas @degoogle a bit of Yahoo history here; it started as a web directory: https://www.wired.com/1996/05/indexweb/

awinkler, to random
@awinkler@openbiblio.social avatar

Does anybody know of reasonably efficient workflows to bulk upload bibliographic metadata to Wikidata? Adding items via GUI is annoyingly repetitive and slow. Reconciling and uploading via OpenRefine is probably one possibility, but that's not trivial. Has anybody come across tutorials or descriptions of workflows? There are also tools such as "Author Disambiguator" (https://author-disambiguator.toolforge.org/) that would have to be combined with other services. Help appreciated!

raffaele, to random
@raffaele@digipres.club avatar

Indexing the information age
https://aeon.co/essays/the-birth-of-our-system-for-describing-web-content

The birth of Dublin Core

glynmoody, to random
@glynmoody@mastodon.social avatar

Air pollution could be significant cause of dementia, even for those not predisposed - https://www.theguardian.com/environment/2024/feb/21/air-pollution-could-be-significant-cause-of-dementia-even-for-those-not-predisposed?CMP=twt_a-environment_b-gdneco "People in areas of high PM2.5 concentrations had higher amounts of amyloid plaques in brain"

raffaele,
@raffaele@digipres.club avatar

@glynmoody Pretty bad! Northern Italy has been one of the most polluted regions in the world these past weeks.

simon, to random
@simon@simonwillison.net avatar

I tried feeding a 7s video of my bookshelf into Gemini Pro 1.5 to get back a JSON array of books... and it worked!
https://simonwillison.net/2024/Feb/21/gemini-pro-video/

raffaele, to random
@raffaele@digipres.club avatar

Meanwhile, in Italy, a new political party has been formed to run in the European elections. The spokesperson is an AI avatar.
This is not a prank, nor is it funny.

raffaele, to random
@raffaele@digipres.club avatar

The Cost of a Digital Archive
https://lil.law.harvard.edu/blog/2024/02/08/the-cost-of-a-digital-archive/

Interesting reading, especially the environmental cost analysis.

tallison, to random
@tallison@mastodon.social avatar

I just came across a great article by Antonia Langfelder on Apache Tika's tika-pipes module and the /async handler, enabling reading from and writing to .

The point about setting 'OMP_THREAD_LIMIT=1' to limit tesseract is interesting.

https://medium.com/wellcome-data/how-to-parse-millions-of-pdf-documents-asynchronously-with-apache-tika-d27e06e57b22

raffaele,
@raffaele@digipres.club avatar

@tallison I usually recompile tesseract with configure --disable-openmp, as suggested here: https://github.com/tesseract-ocr/tesseract/issues/943 I recall a benchmark indicating that Tesseract compiled without OpenMP runs faster than the version compiled with OpenMP but with its features disabled.

glynmoody, to random
@glynmoody@mastodon.social avatar

Drivers protest as Bologna becomes first Italian city to bring in 30km/h limit - https://www.theguardian.com/world/2024/jan/19/drivers-protest-as-bologna-becomes-first-italian-city-to-bring-in-30kmh-limit let's not start, folks...

raffaele,
@raffaele@digipres.club avatar

@glynmoody Bologna is a city more than a thousand years old; it's not a metropolis. 30 km/h is perfectly right. The use of cars should be reduced (or eliminated), and this behavior is no longer sustainable for the environment: "One worker told me he no longer has time to drive home for lunch, so has to make do with a sandwich"

raffaele,
@raffaele@digipres.club avatar

@glynmoody also Bologna: the most polluted area in Europe https://bologna.maps.sensor.community/#8/44.981/10.816

raffaele, to random
@raffaele@digipres.club avatar

Does anyone here use a Lenovo ThinkPad X1 Carbon (generation 11) with Linux?

raffaele, to random
@raffaele@digipres.club avatar

meanwhile in Italy: during a test, the computers were unable to open the PDFs, and approximately 900 candidates were sent back home.
https://www-ilpost-it.translate.goog/2023/10/30/sna-concorso-pubblico-scuola-nazionale-amministrazione/?_x_tr_sl=it&_x_tr_tl=en&_x_tr_hl=it&_x_tr_pto=wapp

hello @wtfpdf !

quinta, to random Italian
@quinta@mastodon.uno avatar

The Council of State annuls the award of the Polo Strategico Nazionale contract.

It was a direct award
(like the IO app and the notifications hub, by the way)

quite a mess

raffaele,
@raffaele@digipres.club avatar

@nemobis @quinta is there a link about this news? thanks

raffaele, (edited )
@raffaele@digipres.club avatar

@quinta @nemobis found it, thanks
