hschmale,
@hschmale@mastodon.sdf.org

Has anyone put any thought into how to protect a personal blog from generative AI scrapers? I've already blocked OpenAI in robots.txt, but it seems like more and more small providers are popping up that don't honor these requests.

Maybe something like the noise filters artists are using, but with invisible characters? Then again, how do I make sure Googlebot can still see my posts? I don't care about humans using my work, but I take issue with machines.

hschmale,
@hschmale@mastodon.sdf.org

But it's a struggle to come up with a solution. Everything I can think of hurts the accessibility of my content. Invisible characters could break screen readers. Loading my content dynamically or decrypting it in the browser is bad for accessibility and isn't really that strong a defense, so I'm at a loss.

ParadeGrotesque,
@ParadeGrotesque@mastodon.sdf.org

@hschmale

I have been asking myself the same question.

I think the answer is going to be:

  • a regularly updated 'robots.txt' file (I found a couple that seem up to date).
  • poison-pill semantic nonsense in META tags (so invisible to screen readers and most browsers).
  • a transient document space with more poison-pill fake pages to lead the AI slurper to dead ends.
  • some sort of copyright / copyleft notice stating very clearly "Not allowed to build AI corpus" or some such.
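
To make the first bullet concrete, here is a minimal Python sketch that generates a robots.txt blocking a few AI crawlers while leaving everyone else (including Googlebot) allowed, then checks the result with the standard library's parser. The crawler tokens in `AI_CRAWLERS` are an illustrative assumption and will go stale, and `build_robots_txt`, `is_allowed`, and the example.org URL are hypothetical names, not anything from the thread. It also only helps against crawlers that actually honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Illustrative list only (an assumption): crawler tokens change often,
# so this needs the same regular updating the bullet above calls for.
AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended"]


def build_robots_txt(blocked_agents):
    """Return robots.txt text that disallows each listed agent and allows all others."""
    lines = []
    for agent in blocked_agents:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    lines += ["User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, agent, url="https://example.org/posts/hello.html"):
    """Ask the stdlib parser whether `agent` may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    text = build_robots_txt(AI_CRAWLERS)
    print(text)
    print("Googlebot allowed:", is_allowed(text, "Googlebot"))
    print("GPTBot allowed:", is_allowed(text, "GPTBot"))
```

Generating the file from a list keeps the update step down to editing one Python list, and the `is_allowed` check is a cheap way to confirm that a new block rule did not accidentally shut out the search crawler.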
hschmale,
@hschmale@mastodon.sdf.org

@ParadeGrotesque got any links for an up-to-date robots.txt?

Here's mine:
https://www.henryschmale.org/robots.txt

I'm thinking I might add an HTML comment about that copyright/copyleft notice to all my blog posts.

But idk what to do beyond that.

And meta tags still do an awful lot, and I need them to be accurate to make sharing easy.
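
The HTML comment idea could be sketched roughly like this in Python, as a step in a static site build. It is only a sketch under assumptions: `add_notice` is a hypothetical helper, the default wording is borrowed from the notice suggested earlier in the thread, and a comment like this is a statement of intent that scrapers are free to ignore. It leaves the meta tags untouched, so link previews should be unaffected.

```python
import re

# Wording taken from the thread; adjust to whatever notice you settle on.
DEFAULT_NOTICE = "Not allowed to build AI corpus"


def add_notice(html, notice=DEFAULT_NOTICE):
    """Insert an HTML comment carrying `notice` right after the opening <body> tag.

    If the page has no <body> tag the comment is appended at the end.
    Calling it twice does not duplicate the comment.
    """
    if "--" in notice:
        # "--" inside a comment would produce invalid HTML.
        raise ValueError("notice must not contain '--'")
    comment = f"<!-- {notice} -->"
    if comment in html:
        return html
    match = re.search(r"<body[^>]*>", html, flags=re.IGNORECASE)
    if match is None:
        return html + "\n" + comment + "\n"
    return html[:match.end()] + "\n" + comment + html[match.end():]


if __name__ == "__main__":
    page = "<html><head><title>Post</title></head><body><p>Hello</p></body></html>"
    print(add_notice(page))
```

Because the comment sits inside the body and is not rendered, it should not affect screen readers or the visible page.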

ParadeGrotesque,
@ParadeGrotesque@mastodon.sdf.org