brucelawson,
@brucelawson@vivaldi.net avatar

Can I use robots.txt to tell OpenAI and its mates that my content isn't for scraping?

janl,
@janl@narrativ.es avatar

@brucelawson you could mark it as “this has all been LLM generated” and they will leave you alone.

LDJ,
@LDJ@mastodon.green avatar

@janl @brucelawson There was a discussion on this recently and unfortunately they don’t seem to be honouring the robots.txt - one site was having to individually block the domains, but there’s another one every day 😕
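For anyone wanting to try the robots.txt route anyway, here is a minimal sketch of what such an opt-out might look like and how to check it locally. It assumes the crawler identifies itself as "GPTBot" (the user-agent token OpenAI documents for its crawler). The example.com URL, the "SomeOtherBot" name, and the is_allowed helper are made up for illustration. As noted above, this only expresses a request; nothing here forces a crawler to comply.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask GPTBot to stay out, allow everyone else.
# "GPTBot" is assumed to be the token the crawler sends; other AI crawlers
# would each need their own User-agent group.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if these robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative URL only.
    print(is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/post"))
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/post"))
```

This only tells you what a well-behaved crawler would conclude from the file. Crawlers that ignore robotos.txt-style requests still have to be blocked server-side, which is the whack-a-mole problem described above.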

garrettc,
@garrettc@mastodon.org.uk avatar
ppatel,
@ppatel@mstdn.social avatar

@garrettc @brucelawson How do you deal with Google though?

garrettc,
@garrettc@mastodon.org.uk avatar

@ppatel How do you mean? Do you want to block Google's crawler?

ppatel,
@ppatel@mstdn.social avatar

@garrettc Considering Google's latest privacy changes appear to have declared that the company is going to use pretty much everything on the web for training its models, I'm concerned about Google no longer honoring robots.txt.

garrettc,
@garrettc@mastodon.org.uk avatar

@ppatel I think they'd be creating a lot of trouble with the web community if they decided not to honour the robots.txt standard.

ppatel,
@ppatel@mstdn.social avatar

@garrettc I'm not seeing the current management caring about that much these days.

patrick_h_lauke,
@patrick_h_lauke@mastodon.social avatar

@brucelawson even if you could, would they honour it? "i'm afraid i can't do that, bruce"
