marcel,
@marcel@waldvogel.family

Even though usage just went down 10%, AI is still a (the?) favorite tech topic. As a change from all the horror predictions, I decided to have a look at how to control indexing of web content by AI crawlers.

For traditional search engines, we have plenty of control over crawling (robots.txt etc.), preview information (OpenGraph and friends) or presentation (oEmbed).

However, there is still little control over how web content is used for AI.
https://netfuture.ch/2023/07/blocking-ai-crawlers-robots-txt-chatgpt/

marcel,
@marcel@waldvogel.family

Besides outright ‌s, there is little control on indexing your site.

The only AI-specific crawlers I could find that can be controlled with robots.txt were

  • CCBot (Common Crawl) and
  • ChatGPT-User (for plugin stuff)
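
For anyone who wants to try this, here is a minimal sketch of what blocking those two crawlers could look like, together with a quick check using Python's standard `urllib.robotparser`. The robots.txt content and the `is_allowed` helper are my own illustration (not taken from the linked article), and `example.com` is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: blocks the two crawlers named above
# from the whole site and leaves every other user agent unrestricted.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("CCBot", "ChatGPT-User", "Googlebot"):
        print(agent, is_allowed(agent, "https://example.com/page"))
```

Note that this only tells you what a well-behaved crawler should do; robots.txt is a request, not an enforcement mechanism.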

There apparently is an "OpenAI" crawler out there, but I could find no documentation on their site or evidence in my web server logs.
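
If you want to check your own logs, a rough sketch like the following might help. It assumes the common "combined" access log format, where the user agent is the last quoted field on each line; the `count_ai_agents` helper and the substring hints (including "openai", which is purely a guess at what such a crawler might call itself) are hypothetical.

```python
import re
from collections import Counter

# Assumption: "combined" log format, user agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

# Hypothetical substrings to look for, compared case-insensitively.
AI_AGENT_HINTS = ("ccbot", "chatgpt-user", "openai")


def count_ai_agents(log_lines, hints=AI_AGENT_HINTS):
    """Count requests per user agent string that contains one of the hints."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # line does not end with a quoted field
        agent = match.group(1)
        if any(hint in agent.lower() for hint in hints):
            counts[agent] += 1
    return counts
```

Usage would be something like `count_ai_agents(open("access.log"))`, with the path adjusted to wherever your server writes its logs.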

What additional information do you have?
https://netfuture.ch/2023/07/blocking-ai-crawlers-robots-txt-chatgpt/

marcel,
@marcel@waldvogel.family

Even though many in the end might decide to have most of their content available to , I still think it is important to have controls and transparency. Especially if big companies look like they are making money with our content, without compensation.

by seems a good approach; and maybe also something might come out of the proposal on upgrading robots.txt. But none of them will be usable in the near future.


https://www.thestar.com/politics/federal/2023/05/30/business-would-be-over-canadas-news-publishers-say-ban-by-google-and-facebook-would-devastate-them.html
