nixCraft,
@nixCraft@mastodon.social avatar

Bad and freeloader behaviour
AnthropicAI. https://www.reddit.com/r/linux/comments/1ceco4f/claude_ai_name_and_shame/ Please boost to shame them.

mboelen,
@mboelen@mastodon.social avatar

@nixCraft
Yes, saw them as well on my end. Tried contacting them and not much of a response yet. We should feed them digital rubbish, so their products will output 🤡💩
@securingdev

raptor85,
@raptor85@mastodon.gamedev.place avatar

@nixCraft I'm not so sure I'd call it bad behavior. As far as I can tell their bot respects robots.txt, and the robots.txt for the LM forums doesn't set any crawl delay or any restrictions on indexing. Any new large crawler that finds the site for the first time would likely have the same effect, since anything indexing a site it hasn't seen before will just keep branching without any delay. Google will nuke you like this too if you put a big dataset up and they index it w/o delay.
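
To make the robots.txt point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `example.org` URL, and the `crawl_policy` helper are hypothetical and are not taken from the LM forums. It only shows how a well-behaved crawler would read an access rule and a `Crawl-delay` value.

```python
from typing import Optional, Tuple
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration only).
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def crawl_policy(robots_txt: str, user_agent: str, url: str) -> Tuple[bool, Optional[int]]:
    """Return (may_fetch, crawl_delay_seconds) for a user agent and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)


if __name__ == "__main__":
    for agent in ("ClaudeBot", "SomeOtherBot"):
        print(agent, crawl_policy(ROBOTS_TXT, agent, "https://example.org/forum/"))
```

Without a `Crawl-delay` line or any `Disallow` rules, a compliant crawler has nothing telling it to slow down or stay out, which is the situation described above.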

YurkshireLad,
@YurkshireLad@mastodon.social avatar

@nixCraft it’s a shame you can’t return a random block of text instead of the page content.

DamonHD,
@DamonHD@mastodon.social avatar

@nixCraft had to mail the press@ address to stop them re-checking every few seconds if I'd changed my mind about blocking them for effectively DoSing my site. Tech bros in a hurry...

iamdtms,
@iamdtms@mas.to avatar
fay,
@fay@lingo.lol avatar

@iamdtms @nixCraft @chriscoyier @beep please don't block CCBot though, it's extremely well behaved cc @pjox

pjox,
@pjox@mastodon.social avatar

@fay @iamdtms @nixCraft @chriscoyier @beep We crawl very slowly and very politely, always respecting robots.txt. We have been doing so for years, way before LLMs. Yes, some companies have used our crawls for AI training, but we’re mainly a research crawl. Our goal is to provide resources to researchers, archive the web, and actually increase the visibility of underrepresented parts of it.

pjox,
@pjox@mastodon.social avatar

@fay @iamdtms @nixCraft @chriscoyier @beep There are also people who are starting to use our crawls to build indexes and alternative open web search engines, which I love. I don’t believe a handful of companies should be deciding the content that people consume on the web.

iamdtms,
@iamdtms@mas.to avatar

@pjox @fay @nixCraft @chriscoyier @beep Thank you for letting me know. I'll act accordingly.

pixelriot,
@pixelriot@mas.to avatar

@iamdtms @nixCraft @chriscoyier @beep

Here's a maintained and updated robots.txt from the author of that blog:

https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.txt

fubaroque, (edited)
@fubaroque@mastodon.social avatar

@nixCraft Interesting. I just decided that enough was enough, and since yesterday I have been redirecting ClaudeBot (301 permanent) to large files filled with random bytes elsewhere on the web. 🤣

Didn’t know it was scraping for AI. But I’m sure the “info” they get out of that will be useful to them. 🤭
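
A rough sketch of that kind of redirect, written as a plain WSGI app in Python. It is not the setup described above: the `REDIRECT_TARGET` URL is a placeholder, and matching on the substring "claudebot" in the User-Agent header is an assumption about how the bot identifies itself.

```python
# Hypothetical target; replace with whatever URL you actually want to send the bot to.
REDIRECT_TARGET = "https://example.org/random.bin"


def is_claudebot(user_agent: str) -> bool:
    """Assumption: the crawler's User-Agent header contains 'ClaudeBot'."""
    return "claudebot" in user_agent.lower()


def app(environ, start_response):
    """WSGI app: 301-redirect the matched bot, serve everyone else normally."""
    user_agent = environ.get("HTTP_USER_AGENT", "")
    if is_claudebot(user_agent):
        start_response(
            "301 Moved Permanently",
            [("Location", REDIRECT_TARGET), ("Content-Length", "0")],
        )
        return [b""]
    body = b"Regular page content.\n"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain; charset=utf-8"),
         ("Content-Length", str(len(body)))],
    )
    return [body]


if __name__ == "__main__":
    # Serve locally for a quick manual check.
    from wsgiref.simple_server import make_server
    make_server("127.0.0.1", 8000, app).serve_forever()
```

In practice the same rule is usually a one-liner in the web server or reverse proxy config; the Python version just makes the logic explicit.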

cquest,
@cquest@amicale.net avatar

@nixCraft got the same on @osm_fr discourse forum.

They are now blacklisted, with others.

nixCraft,
@nixCraft@mastodon.social avatar

@cquest @osm_fr any idea how to blacklist them? robots.txt? CIDR block?
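
For reference, a minimal sketch of what the CIDR option could look like at the application level, using Python's standard `ipaddress` module. The networks listed are reserved documentation ranges, not the crawler's real addresses, and the `is_blocked` helper is hypothetical. The robots.txt option would be a `User-agent: ClaudeBot` line followed by `Disallow: /`, which only works if the bot honours it.

```python
import ipaddress

# Placeholder CIDR blocks (reserved documentation ranges), NOT real crawler ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("203.0.113.0/24", "2001:db8::/32")
]


def is_blocked(addr: str) -> bool:
    """Return True if the client address falls inside any blocked CIDR block."""
    ip = ipaddress.ip_address(addr)
    # Membership is simply False when the IP version differs from the network's.
    return any(ip in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for client in ("203.0.113.7", "198.51.100.1", "2001:db8::1"):
        print(client, is_blocked(client))
```

A firewall or web server rule does the same job more cheaply, since the request is dropped before it reaches the application.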
