Bad and freeloader behaviour...

nixCraft, 12 days ago

Bad and freeloader behaviour
AnthropicAI. https://www.reddit.com/r/linux/comments/1ceco4f/claude_ai_name_and_shame/ #linux #linuxmint #opensource Please boost to shame them.

mboelen, 12 days ago

@nixCraft
Yes, saw them as well on my end. Tried contacting them and not much of a response yet. We should feed them digital rubbish, so their products will output 🤡💩
@securingdev

raptor85, 12 days ago

@nixCraft I'm not so sure I'd call it bad behavior. As far as I can tell, their bot respects robots.txt, and the robots.txt for the LM forums doesn't set a crawl delay or any restrictions on indexing. Any large crawler that finds the site for the first time would likely have the same effect, since a crawler that hasn't indexed the site before will just keep branching without any delay. Google will nuke you like this too if you put up a big dataset and they index it without a delay.
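
A minimal sketch of how one might check what a robots.txt actually tells a given crawler, using Python's standard urllib.robotparser. The two robots.txt bodies and the forum.example.org URLs are made up for illustration and are not the real LM forums file. Note that Crawl-delay is a non-standard directive, so not every crawler honours it.

```python
from urllib.robotparser import RobotFileParser


def crawl_policy(robots_txt: str, user_agent: str, url: str) -> dict:
    """Report whether user_agent may fetch url, and any crawl delay set for it."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return {
        "allowed": parser.can_fetch(user_agent, url),
        "crawl_delay": parser.crawl_delay(user_agent),  # None if not set
    }


# Hypothetical file with no restrictions and no delay (the situation described above).
OPEN = "User-agent: *\nDisallow:\n"

# Hypothetical file that asks for 10 seconds between requests and hides /search.
THROTTLED = "User-agent: *\nCrawl-delay: 10\nDisallow: /search\n"

if __name__ == "__main__":
    print(crawl_policy(OPEN, "ClaudeBot", "https://forum.example.org/t/1"))
    print(crawl_policy(THROTTLED, "ClaudeBot", "https://forum.example.org/t/1"))
```

With the unrestricted file, a polite crawler has no hint to slow down, which is the point being made about first-time indexing.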

YurkshireLad, 12 days ago

@nixCraft it’s a shame you can’t return a random block of text instead of the page content.

DamonHD, 12 days ago

@nixCraft had to mail the press@ address to stop them re-checking every few seconds if I'd changed my mind about blocking them for effectively DoSing my site. Tech bros in a hurry...

iamdtms, 12 days ago

@nixCraft Blocking AI Scraper Bots
https://chriscoyier.net/2023/09/19/blocking-ai-scraper-bots/
By @chriscoyier

Blocking bots
https://ethanmarcotte.com/wrote/blockin-bots/
By @beep

fay, 12 days ago

@iamdtms @nixCraft @chriscoyier @beep please don't block CCBot though, it's extremely well behaved. cc @pjox

pjox, 12 days ago

@fay @iamdtms @nixCraft @chriscoyier @beep We crawl very slowly and very politely, always respecting robots.txt. We have been doing so for years, way before LLMs. Yes, some companies have used our crawls for AI training, but we’re mainly a research crawl. Our goal is to provide resources to researchers, to archive, and to actually increase the visibility of underrepresented parts of the web.

pjox, 12 days ago

@fay @iamdtms @nixCraft @chriscoyier @beep There are also people who are starting to use our crawls to build indexes and alternative open web search engines, which I love. I don’t believe a handful of companies should be deciding the content that people consume on the web.

iamdtms, 12 days ago

@pjox @fay @nixCraft @chriscoyier @beep Thank you for letting me know. I'll act accordingly.

pixelriot, 12 days ago

@iamdtms @nixCraft @chriscoyier @beep

Here's a maintained and updated robots.txt from the author of that blog:

https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.txt

fubaroque, 12 days ago (edited 12 days ago)

@nixCraft Interesting. I just decided that enough was enough, and since yesterday I have been redirecting (301 permanent) ClaudeBot to large files filled with random bytes elsewhere on the web. 🤣

Didn’t know it was scraping for AI. But I’m sure the “info” they get out of that will be useful to them. 🤭
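
A rough sketch of the kind of user-agent-based 301 redirect described above, written as WSGI middleware using only the Python standard library. The function name redirect_bots, the target URL on example.org, and the choice of WSGI are all assumptions for illustration. The actual setup was not shared, and it may well have been a web server rule instead.

```python
def redirect_bots(app, target_url, agents=("ClaudeBot",)):
    """Wrap a WSGI app so matching user agents get a 301 to target_url."""
    needles = tuple(agent.lower() for agent in agents)

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(needle in user_agent for needle in needles):
            # Permanent redirect; the body is intentionally empty.
            start_response(
                "301 Moved Permanently",
                [("Location", target_url), ("Content-Length", "0")],
            )
            return [b""]
        # Everyone else gets the normal site.
        return app(environ, start_response)

    return middleware


def site(environ, start_response):
    """Stand-in for the real site (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


# Hypothetical target; substitute whatever the redirect should point at.
wrapped = redirect_bots(site, "https://example.org/random.bin")
```

Matching on the User-Agent header only works for bots that identify themselves honestly.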

cquest, 12 days ago

@nixCraft got the same on @osm_fr discourse forum.

They are now blacklisted, with others.

nixCraft, 12 days ago

@cquest @osm_fr Any idea how to blacklist them? robots.txt? CIDR block?
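
For illustration, a minimal sketch of the two options asked about, using only the Python standard library. The CIDR ranges below are reserved documentation ranges used as placeholders, not real crawler addresses, and the helper names are hypothetical. The "ClaudeBot" token is the one mentioned elsewhere in this thread.

```python
import ipaddress

BLOCKED_AGENTS = ("ClaudeBot",)

# Placeholder ranges (reserved for documentation); replace with ranges seen in your logs.
BLOCKED_NETWORKS = tuple(
    ipaddress.ip_network(cidr) for cidr in ("192.0.2.0/24", "2001:db8::/32")
)


def robots_txt_rules(agents):
    """Option 1: robots.txt stanzas asking each named agent to stay away."""
    return "".join(f"User-agent: {agent}\nDisallow: /\n\n" for agent in agents)


def is_blocked(remote_ip: str, user_agent: str) -> bool:
    """Option 2: server-side check by CIDR range, with a user-agent fallback."""
    ip = ipaddress.ip_address(remote_ip)
    if any(ip in network for network in BLOCKED_NETWORKS):
        return True
    lowered = user_agent.lower()
    return any(agent.lower() in lowered for agent in BLOCKED_AGENTS)


if __name__ == "__main__":
    print(robots_txt_rules(BLOCKED_AGENTS))
    print(is_blocked("192.0.2.55", "Mozilla/5.0"))
```

robots.txt only helps with crawlers that choose to obey it, while a CIDR block is enforced on the server but needs the address list kept up to date.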
