Blocking AI crawlers with Caddy

I was reading the Reddit thread about Claude AI crawlers effectively DDoSing the Linux Mint forums (libreddit.lunar.icu/…/claude_ai_name_and_shame/)

and I wanted to block all AI crawlers from my self-hosted stuff.

I don’t trust crawlers to respect robots.txt, but you can generate one at darkvisitors.com.
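For reference, the generated file is just a list of user-agent blocks. A minimal hand-written version covering a few of the big AI crawlers (user-agent names taken from the same blocklist) looks roughly like this — and again, it only helps against bots that actually honor it:

```
# robots.txt — ask AI crawlers to stay away
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```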

Since I use Caddy as my server, I wrote a snippet that blocks them based on their User-Agent header. The contents of the regex basically come from darkvisitors.

Sidenote: there is also a Caddy module for blocking crawlers (github.com/Xumeiquer/nobots), but it seemed like overkill for me.

For anybody who is interested, here is the block_ai_crawlers.conf I wrote.


(blockAiCrawlers) {
  @blockAiCrawlers {
    header_regexp User-Agent "(?i)(Bytespider|CCBot|Diffbot|FacebookBot|Google-Extended|GPTBot|omgili|anthropic-ai|Claude-Web|ClaudeBot|cohere-ai)"
  }
  handle @blockAiCrawlers {
    abort
  }
}

# Usage:
# 1. Place this file next to your Caddyfile
# 2. Edit your Caddyfile as in the example below
#
# ```
# import block_ai_crawlers.conf
#
# www.mywebsite.com {
#   import blockAiCrawlers
#   reverse_proxy * localhost:3000
# }
# ```
LiveLM,

Huh, looks like the post in r/linux got removed for not being relevant.
What a joke.

hollyberries,

I’m a fan of hellpotting them.

Cyber,

Ooh, didn’t know about that one… thanks

Para_lyzed,

From your recommendation, I found a related project, pandoras_pot, which I can run in a Docker container and which seems to run more efficiently on my Pi home server. I now use it in my Caddyfile to redirect a number of fake subdomains and paths that are likely to be found by a malicious bot (all of which are, of course, excluded in my robots.txt for bots that actually respect it). Thanks for the recommendation!
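For anyone curious what that routing looks like, here is a minimal sketch — the honeypot hostname and the container port are made up, and pandoras_pot's actual port depends on how you run the container:

```
# Hypothetical honeypot subdomain; it is disallowed in robots.txt,
# so only misbehaving bots should ever request it.
honeypot.mywebsite.com {
  # pandoras_pot assumed to be listening in a container on port 8080
  reverse_proxy localhost:8080
}
```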

arisunz,
@arisunz@lemmy.blahaj.zone avatar

I got meaner with them :3c

acockworkorange,

That’s devilishly and deliciously devious.

JustARegularNerd,

I just want you to know that was an amazing read. I was actually thinking “It gets worse? Oh it does. Oh, IT GETS EVEN WORSE?”

arisunz,
@arisunz@lemmy.blahaj.zone avatar

lmao that means a lot, thanks <3

Deckweiss,

The nobots module I’ve linked bombs them

pvq,

I really like your site’s color scheme, fonts, and overall aesthetics. Very nice!

not_amm,

I agree, it’s readable and very cute!

jkrtn,

This is one of the best things I’ve ever read.

I’d love to see a robots.txt do a couple of safe listings, then a zip bomb, then a safe listing. It would be fun to see how many log entries from an IP look like: GET a, GET b, GET zip bomb… no more requests.
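A rough sketch of that idea in Caddy — the trap path and bomb file are hypothetical, and the trick is serving a pre-compressed file with a gzip Content-Encoding header so the client expands it on receipt:

```
# /trap/ is disallowed in robots.txt, sandwiched between safe entries,
# so only bots that ignore (or deliberately mine) robots.txt hit it.
@bomb path /trap/*
handle @bomb {
  root * /srv/bombs
  rewrite * /10G.gzip
  header Content-Encoding gzip
  file_server
}
```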

ABasilPlant, (edited )

In dark mode, the anchor tags are difficult to read. They’re dark blue on a dark background. Perhaps consider something with a much higher contrast?

A picture of a website with a dark purple background and dark blue links.

Apart from that, nice idea - I’m going to deploy the zipbomb today!

arisunz,
@arisunz@lemmy.blahaj.zone avatar

nice catch, thanks (i use light mode most of the time)

winnie,

Suggestion at the end:


  <a class="boom" href="https://boom .arielaw.ar">hehe</a>

Wouldn’t it also destroy GoogleBot (and other search engine crawlers), getting your site delisted from search?

Arghblarg,
@Arghblarg@lemmy.ca avatar

We should do more than block them, they need to be teergrubed.

winnie,

Your link has no article, and the video is inside a Flash file (.swf) that won’t open in 2024.

And I don’t want to install Flash on my machine…

boredsquirrel,

Such a cool person making the video available for download

Deckweiss, (edited )

That’s an easy modification. Just redirect or reverse_proxy to the tarpit instead of abort.
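For example, swapping the abort in the blocking snippet for a reverse_proxy — assuming a HellPot-style tarpit is listening locally, and the port here is an assumption:

```
handle @blockAiCrawlers {
  # send matched bots to the tarpit instead of dropping the connection
  reverse_proxy localhost:8080
}
```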

I was even thinking about an infinitely linked, data-poisoned HTML document, but there seemed to be no ready-made project that can generate one at the moment. (No published data-poisoning techniques for plain text at all, afaik. But there is one for images.)

Ultimately I decided to just abort the connection as I don’t want my servers to waste traffic or CPU cycles.
