Blocking AI crawlers with Caddy

I was reading the Reddit thread on Claude AI crawlers effectively DDoSing the Linux Mint forums (libreddit.lunar.icu/…/claude_ai_name_and_shame/),

and I wanted to block all AI crawlers from my self-hosted stuff.

I don’t trust crawlers to respect robots.txt, but you can generate one here: darkvisitors.com
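For the bots that do honor it, the generated file boils down to entries like the following (a minimal sketch; the agent names shown are just two common examples, not the full darkvisitors list):

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
```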

Since I use Caddy as my server, I wrote a directive that blocks them based on their user agent. The contents of the regex basically come from darkvisitors.

Sidenote: there is also a module for blocking crawlers (github.com/Xumeiquer/nobots), but it seemed like overkill to me.

For anybody who is interested, here is the block_ai_crawlers.conf I wrote.


```
(blockAiCrawlers) {
  @blockAiCrawlers {
    header_regexp User-Agent "(?i)(Bytespider|CCBot|Diffbot|FacebookBot|Google-Extended|GPTBot|omgili|anthropic-ai|Claude-Web|ClaudeBot|cohere-ai)"
  }
  handle @blockAiCrawlers {
    abort
  }
}

# Usage:
# 1. Place this file next to your Caddyfile
# 2. Edit your Caddyfile as in the example below
#
# ```
# import block_ai_crawlers.conf
#
# www.mywebsite.com {
#   import blockAiCrawlers
#   reverse_proxy * localhost:3000
# }
# ```
```
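The user-agent pattern can be sanity-checked outside Caddy. Here is a small Python sketch that applies the same regex (copied verbatim from the directive) to a few user-agent strings; `is_blocked` is a hypothetical helper, not part of Caddy:

```python
import re

# The same case-insensitive pattern used by the header_regexp matcher above
AI_BOTS = re.compile(
    r"(?i)(Bytespider|CCBot|Diffbot|FacebookBot|Google-Extended|GPTBot|"
    r"omgili|anthropic-ai|Claude-Web|ClaudeBot|cohere-ai)"
)

def is_blocked(user_agent: str) -> bool:
    """True if this User-Agent would be aborted by the directive."""
    return AI_BOTS.search(user_agent) is not None

# An AI crawler identifying itself gets blocked
print(is_blocked("Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"))  # True
# A regular search engine crawler passes through
print(is_blocked("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # False
```

Note that plain `Googlebot` is not matched; only `Google-Extended`, the AI-training agent, is on the list, so normal search indexing is unaffected.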
LiveLM,

Huh, looks like the post in r/linux got removed for not being relevant.
What a joke.

hollyberries,

I’m a fan of hellpotting them.

Cyber,

Ooh, didn’t know about that one… thanks

Para_lyzed,

From your recommendation, I found a related project, pandoras_pot, which I can run in a Docker container and which seems to run more efficiently on my Pi home server. I now use it in my Caddyfile to redirect a number of fake subdomains and paths that are likely to be found by a malicious bot (all of which are, of course, excluded in my robots.txt for bots that actually respect it). Thanks for the recommendation!

arisunz,
@arisunz@lemmy.blahaj.zone avatar

I got meaner with them :3c

winnie,
@winnie@lemmy.ml avatar

Suggestion at the end:


```
<a class="boom" href="https://boom.arielaw.ar">hehe</a>
```

Wouldn’t that also hit GoogleBot (and other search engine crawlers), getting your site delisted from search?

acockworkorange,

That’s devilishly and deliciously devious.

JustARegularNerd,

I just want you to know that was an amazing read, was actually thinking “It gets worse? Oh it does. Oh, IT GETS EVEN WORSE?”

arisunz,
@arisunz@lemmy.blahaj.zone avatar

lmao that means a lot, thanks <3

Deckweiss,

The nobots module I linked bombs them.

pvq,

I really like your site’s color scheme, fonts, and overall aesthetics. Very nice!

not_amm,

I agree, it’s readable and very cute!

jkrtn,

This is one of the best things I’ve ever read.

I’d love to see a robots.txt do a couple safe listings, then a zip bomb, then a safe listing. It would be fun to see how many log entries from an IP look like get a, get b, get zip bomb… no more requests.

ABasilPlant, (edited )

In dark mode, the anchor tags are difficult to read. They’re dark blue on a dark background. Perhaps consider something with a much higher contrast?

A picture of a website with a dark purple background and dark blue links.

Apart from that, nice idea - I’m going to deploy the zipbomb today!

arisunz,
@arisunz@lemmy.blahaj.zone avatar

nice catch, thanks (i use light mode most of the time)

Arghblarg,
@Arghblarg@lemmy.ca avatar

We should do more than block them, they need to be teergrubed.

winnie,
@winnie@lemmy.ml avatar

Your link has no article, just a video inside a Flash file (.swf) that won’t open in 2024.

And I don’t want to install Flash on my machine…

boredsquirrel,

Such a cool person making the video available for download

Deckweiss, (edited )

That’s an easy modification. Just redirect or reverse_proxy to the tarpit instead of abort.
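In Caddyfile terms, that swap might look like this (a sketch only; the tarpit address `localhost:8080` is a hypothetical port where something like pandoras_pot or HellPot would listen):

```
(tarpitAiCrawlers) {
  @aiCrawlers {
    header_regexp User-Agent "(?i)(Bytespider|CCBot|Diffbot|FacebookBot|Google-Extended|GPTBot|omgili|anthropic-ai|Claude-Web|ClaudeBot|cohere-ai)"
  }
  handle @aiCrawlers {
    # Instead of `abort`, hand the crawler to a tarpit on a local port
    reverse_proxy localhost:8080
  }
}
```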

I was even thinking about an infinitely linked, data-poisoned HTML document, but there seemed to be no ready-made project that can generate one at the moment. (As far as I know there are no published data-poisoning techniques for plain text at all, but there is one for images.)

Ultimately I decided to just abort the connection as I don’t want my servers to waste traffic or CPU cycles.
