Blocking AI crawlers with Caddy
I was reading the Reddit thread on Claude AI crawlers effectively DDoSing the Linux Mint forums (libreddit.lunar.icu/…/claude_ai_name_and_shame/),
and I wanted to block all AI crawlers from my self-hosted stuff.
I don’t trust crawlers to respect robots.txt, but you can generate one here: darkvisitors.com
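For reference, a robots.txt that asks these crawlers to stay away looks like this. This is a minimal hand-written example covering a few of the user agents from the list below; darkvisitors generates a more complete one:

```
# Ask known AI crawlers not to index anything.
# Well-behaved bots honor this; the Caddy block below is for the rest.
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /
```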
Since I use Caddy as my web server, I wrote a snippet that blocks them based on their User-Agent header. The contents of the regex basically come from darkvisitors.
Sidenote: there is also a Caddy module for blocking crawlers, but it seemed like overkill to me: github.com/Xumeiquer/nobots
For anybody who is interested, here is the block_ai_crawlers.conf I wrote.
````
(blockAiCrawlers) {
	@blockAiCrawlers {
		header_regexp User-Agent "(?i)(Bytespider|CCBot|Diffbot|FacebookBot|Google-Extended|GPTBot|omgili|anthropic-ai|Claude-Web|ClaudeBot|cohere-ai)"
	}
	handle @blockAiCrawlers {
		abort
	}
}

# Usage:
# 1. Place this file next to your Caddyfile
# 2. Edit your Caddyfile as in the example below
#
# ```
# import block_ai_crawlers.conf
#
# www.mywebsite.com {
#     import blockAiCrawlers
#     reverse_proxy * localhost:3000
# }
# ```
````
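If you want to sanity-check the pattern before deploying, here is a quick sketch in Python using the same regex and the same `(?i)` case-insensitive flag as the `header_regexp` matcher above (Caddy uses Go's RE2 engine, which also supports `(?i)`, so the match semantics for this simple alternation are the same):

```python
import re

# Same pattern as in the Caddy snippet; (?i) makes it case-insensitive.
AI_CRAWLERS = re.compile(
    r"(?i)(Bytespider|CCBot|Diffbot|FacebookBot|Google-Extended|GPTBot|"
    r"omgili|anthropic-ai|Claude-Web|ClaudeBot|cohere-ai)"
)

def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent would be caught by the matcher."""
    return AI_CRAWLERS.search(user_agent) is not None

# A ClaudeBot-style UA string matches...
print(is_blocked("Mozilla/5.0 (compatible; ClaudeBot/1.0)"))  # True
# ...while an ordinary browser UA does not.
print(is_blocked("Mozilla/5.0 (X11; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0"))  # False
```

One thing to keep in mind: the list is only as good as the day you copied it, so it's worth re-checking darkvisitors now and then for new crawler user agents.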