Long story short: the VPS I forward my home servers through (via Tailscale) got hammered by thousands of requests per minute from Anthropic’s Claude AI crawler, all of them coming from different AWS IPs.

The VPS has a 1 TB monthly cap, but it’s still kinda shitty to get huge spikes like the 13 GB in just a couple of minutes today.

How do you deal with something like this?
I’m only really running a Caddy reverse proxy on the VPS, which forwards my home server’s services through Tailscale.

I’d really like to avoid solutions like Cloudflare, since they f over CGNAT users very frequently and all that. I don’t think a WAF would help with this at all(?), but rate limiting on the reverse proxy might work.
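
Something like this is roughly what I’m imagining for the rate limiting — sketch only, and as far as I know the rate_limit directive isn’t built in, so Caddy would need to be rebuilt with xcaddy and the github.com/mholt/caddy-ratelimit plugin. Domain and upstream address are just placeholders:

```
{
    order rate_limit before reverse_proxy
}

example.com {
    # Allow roughly 60 requests per client IP per minute; excess gets rejected.
    rate_limit {
        zone per_ip {
            key    {remote_host}
            events 60
            window 1m
        }
    }
    reverse_proxy 100.64.0.10:8080   # placeholder Tailscale address
}
```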

(The VPS has fail2ban and I’m using /etc/hosts.deny for manual blocking. There’s a WIP website on my root domain with a robots.txt that should be denying the AWS bots as well…)
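
For reference, robots.txt rules for this would look roughly like the following — though robots.txt matches user-agent strings rather than source IPs, and it only helps against crawlers that actually honor it. The Anthropic tokens below are the ones I’ve seen listed, so they’re worth double-checking against current docs:

```
# Block the AI crawlers by user agent; everything else stays allowed.
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: Claude-Web
Disallow: /

User-agent: *
Disallow:
```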

I’m still learning and would really appreciate any suggestions.

  • mholiv@lemmy.world · 5 hours ago

    Fair, but I haven’t seen any anti-AI-scraper tarpits that do that. The ones I’ve seen mostly just pipe 10 MB of /dev/urandom out at the client.

    Also, I assume the programmers working at AI companies are not literally mentally deficient. They would certainly add .timeout(10) or whatever to their scrapers, and they probably have something more dynamic than that.
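
    Roughly what those tarpits amount to, as a made-up sketch (not any particular project’s code) — which is exactly why a simple client-side timeout defeats them:

    ```python
    # Hypothetical "pipe 10 MB of /dev/urandom" style tarpit.
    import os
    import time
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class DripTarpit(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            try:
                # Drip ~10 MB of random junk, 1 KB every half second, so any
                # sane client-side timeout gives up long before the end.
                for _ in range(10 * 1024):
                    self.wfile.write(os.urandom(1024))
                    time.sleep(0.5)
            except (BrokenPipeError, ConnectionResetError):
                pass  # scraper timed out and hung up, which is the point

    HTTPServer(("0.0.0.0", 8080), DripTarpit).serve_forever()
    ```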

    • sem@lemmy.blahaj.zone · 2 hours ago

      There’s one I saw that gave the bot a long, circular form to fill out or something; I can’t remember exactly.

    • Ah, that’s where tuning comes in. Look at the logs, take the average timeout, and tune the tarpit to return a minimal payload: a tiny HTML page containing a single, slightly different URL back into the tarpit. Or, better yet, JavaScript that loads a single page of tarpit URLs very slowly. Bots have to be able to run JS, or else they’re missing half the content on the web. I’m sure someone has created a JS forkbomb.
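
      A rough, hypothetical sketch of that minimal-payload version — every hit gets a tiny page whose only link is a fresh URL back into the pit:

      ```python
      # Hypothetical "maze" tarpit: minimal payload, one new tarpit URL per hit.
      import uuid
      from http.server import BaseHTTPRequestHandler, HTTPServer

      class MazeTarpit(BaseHTTPRequestHandler):
          def do_GET(self):
              body = ('<html><body><a href="/pit/{}">more</a></body></html>'
                      .format(uuid.uuid4().hex)).encode()
              self.send_response(200)
              self.send_header("Content-Type", "text/html")
              self.send_header("Content-Length", str(len(body)))
              self.end_headers()
              self.wfile.write(body)

      HTTPServer(("0.0.0.0", 8081), MazeTarpit).serve_forever()
      ```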

      Variety is the spice of life. AI botnet blacklists are probably the better solution for web content. For SSH you can run the real daemon on a different port and put a tarpit on the standard one, and it barely affects you; but if you’re running a web server you presumably want visitors, and a tarpit is harder to set up so that it only catches bots.
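
      The SSH side is the endlessh trick, if you’ve seen that: the protocol lets a server send arbitrary banner lines before its version string, so you just never stop sending them. A bare-bones sketch (one connection at a time; the real thing multiplexes many):

      ```python
      # Minimal endlessh-style SSH tarpit sketch.
      import random
      import socket
      import time

      srv = socket.socket()
      srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
      srv.bind(("0.0.0.0", 2222))  # move the real sshd, then use the standard port
      srv.listen()
      while True:
          conn, _ = srv.accept()
          try:
              # Keep sending junk banner lines forever, very slowly.
              while True:
                  conn.sendall(b"%x\r\n" % random.getrandbits(32))
                  time.sleep(10)
          except OSError:
              pass
          finally:
              conn.close()
      ```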

      • mholiv@lemmy.world · 1 hour ago

        I see your point, but I think you underestimate the skill of coders. You make sure your timeout is inclusive of JavaScript run time, and maybe set a memory limit too. Imagine you wanted to scrape the internet: you could solve all these tarpits. Any capable coder could. Now imagine a team of 20 of the best coders money can buy, each paid €500,000. They can certainly do the same.
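
        E.g. something like this on the scraper side (hypothetical sketch): a wall-clock deadline plus a byte cap shrugs off both the slow-drip and the oversized tarpits.

        ```python
        # Hypothetical scraper-side defenses: total deadline + download size cap.
        import time
        import requests

        MAX_BYTES = 2 * 1024 * 1024   # give up after 2 MB
        DEADLINE = 10                 # total seconds per page

        def fetch(url):
            start = time.monotonic()
            body = b""
            # timeout= only bounds connect time and gaps between reads, so a
            # wall-clock check is also needed to escape slow-drip responses.
            with requests.get(url, timeout=5, stream=True) as resp:
                for chunk in resp.iter_content(8192):
                    body += chunk
                    if len(body) > MAX_BYTES or time.monotonic() - start > DEADLINE:
                        break
            return body
        ```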

        Like, I see the appeal of running a tarpit. But I don’t see how they can “trap” anyone except script kiddies.