Anubis is awesome and I want to talk about it

SmokeyDope@piefed.social · 7 days ago

daniskarma@lemmy.dbzer0.com · 7 days ago

I don’t think you have a use case for Anubis.

Anubis is mainly aimed at bad AI scrapers, plus some DDoS mitigation if you have a heavy service.

You are getting hit exactly the same; Anubis doesn’t put up a block list or anything. It just puts itself in front of the service. The load on your server and the risk you take are very similar with or without Anubis here. Most bots are not AI scrapers, they are just probing. So the hit on your server is the same.

What you want is to properly set up fail2ban or, even better, CrowdSec. That would actually block and ban bots that try to probe your server.

If you are just self-hosting, with Anubis the only thing you are doing is diverting the log noise towards the Anubis logs and making your devices do a PoW every once in a while when you want to use your services.

Being honest, I don’t know what you are self-hosting. But unless it’s something that’s going to get DDoSed or AI-scraped, there’s not much point to Anubis.

Also, Anubis is not a substitute for fail2ban or CrowdSec. You need something to detect and ban brute-force attacks. If not, the attacker would only need to execute the Anubis challenge, get the token for the week, and then they are free to attack your services as they like.
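
To make the "detect and ban" idea concrete, here is a minimal sketch of the kind of sliding-window counting that tools like fail2ban perform. It is not fail2ban's or CrowdSec's actual code; the class name, the thresholds, and the in-memory ban set are all assumptions for illustration.

```python
from collections import defaultdict, deque


class FailureTracker:
    """Ban an IP after max_failures failed attempts within window_seconds.

    Hypothetical helper: real tools parse log files and install firewall
    rules instead of keeping an in-memory set.
    """

    def __init__(self, max_failures=5, window_seconds=600):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.failures = defaultdict(deque)  # ip -> timestamps of recent failures
        self.banned = set()

    def record_failure(self, ip, now):
        """Record a failed attempt at time `now` (seconds); return True if ip is banned."""
        if ip in self.banned:
            return True
        recent = self.failures[ip]
        recent.append(now)
        # Drop failures that have fallen out of the sliding window.
        while recent and now - recent[0] > self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_failures:
            self.banned.add(ip)
        return ip in self.banned
```

A proof-of-work token does nothing to stop this kind of repeated failure, which is why the two tools cover different problems.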

quick_snail@feddit.nl · 7 days ago

Kinda sucks how it makes websites inaccessible to folks who have to disable JavaScript for security.

poVoq@slrpnk.net · 7 days ago

It kinda sucks how AI scrapers make websites inaccessible to everyone 🙄

Mwa@thelemmy.club · 7 days ago

and they don’t respect robots.txt

El Barto@lemmy.world · 6 days ago

You are both right.

quick_snail@feddit.nl · 7 days ago

Not if the admin has a cache. It’s not a difficult problem for most websites

poVoq@slrpnk.net · 7 days ago

You clearly don’t know what you are talking about.

quick_snail@feddit.nl · 7 days ago

Lol I’m the sysadmin for many sites that don’t have these issues, so obviously I do…

If you’re the one that thinks you need this trash PoW fronting for a static site, then clearly you’re the one who is ignorant.

poVoq@slrpnk.net · 7 days ago

Obviously I don’t think you need Anubis for a static site. And if that is what your admin experience is limited to, then you have a strong case of Dunning-Kruger.

quick_snail@feddit.nl · 4 days ago

99% of the pages that Anubis is fronting are static.

It’s an abuse of the tool that’s harming the internet.

poVoq@slrpnk.net · 4 days ago

Lol, wat? I have not seen Anubis even once in front of a static page. You are either making shit up or don’t understand what a static site is 🤦

WhyJiffie@sh.itjust.works · 7 days ago

there’s a fork that has non-js checks. I don’t remember the name but maybe that’s what should be made more known

quick_snail@feddit.nl · 7 days ago

Please share if you know.

The only way I know how to do this is running a Tor Onion Service, since the tor protocol has built-in pow support (without js)

WhyJiffie@sh.itjust.works · 7 days ago

It’s this one: https://git.gammaspectra.live/git/go-away

the project name is a bit unfortunate to show for users, maybe change that if you will use it.

some known privacy services use it too, including the invidious at nadeko.net, so you can check there how it works. It’s one of the most popular inv servers so I guess it cannot be bad, and they use multiple kinds of checks for each visitor

WhyJiffie@sh.itjust.works · 7 days ago

PS: I was wrong, it’s not a fork, but a different project doing the same and more.

smh@slrpnk.net · 7 days ago

The creator is active on a professional slack I’m on and they’re lovely and receptive to user feedback. Their tool is very popular in the online archives/cultural heritage scene (we combine small budgets and juicy, juicy data).

My site has enabled js-free screening when the site load is low, under the theory that if the site load is too high then no one’s getting in anyway.

url@feddit.fr · 7 days ago

Honestly I’m not a big fan of Anubis. It fucks over users with slow devices.

https://lock.cmpxchg8b.com/anubis.html

url@feddit.fr · 7 days ago

Did I forget to mention it doesn’t work without JS, which I keep disabled?

sudoer777@lemmy.ml · 6 days ago

I host my main server on my own hardware, and a VPN on Hetzner because my shitty ISP doesn’t let me port forward. For the past year, bots were hitting my Forgejo instance hard. I forgot to disable registration and they generated hundreds of accounts with hundreds of repos with sketchy links, generating terrabytes of traffic from my VPS, costing me money in traffic. I disabled registration and deleted the spam, and bots still kept hitting my server for several months, which would cause memory leaks over time, crash it, and consume CPU, and it still cost me money with terrabytes of traffic per month. A few weeks ago, I put Anubis on the VPS. Now, zero bots hit my Forgejo instance and I don’t pay for their traffic anymore. Problem solved.

Jason2357@lemmy.ca · 6 days ago

It’s always code forges and wikis that are affected by this, because the scrapers spider down into every commit or edit in your entire history, then come back the next day and check every “page” again to see if any changed. Consider just blocking pages that are commit history at your reverse proxy.
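
As a rough illustration of that suggestion, the path test a reverse proxy rule would encode might look like the sketch below. The URL layout (`/<owner>/<repo>/commit/...` and friends) is an assumption based on Gitea/Forgejo-style paths, and the function name is hypothetical; check your own forge's routes before blocking anything.

```python
import re

# Assumed Forgejo/Gitea-style layout: /<owner>/<repo>/<kind>/...
# The list of "history" kinds is an illustrative guess, not an exhaustive one.
HISTORY_PATH = re.compile(r"^/[^/]+/[^/]+/(commit|commits|blame|compare)(/|$)")


def is_history_page(path):
    """Return True if the request path looks like a per-commit or history view."""
    return HISTORY_PATH.match(path) is not None
```

Requests for which `is_history_page` returns True are the ones you would deny (or challenge) for anonymous clients, while repository front pages and file views stay open.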

LOLseas@lemmy.zip · 6 days ago

This is the first time I’ve ever seen it misspelled like that. It’s ‘terabyte/terabytes’. 1,024 GBs worth of data.

sudoer777@lemmy.ml · 6 days ago

Oops. Although a terabyte is 1000 GB; 1024 GiB is a tebibyte.
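
For anyone who wants the arithmetic spelled out, here is a tiny sketch of the SI versus IEC definitions; the constant and function names are just illustrative.

```python
TERABYTE = 10**12  # SI: 1000 GB, each GB being 10**9 bytes
TEBIBYTE = 2**40   # IEC: 1024 GiB, each GiB being 2**30 bytes


def tib_to_tb(tib):
    """Convert tebibytes to terabytes."""
    return tib * TEBIBYTE / TERABYTE
```

One tebibyte comes out roughly 10% larger than one terabyte.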

LOLseas@lemmy.zip · 5 days ago

Thanks friend. I only knew of the JEDEC terms, TIL.

WorldsDumbestMan@lemmy.today · 6 days ago

Nice ads people! Good job!

Helix 🧬@feddit.org · 6 days ago

So you think techaro paid them?

WorldsDumbestMan@lemmy.today · 6 days ago

No clue, but it sounds so ad like…

drkt@lemmy.dbzer0.com · 6 days ago

Stop playing whack-a-mole with these fucking people and build TARPITS!

Make it HURT to crawl your site illegitimately.

TerHu@lemmy.dbzer0.com · 7 days ago

yes, please be mindful when using cloudflare. with them you’re possibly inviting in a much much bigger problem

https://www.devever.net/~hl/cloudflare

quick_snail@feddit.nl · 7 days ago

Great article, but I disagree about WAFs.

Try to secure a nonprofit’s web infrastructure as the 1 IT guy with no budget for devs or security.

It would be nice if we could update servers constantly and patch unmaintained code, but sometimes you just need to front it with something that plugs those holes until you have the capacity to do updates.

But 100% the WAF should be run locally, not a MiTM from evil US corp in bed with DHS.

Deathray5@lemmynsfw.com · 6 days ago

Unrelated but one day I won’t get gender envy from random cartoon woman

Holytimes@sh.itjust.works · 6 days ago

At least you don’t have ear and tail envy. It’s so fluffy.

termaxima@slrpnk.net · 6 days ago

I am very annoyed that I have to enable cloudflare’s JavaScript on so many websites, I would much prefer if more of them used Anubis so I didn’t have third-party JavaScript running as often.

( coming from an annoying user who tries to enable the fewest things possible in NoScript )

Appoxo@lemmy.dbzer0.com · 7 days ago

Maybe you know the answer to my question:
If I’d want to use any app that doesn’t run in a web browser (e.g. the native Jellyfin app), how would that work? Does it still work then?

chaospatterns@lemmy.world · 6 days ago

If the app is just a WebView wrapper around the application, then the challenge page would load and try to be evaluated.

If it’s a native Android/iOS app, then it probably wouldn’t work because the app would try to make HTTP API calls and get back something unexpected.
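
A minimal sketch of why the native-app case breaks: a client that expects JSON from an API receives an HTML challenge page instead. The function below is hypothetical and not taken from any real app; it just shows a defensive check on the `Content-Type` header before parsing.

```python
import json


def parse_api_response(content_type, body):
    """Return parsed JSON, or raise ValueError if something other than JSON came back.

    An HTML challenge page in front of an API would trip this check.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/json":
        raise ValueError(f"expected JSON, got {media_type or 'no content type'}")
    return json.loads(body)
```

An app without a check like this would typically fail later with a confusing parse error, which matches the "something unexpected" described above.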

Appoxo@lemmy.dbzer0.com · 6 days ago

Authelia already broke the functionality for Jellyfin and Symfonium.
So I guess the answer is no.

SmokeyDope@piefed.social · 7 days ago

It explicitly checks for web browser properties to apply challenges, and all its challenges require basic web functionality like page refresh. Unless the connection to your server involves handling a user agent string, it won’t work. I think this is how it is, anyway. Hope this helped.

Appoxo@lemmy.dbzer0.com · 7 days ago

Assuming what you said is correct, it wouldn’t help my use case.
Not hosting any page meant for public consumption anyway so it’s not really important.
But thanks for answering :)

turdas@suppo.fi · 7 days ago

Inspired by this post I spent a couple of hours today trying to set this up on my toy server, only to immediately run into what seems to be a bug where <video> tags loading a simple WebM video from right next to index.html broke because the media response got Anubis’s HTML bot check instead of media.

I suppose my use-case was just too complicated.

quick_snail@feddit.nl · 7 days ago

getting fail2ban to read caddy logs

You should look into wazuh

Victor@lemmy.world · 7 days ago

Seems like they already have a working solution now.

quick_snail@feddit.nl · 7 days ago

sure, but they have to maintain it.

Wazuh ships with rules that are maintained by wazuh. Less code rot.

Victor@lemmy.world · 7 days ago

That’s really good, could be worth looking into in that case. 👍 Thanks for following up!

sixty@sh.itjust.works · 7 days ago

Yeah I’m not gonna use this anime stuff

Mwa@thelemmy.club · 7 days ago

can be removed btw

ohshit604@sh.itjust.works · 7 days ago

Thought you had to pay for that with Anubis? Recently I’ve been eyeing Go Away as a potential alternative.

Mwa@thelemmy.club · 7 days ago

am not sure if you still need to pay for it

quick_snail@feddit.nl · 7 days ago

It’s amazing how few people here are familiar with caching

non_burglar@lemmy.world · 7 days ago

Anubis is an elegant solution to the ai bot scraper issue, I just wish the solution to everything wasn’t just spending compute everywhere. In a world where we need to rethink our energy consumption and generation, even on clients, this is a stupid use of computing power.

quick_snail@feddit.nl · 7 days ago

We have memory-hard cryptographic functions

Leon@pawb.social · 7 days ago

It also doesn’t function without JavaScript. If you’re security or privacy conscious chances are not zero that you have JS disabled, in which case this presents a roadblock.

On the flip side of things, if you are a creator and you’d prefer to not make use of JS (there’s dozens of us) then forcing people to go through a JS “security check” feels kind of shit. The alternative is to just take the hammering, and that feels just as bad.

No hate on Anubis. Quite the opposite, really. It just sucks that we need it.

quick_snail@feddit.nl · 7 days ago

This is why we need these sites to have .onions. Tor Browser has a PoW that doesn’t require js

SmokeyDope@piefed.social · 7 days ago

There’s a challenge option that doesn’t require JavaScript. The responsibility lies with site owners to configure it properly IMO, though you can make the argument that it’s not the default, I guess.

https://anubis.techaro.lol/docs/admin/configuration/challenges/metarefresh

From docs on Meta Refresh Method

Meta Refresh (No JavaScript)

The metarefresh challenge sends a browser a much simpler challenge that makes it refresh the page after a set period of time. This enables clients to pass challenges without executing JavaScript.

To use it in your Anubis configuration:

# Generic catchall rule
- name: generic-browser
  user_agent_regex: >-
    Mozilla|Opera
  action: CHALLENGE
  challenge:
    difficulty: 1 # Number of seconds to wait before refreshing the page
    algorithm: metarefresh # Specify a non-JS challenge method

This is not enabled by default while this method is tested and its false positive rate is ascertained. Many modern scrapers use headless Google Chrome, so this will have a much higher false positive rate.
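
To show the underlying browser mechanism, here is a rough sketch of a page that uses the standard HTML `<meta http-equiv="refresh">` element to reload after a delay without any JavaScript. This is not Anubis's implementation; the function name, the page text, and the `/verify?token=...` URL in the usage note are made up for illustration.

```python
from html import escape


def meta_refresh_page(delay_seconds, next_url):
    """Build a minimal HTML page that asks the browser to load next_url after a delay."""
    content = f"{int(delay_seconds)}; url={next_url}"
    return (
        "<!doctype html>\n"
        "<html><head>\n"
        f'<meta http-equiv="refresh" content="{escape(content, quote=True)}">\n'
        "<title>Checking your browser</title>\n"
        "</head><body><p>One moment, this page will refresh by itself.</p></body></html>\n"
    )
```

Calling `meta_refresh_page(1, "/verify?token=abc")` yields a page that a normal browser follows after one second; a client that only fetches raw HTML and never honours the refresh would not get past it.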

z3rOR0ne@lemmy.ml · 7 days ago

Yeah I actually use the NoScript extension and I refuse to just whitelist certain sites unless I’m very certain I trust them.

I run into Anubis checks all the time and while I appreciate the software, having to consistently temporarily whitelist these sites does get cumbersome at times. I hope they make this noJS implementation the default soon.

Prathas@lemmy.zip · 7 days ago

Wait, you keep temporarily allowing them over and over again? Why temporary?

z3rOR0ne@lemmy.ml · 7 days ago

Most of the Anubis encounters I have are to redlib instances that are shuffled around, go down all the time, and generally are more ephemeral than other sites. Because I use another extension called Libredirect to shuffle which redlib instance I visit when clicking on a reddit link, I don’t bother whitelisting them permanently.

I already have solved this on my desktop by self hosting my own redlib instance via localhost and using libredirect to just point there, but on my phone I still do the whole nojs temp unblock random redlib instance. Eventually I plan on using wireguard to host a private redlib instance on a vps so I can just not deal with this.

This is a weird case I know, but it’s honestly not that bad.

Leon@pawb.social · 7 days ago

This is news to me! Thanks for enlightening me!

cecilkorik@piefed.ca · 7 days ago

if you are a creator and you’d prefer to not make use of JS (there’s dozens of us) then forcing people to go through a JS “security check” feels kind of shit. The alternative is to just take the hammering, and that feels just as bad.

I’m with you here. I come from an older time on the Internet. I’m not much of a creator, but I do have websites, and unlike many self-hosters I think, in the spirit of the internet, they should be open to the public as a matter of principle, not cowering away for my own private use behind some encrypted VPN. I want it to be shared. Sometimes that means taking a hammering. It’s fine. It’s nothing that’s going to end the world if it goes down or goes away, and I try not to make a habit of being so irritating that anyone would have much legitimate reason to target me.

I don’t like any of these sort of protections that put the burden onto legitimate users. I get that’s the reality we live in, but I reject that reality, and substitute my own. I understand that some people need to be able to block that sort of traffic to be able to limit and justify the very real costs of providing services for free on the Internet and Anubis does its job for that. But I’m not one of those people. It has yet to cost me a cent above what I have already decided to pay, and until it does, I have the freedom to adhere to my principles on this.

To paraphrase another great movie: Why should any legitimate user be inconvenienced when the bots are the ones who suck. I refuse to punish the wrong party.

Nate Cox@programming.dev · 7 days ago

I feel comfortable hating on Anubis for this. The compute cost per validation is vanishingly small to someone with the existing budget to run a cloud scraping farm, it’s just another cost of doing business.

The cost to actual users though, particularly to lower income segments who may not have compute power to spare, is annoyingly large. There are plenty of complaints out there about Anubis being painfully slow on old or underpowered devices.

Some of us do actually prefer to use the internet minus JS, too.

Plus the minor irritation of having anime catgirls suddenly be a part of my daily browsing.

url@feddit.fr · 7 days ago

Imagine friends seeing a catgirl in your browser, and now you have to explain it to people who have zero knowledge of it

bitcrafter@programming.dev · 7 days ago

What would you propose as an alternative?

Nate Cox@programming.dev · 7 days ago

There’s a caddy config out there that works as well as Anubis without the catgirls and mining: https://fxgn.dev/blog/anubis/

Axolotl@feddit.it · 7 days ago

Not having catgirls is def a con

rtxn@lemmy.world · 7 days ago

No numbers, no testimonials, or even anecdotes… “It works, trust me bro” is not exactly convincing.

poVoq@slrpnk.net · 7 days ago

That blog post is fundamentally misunderstanding what Anubis actually does.

cadekat@pawb.social · 7 days ago

Scarcity is what powers this type of challenge: you have to prove you spent a certain amount of electricity in exchange for access to the site, and because electricity isn’t free, this imposes a dollar cost on bots.

You could skip the detour through hashes/electricity and do something with a proof-of-stake cryptocurrency, and just pay for access. The site owner actually gets compensated instead of burning dead dinosaurs.

Obviously there are practical roadblocks to this today that a JavaScript proof-of-work challenge doesn’t face, but longer term…
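
For readers who have not seen a hash-based challenge before, here is a minimal hashcash-style sketch of the "prove you spent electricity" idea. It assumes SHA-256 and a "leading hex zeros" difficulty rule purely for illustration; Anubis's real challenge format may differ, and the function names are hypothetical.

```python
import hashlib


def solve(challenge, difficulty):
    """Brute-force a nonce so sha256(challenge + nonce) starts with `difficulty` hex zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce
        nonce += 1


def verify(challenge, nonce, difficulty):
    """Check a claimed solution with a single hash, which is cheap for the server."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)
```

The asymmetry is the point: `solve` needs on average about 16**difficulty hashes, while `verify` needs exactly one.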

artyom@piefed.social · 7 days ago

You could skip the detour through hashes/electricity and do something with a proof-of-stake cryptocurrency, and just pay for access. The site owner actually gets compensated instead of burning dead dinosaurs.

Maybe if the act of transferring crypto didn’t use a comparable or greater amount of energy…

cadekat@pawb.social · 7 days ago

That’s why I specified a proof-of-stake cryptocurrency. They use so much less energy that it is practically negligible in comparison, and more on the order of traditional online transactions.

daniskarma@lemmy.dbzer0.com · 6 days ago

I think the issue is that many sites are too aggressive with it. Anubis can be configured to only ask for challenges if the site is under unusual load, for instance when a botnet is actually DDoSing the site. That’s when it shines.

Making it constantly ask for challenges when the service is not under attack is just a massive waste of energy. And many sites just enable it constantly because they can divert bot pings away from their logs that way. That’s for instance what OP is doing. It’s just a big misunderstanding of the tool.
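
The "only challenge under load" policy can be sketched as a simple decision function. This is not Anubis configuration syntax; the function name and the 0.8 per-core threshold are assumptions chosen for illustration.

```python
def should_challenge(load_1m, cpu_count, threshold=0.8):
    """Return True only when the 1-minute load average per core exceeds the threshold."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_1m / cpu_count > threshold
```

On a Unix host the inputs could come from `os.getloadavg()[0]` and `os.cpu_count()`; when the function returns False, visitors would pass straight through with no proof-of-work spent.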

Nate Cox@programming.dev · 7 days ago

The cost here only really impacts regular users, too. The type of users you actually want to block have budgets which easily allow for the compute needed anyways.

chicken@lemmy.dbzer0.com · 7 days ago

I think maybe they wouldn’t if they are trying to scale their operations to scanning through millions of sites and your site is just one of them.

cadekat@pawb.social · 7 days ago

Yeah, exactly. A regular user isn’t going to notice an extra few cents on their electricity bill (boiling water costs more), but a data centre certainly will when you scale up.
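
Both sides of this argument come down to arithmetic, so here is a small sketch for plugging in your own numbers. Every input (challenges solved, seconds per challenge, device wattage, electricity price) is an assumption the reader has to supply; nothing here is measured data, and it pessimistically assumes each request solves a fresh challenge rather than reusing a token.

```python
def challenge_cost_usd(requests, seconds_per_challenge, watts, usd_per_kwh):
    """Estimate the electricity cost of solving `requests` proof-of-work challenges."""
    joules = requests * seconds_per_challenge * watts  # watt-seconds
    kwh = joules / 3_600_000  # 1 kWh = 3.6 million joules
    return kwh * usd_per_kwh
```

Comparing the result for a handful of challenges a month against the result for hundreds of millions shows how the same per-challenge cost scales very differently for a visitor and for a large crawler.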