Ever since Mv3 came into enforcement I’ve been using a local DNS blocklist in /etc/hosts (UHB more specifically) for locking the browser down as much as possible. Unfortunately this has led to some major issues when browsing, namely 5-10 seconds of latency for every single request that goes through the browser. I can’t completely stop using Chromium-based browsers, since I need to test my work in one at some point.

I suspect it’s due to the browser waiting for some telemetry endpoint, or trying to get around the block through some other means (which won’t work since outgoing DNS via anything but the gateway is blocked in the firewall), and giving up after a specified time. At this point I’ve narrowed the issue down to the full version of UHB, as the requests no longer hang before going through when I toggle it off. Firefox doesn’t suffer from the same issue – every Chromium-derived platform does, though, including Electron applications like VSCode. Toggling async DNS off hasn’t helped (although it has reportedly helped some people in the past), and neither has turning secure DNS (read: Google’s system DNS sinkhole workaround) off.

Out of curiosity, has anyone else encountered the same issue, or is anyone using a version of Chromium that doesn’t suffer from it? This is getting a bit infuriating, and though I’ve already moved my browsing to Firefox, it’s still bothersome to run e.g. UI tests when every fetch operation takes 10 s. This even happens when connecting to stuff running on localhost or LAN addresses.

  • Xanza@lemm.ee
    7 months ago

    5-10 second latency for every single request

    I mean, yeah? This isn’t a bug, this is just the consequence of how you have it set up. You’re telling your browser to check this file with (likely) 100,000+ entries in it on each page load. If this is something you’d like to do, then you should be running AdGuard Home or Pi-hole. Using a hosts file directly is a really bad idea.

    • ReversalHatchery@beehaw.org
      7 months ago

      unless they use a computer from the 80s, there’s no reason a large hosts file should slow down programs that badly.

      yeah. this is a bug.

      • antimidas@sopuli.xyzOP
        7 months ago

        Yep, precisely.

        It’s also quite literally one of the recommended methods of installation for e.g. UHB, for which there’s even a pre-made script in the repo.

        Edit: Also, Chromium devs are aware of this use case and have even added optimizations for it in the past, as visible in the highlighted comment. And the max hosts file size defaults to 32 MiB which is well over the size I’m using (24 MiB). Makes it even weirder for it to bog down completely when experimenting with a ~250 MiB hosts file, as it should just reject it outright according to implementation.
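
        To make the "reject it outright" expectation concrete, here is a minimal sketch of a size guard. It is not taken from Chromium's source: `hosts_file_within_limit` and `MAX_HOSTS_BYTES` are hypothetical names, and the 32 MiB value is simply the default mentioned above.

```c
#include <sys/stat.h>

/* Assumed limit: the 32 MiB default mentioned in the comment above. */
#define MAX_HOSTS_BYTES (32LL * 1024 * 1024)

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if the file at path is at most max_bytes long,
 * 0 if it is larger, and -1 if stat() fails.
 */
static int hosts_file_within_limit(const char *path, long long max_bytes) {
	struct stat st;

	if (stat(path, &st) != 0) {
		return -1;
	}
	return (long long)st.st_size <= max_bytes ? 1 : 0;
}
```

        With a guard like this in front of the parser, a ~250 MiB file would be skipped before any parsing happens, e.g. `hosts_file_within_limit("/etc/hosts", MAX_HOSTS_BYTES)`.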

    • antimidas@sopuli.xyzOP
      7 months ago

      TLDR: looks like you’re right, although Chrome shouldn’t be struggling with that amount of hosts to chug through. This ended up being an interesting rabbit hole.

      My home network already uses unbound with a proper blocklist configured, but I can’t use the same setup directly with my work computer as the VPN sets its own DNS. I can only override this with a local resolver on the work laptop, and I’d really like to get by with just systemd-resolved instead of having to add dnsmasq or similar for this. None of the other tools I use struggle with this setup, as they use the system IP stack.

      Might well be that Chromium has a somewhat more sophisticated network stack (rather than just using the system-provided libraries), and I remember the docs indicating something to that effect. Either way, it’s not like the code is (or should be) paging through the whole file every time there’s a query – either it forwards the query to another resolver or resolves it locally, but in both cases there will be a cache. That cache ends up holding the queried domains in order of access, after which having a long /etc/hosts won’t matter. The worst-case scenario after initially paging in the hosts file is 3-5 ms (per query) for comparing through the 100k-700k lines before hitting a wall, and that only needs to happen once regardless of where the actual resolving takes place. At a glance, the Chrome net stack should cache queries into the hosts file as well. So at the very least it doesn’t really make sense for it to struggle for 5-10 seconds on every consecutive refresh of the page with a warm DNS cache in memory…

      …or that’s how it should happen. Your comment inspired me to test it a bit more, and lo: after trying out a hosts file with 10 000 000 bogus entries, Chrome was brought completely to its knees. However, that number of string comparisons is absolutely nothing in practice – Python with its plain lists and slow interpreter manages comparing against every row in 300 ms, and a crude C implementation manages it in 23 ms (approx. 2 ms with 1 million rows; both are a lot more than what I have appended to the hosts file). So the file being long should have nothing to do with it unless there’s something very wrong with the implementation. Comparing against /etc/hosts should be cheap as it doesn’t support wildcard entries – as such the comparisons are just a simple 1:1 check against the first matching row. I’ll continue investigating and see if there’s a quick change to be made in how the hosts are read in. Fixing this shouldn’t cause any issues for other use cases from what I see.

      For reference, if you want to check the performance for 10 million comparisons on your own hardware:

      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/time.h>
      
      
      int main(void) {
      	struct timeval start_t;
      	struct timeval end_t;
      
      	char **strs = malloc(sizeof(char *) * 10000000);
      	for (int i = 0; i < 10000000; i++) {
      		char *urlbuf = malloc(sizeof(char) * 50);
      		snprintf(urlbuf, 50, "%d.bogus.local", i);
      		strs[i] = urlbuf;
      	}
      
      	printf("Checking comparisons through array of 10M strings.\n");
      	gettimeofday(&start_t, NULL);
      
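      	int matches = 0;
      	for (int i = 0; i < 10000000; i++) {
      		/* Count matches so the compiler cannot discard the comparisons. */
      		if (strcmp(strs[i], "test.url.local") == 0) {
      			matches++;
      		}
      	}
      
      	gettimeofday(&end_t, NULL);
      
      	/* Include tv_sec as well; tv_usec alone wraps around every second. */
      	long duration = ((end_t.tv_sec - start_t.tv_sec) * 1000000L
      		+ (end_t.tv_usec - start_t.tv_usec)) / 1000;
      	printf("Spent %ld ms on the operation (%d matches).\n", duration, matches);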
      
      	for (int i = 0; i < 10000000; i++) {
      		free(strs[i]);
      	}
      	free(strs);
      }
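
      The linear scan above is the pessimistic case. Since hosts entries are exact names without wildcards, they can also be indexed once and then looked up in roughly constant time. The following is only a rough sketch of that idea, not how Chromium actually does it: `host_set` and its functions are hypothetical names, and FNV-1a with linear probing is just one simple choice.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-capacity set of host names (open addressing). */
struct host_set {
	char **slots;    /* NULL marks an empty slot */
	size_t capacity; /* must be a power of two */
	size_t count;
};

/* 64-bit FNV-1a hash of a NUL-terminated string. */
static uint64_t fnv1a(const char *s) {
	uint64_t h = 0xcbf29ce484222325ULL;
	while (*s) {
		h ^= (unsigned char)*s++;
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* Returns 0 on success, -1 on a bad capacity or allocation failure. */
static int host_set_init(struct host_set *set, size_t capacity) {
	if (capacity == 0 || (capacity & (capacity - 1)) != 0) {
		return -1;
	}
	set->slots = calloc(capacity, sizeof(char *));
	set->capacity = capacity;
	set->count = 0;
	return set->slots ? 0 : -1;
}

/* Stores a copy of name. Returns -1 if the table is full or malloc fails. */
static int host_set_insert(struct host_set *set, const char *name) {
	/* Always keep one slot empty so that probing terminates. */
	if (set->count + 1 >= set->capacity) {
		return -1;
	}
	size_t i = fnv1a(name) & (set->capacity - 1);
	while (set->slots[i]) {
		if (strcmp(set->slots[i], name) == 0) {
			return 0; /* already present */
		}
		i = (i + 1) & (set->capacity - 1);
	}
	size_t len = strlen(name) + 1;
	char *copy = malloc(len);
	if (!copy) {
		return -1;
	}
	memcpy(copy, name, len);
	set->slots[i] = copy;
	set->count++;
	return 0;
}

/* Returns 1 if name is in the set, 0 otherwise. */
static int host_set_contains(const struct host_set *set, const char *name) {
	size_t i = fnv1a(name) & (set->capacity - 1);
	while (set->slots[i]) {
		if (strcmp(set->slots[i], name) == 0) {
			return 1;
		}
		i = (i + 1) & (set->capacity - 1);
	}
	return 0;
}

/* Releases every stored name and the slot array. */
static void host_set_free(struct host_set *set) {
	for (size_t i = 0; i < set->capacity; i++) {
		free(set->slots[i]);
	}
	free(set->slots);
	set->slots = NULL;
}
```

      For the 10 million row case, a table with 2^24 slots leaves enough headroom that a lookup typically touches a handful of slots instead of every row.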
      
      • pHr34kY@lemmy.world
        7 months ago

        I would have assumed the hosts file got cached, indexed and re-read if the file changes. Surely it’s not read and parsed for every single hostname lookup.

        My adblock list is in BIND9 anyway, so I don’t get this issue. I can see it definitely takes a second or two to parse the whole list on startup.

        • antimidas@sopuli.xyzOP
          7 months ago

          At a glance there don’t seem to be any disk reads on request, though that might just be due to read caching at the OS level. There’s a spike on the first page refresh/load after dropping the read cache, so that could indicate the file being read in every time there’s a fresh page load. I would have to open the browser with call tracing to be sure, which I’ll probably try out later today.

          For my other devices I use unbound hosted on the router, so this is the first time encountering said issue for me as well.

      • Xanza@lemm.ee
        7 months ago

        Chrome shouldn’t be struggling with that amount of hosts to chug through.

        You’re using software to do something it wasn’t designed to do. So this comment is beyond meaningless. There’s no value whatsoever in it.

        My home network already uses unbound with proper blocklist configured

        So then why would you even think to do something like this? Like…why?

        • ReversalHatchery@beehaw.org
          7 months ago

          So then why would you even think to do something like this? Like…why?

          well if you would bother to read what they have written… oh I see, then you couldn’t be so condescending

        • antimidas@sopuli.xyzOP
          7 months ago

          You’re using software to do something it wasn’t designed to do

          As such, Chrome isn’t exactly following the best practices either – if you want to reinvent the wheel at least improve upon the original instead of making it run worse. True, it’s not the intended method of use, but resource-wise it shouldn’t cause issues – at this point one would’ve needed active work to make it run this poorly.

          Why would you even think to do something like this?

          As I said, due to company VPN enforcing their own DNS for intranet resources etc. Technically I could override it with a single rule in configuration, but this would also technically be a breach of guidelines as opposed to the more moderate rules-lawyery approach I attempt here.

          If it were up to me, the employer would just add some blocklist to their own forwarder for the benefit of everyone working there…

          But guess I’ll settle for local dnsmasq on the laptop for now. Thanks for the discussion 👌🏼