Got a warning for my blog going over 100GB in bandwidth this month… which sounded incredibly unusual. My blog is text and a couple images and I haven’t posted anything to it in ages… like how would that even be possible?
Turns out it’s possible when you have crawlers going apeshit on your server. Am I even reading this right? 12,181 with 181 zeros at the end for ‘Unknown robot’? This is actually bonkers.
Edit: As Thunraz points out below, there’s a footnote that reads “Numbers after + are successful hits on ‘robots.txt’ files” and not scientific notation.
Edit 2: After doing more digging, the culprit is a post where I shared a few wallpapers for download. The bots have been downloading these wallpapers over and over, using 100GB of bandwidth usage in the first 12 days of November. That’s when my account was suspended for exceeding bandwidth (it’s an artificial limit I put on there awhile back and forgot about…) that’s also why the ‘last visit’ for all the bots is November 12th.


It could be, but they seem to get through Cloudflare’s JS. I don’t know if that’s because Cloudflare is failing to flag them for JS verification or if they specifically implement support for Cloudflare’s JS verification since it’s so prevalent. I think it’s probably due to an effective CPU time budget. For example, Google Bot (for search indexing) runs JS for a few seconds and then snapshots the page and indexes it in that snapshot state, so if your JS doesn’t load and run fast enough, you can get broken pages / missing data indexed. At least that’s how it used to work. Anyway, it could be that rather than a time cap, the crawlers have a CPU time cap and Anubis exceeds it whereas Cloudflare’s JS doesn’t – if they did use a cap, they probably set it high enough to bypass Cloudflare given Cloudflare’s popularity.