Make it someone else's problem; put a caching CDN in front of it, like Cloudflare, who have experience with these problems (like intentional or accidental DDOS).
I understand and agree with the suggestion of putting a CDN in front, but it's somewhat ironic to suggest Cloudflare when that very same company is advocating for the DRM-for-webpages scheme.
Is it not fair to assume that Cloudflare, a company that has made a name for itself selling various DDoS protection services, realizes the old-school ways of handling these problems are losing an arms race, and is pursuing more advanced solutions before the current techniques become entirely useless?
It would be easy to point to the irony of saying "instead of supporting Cloudflare's proposals for PATs, use their CDN product for brute force protection" but on the other hand, they employ a lot of experts in this space and might see the writing on the wall in an increasingly adversarial public internet.
This is a good question, but if you look at it closely, Cloudflare seems to be the only company advocating for attestation schemes for the web.
It’s almost as if the conspiracy theory of Cloudflare acting as an arm of the US government and helping in the centralization of the internet is actually true.
is there such a thing as a caching CDN that effectively protects against scrapers? generally if somebody is going to scrape a whole bunch of old, infrequently-accessed but dynamically generated pages, most of those won't be in the cache, so the caching proxy isn't going to help at all.
i'm honestly asking, not just trying to disprove you. this is a real problem i have right now. ideally i'd get all my thousands of old, never-updated but dynamically generated pages moved over to some static host, but that's work and if i could just put some proxy in front to solve this for me i'd be pretty happy. but afaik, nothing actually solves this.
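One partial workaround for the "never-updated but dynamically generated" case: have the origin tell the CDN those pages are effectively static, so each one is fetched at most once and subsequent scraper hits are absorbed by the cache. A minimal sketch, assuming a hypothetical `/archive/` prefix marks the old pages (the prefix and the exact max-age values are illustrative, not from the thread):

```python
# Hypothetical helper choosing a Cache-Control header per request path.
# Assumption: paths under /archive/ are the old, never-updated pages.
ONE_YEAR = 31_536_000  # seconds

def cache_control_for(path: str) -> str:
    """Return a Cache-Control header value for the given request path."""
    if path.startswith("/archive/"):
        # Never-updated content: the CDN may cache it for a year without
        # revalidating, so a scraper hits the origin at most once per page.
        return f"public, max-age={ONE_YEAR}, immutable"
    # Everything else stays short-lived, with stale-while-revalidate to
    # shield the origin from bursts while the cache refreshes in background.
    return "public, max-age=60, stale-while-revalidate=300"
```

This doesn't stop the first full crawl (a cold cache still forwards every miss to the origin), but it means repeat crawls become the CDN's problem rather than yours.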
Akamai has a scraper filter (I think it just rate limits scrapers out of the box but can be configured to block if you want).
I'm not sure how good it is at detecting what is a scraper and what isn't though.
Yeah, AWS has one of these, a set of firewall rules called "Bot Control". it seems to work well enough for blocking the well-behaved bots that request pages at a reasonable rate and self-identify with user-agent strings (which i'm not really concerned about blocking, but it does give me some nice graphs about their traffic). it doesn't seem to do a whole lot to block an unknown scraper hitting pages as fast as it can.