I understand why OpenAI is trying to reduce its costs, but it simply isn't true that AI crawlers aren't creating very significant load, especially those crawlers that ignore robots.txt and hide their identities. This is direct financial damage and it's particularly hard on nonprofit sites that have been around a long time.
> but it simply isn't true that AI crawlers aren't creating very significant load.
And how much of this is users who are tired of walled gardens and enshittification? We murdered RSS, APIs, and the "open web" in the name of profit and lock-in.
There is a path where "AI" turns into an ouroboros, tech eating itself, before being scaled down to run on end-user devices.
Are these the ChatGPT and Claude Desktop crawlers we're talking about? Or what exactly are they? Are they really creating significant load while not honoring robots.txt?
Is this the first time you're reading HN? Every day there are posts from people describing how AI crawlers hammer their sites, with no end in sight. Filtering user agents doesn't work because they spoof them; filtering IPs doesn't work because they use residential IPs. Robots.txt is a summer child's dream.
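To make that first point concrete, here's a toy sketch of a user-agent blocklist and why it's only as good as the client's honesty. The bot names and the helper are just illustrative, not anyone's actual filter:

    # Hypothetical blocklist of crawler names, purely for illustration.
    BLOCKED_AGENTS = {"GPTBot", "CCBot", "ClaudeBot"}

    def is_blocked(user_agent: str) -> bool:
        """Return True if the User-Agent header matches a known crawler string."""
        return any(bot.lower() in user_agent.lower() for bot in BLOCKED_AGENTS)

    # A crawler that identifies itself gets caught...
    print(is_blocked("Mozilla/5.0 (compatible; GPTBot/1.0)"))        # True
    # ...but one that sends a plain browser string sails straight through.
    print(is_blocked("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))   # False

The User-Agent header is set entirely by the requester, so anything that lies about being a browser never even reaches the check.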
They seem to mostly be third-party upstarts with too much money to burn, willing to do whatever it takes to get data, probably in hopes of later selling it to big labs. Maaaybe Chinese AI labs too; I wouldn't put it past them.
And doing it over, and over, and over and over again. Because sure, it didn't change in the last 8 years, but maybe it's changed since yesterday's scrape?
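For contrast, a well-behaved crawler can ask the server whether anything changed before re-downloading. A rough stdlib-only sketch of a conditional GET; the URL is a placeholder and the server must actually emit ETag/Last-Modified for this to help:

    import urllib.request
    import urllib.error

    url = "https://example.org/"  # placeholder URL

    # First fetch: remember the validators the server hands back.
    with urllib.request.urlopen(url) as resp:
        etag = resp.headers.get("ETag")
        last_modified = resp.headers.get("Last-Modified")
        body = resp.read()

    # Later re-crawl: ask "has this changed?" instead of re-downloading blindly.
    req = urllib.request.Request(url)
    if etag:
        req.add_header("If-None-Match", etag)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)

    try:
        with urllib.request.urlopen(req) as resp:
            body = resp.read()      # 200: page changed, take the new copy
    except urllib.error.HTTPError as e:
        if e.code == 304:
            pass                    # 304 Not Modified: keep the cached copy, no body sent
        else:
            raise

The over-and-over scraping pattern people describe is exactly what you get when a crawler skips this and re-fetches everything from scratch each run.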