I am the PM working on Headless. Feel free to ask questions in this thread and I...

Bender · on Feb 19, 2023

There are many comments about potential abuse. I would be curious to know if your team have ever challenged each other to look like a real person accessing a site and the other part of the team tries to detect and block them? If there is anyone that could do this it would be the creators of Headless.

Why go through the exercise, one may ask? I believe it would be a critical thinking exercise to improve Headless even more while giving website maintainers a way to opt out of receiving traffic from it. If not your team, have you reached out to see if people from project zero would take on that challenge in their abundance of spare time? [1]

[1] - https://googleprojectzero.blogspot.com/

natorion · on Feb 19, 2023

We regularly get feature requests for Headless to provide a field or property that can be polled by JS frameworks to detect if Headless is active, e.g. window.isBot.

Well, Headless is open source, which means anybody could build a Headless version with such a property set to "I am a human, trust me!" and employ such a modified binary ... ;-)

Bender · on Feb 19, 2023

Oh absolutely, relying on a header would be a placebo at best. I was thinking more along the lines of having two teams, one that develops Headless and another team at Google that tries to defeat it nonstop. An official game of cat and mouse. Project: Tom and Jerry? I guess legal would never buy into that name.

My own personal method for my silly hobby sites is just to put passwords on things with an auth prompt delay.

dmix · on Feb 19, 2023

Why should Google redteam their headless browser though? As other comments point out there's plenty of ways for bot detectors to id bots even with a browser which mirrors a normal one: https://news.ycombinator.com/item?id=34858056

Almost all of those things are outside the scope of the browser itself. And anyone doing serious bot attacks already has scripts/forks that modify these signals. I don't see how the Chrome team could do much to help stop that at that level.

Bender · on Feb 19, 2023

In theory their blue team could come up with even more advanced puzzles that bots trip over and then open source and document the bot puzzles. I don't know that they would, incentives or lack thereof and all. If nothing else it might make their work day more fun.

Or if I put my evil corp hat on, the incentive could be that they make puzzles that only Headless can get around and all other bots become trivial to block and obsolete by even the least knowledgeable hobbyist. Perhaps Google release Nginx, Apache HTTPD, Apache Traffic Server, Envoy and HAProxy modules that only Headless can get around and all other bots internet-wide are entirely silenced. Chrome becomes the one and only bot to rule them all.

robertlagrant · on Feb 19, 2023

Why would they want to do that?

Bender · on Feb 19, 2023

Oh man, you're making me put that hat back on.

I suppose that Google going through that exercise would mean that they get market dominance on bot gathering data and anyone not using Chrome Headless would be unable to obtain freebie data. This could enable future features, whatever they may be. *readjusts hat* One future feature could be auto-discovery of Google DNS and Google proxies in GCP so they can learn about new data sources through crowd-sourcing, thus making their big-data sets more complete and their machine learning more powerful. Developers could block the proxies or compile them out, but as we know most people are too lazy to do this and many won't care.

Another advantage would be that eventually the only bots abusing Google would be bots using their code, and they would know how to detect and deal with them as they would implement their own open source anti-bot modules in their web servers, load balancers, etc...

There are more obscure ideas but I am doffing the hat before the hat-wraiths sense it.

imglorp · on Feb 19, 2023

RFC for IPV4 evil bit.

https://www.rfc-editor.org/rfc/rfc3514

Bender · on Feb 19, 2023

You jest, but I could actually see this becoming a thing. I envision a future dystopian internet where people first have to authenticate their network gear, PC's, laptops, cell phones, cars, trucks, e-bikes, toasters, coffee makers to a government contracted service. Once authenticated they utilize something similar to that RFC but probably instead a nonce or jwt token tied to their device that gets embedded in the packet header somehow. Then sanctioning a continent, country, state, ISP, city, company, manufacturer, distributor or person would be simply disabling their evil bits so to speak.

The push for this is starting with adult content [1] but the goal posts could easily be mounted on a train car with a very long and smooth train track that only goes downhill.

[1] - https://news.ycombinator.com/item?id=34726509

rektide · on Feb 21, 2023

There's a huge amount of aggro pissy shitthrowing that Chrome is facilitating automation in these threads. Bollocks.

You know what? The Internet Is For End Users [1]. If we're going to cite an RFC, it should be RFC 8890. Not having a better headless Chrome would be a violation of the most basic principles of the internet.

There are some cases where automation can get out of hand, but blocking these efforts should not come at user expense. So says the RFC8890, and a general collective belief/hum-in-the-room. The availability of a good browser like Chrome helping should not be an issue, given how many other ways bad players have to go too far & cause harm to sites. The people who have to deal with this are not the priority & this doesn't radically change their troubles; this radically helps end users wishing to exercise agency though.

In most cases being able to script & automate a site is a completely primitive user-agency, of no special regard. Headless Chrome being a somewhat tolerable way of doing that scripting is 100% moral, correct. It greatly assists us in fulfilling a primary & clear overarching purpose of the internet: to be for end users.

I wish I could say I cannot believe the complaining & whining & snivelling, the pretentious-nonsense/acting-offended that Chrome would dare help make good automation. I wish I could say I don't think this crowd recognizes nor comprehends the basic purpose of the internet, but again, I think I know better; I suspect they do but their protests are disingenuous, that they have allied their hearts with darker forces, against the user.

[1] https://www.rfc-editor.org/rfc/rfc8890

charcircuit · on Feb 20, 2023

>Headless is open source, which means anybody could build a Headless version with such a property set to "I am a human, trust me!"

This is flawed reasoning. Just because we can't eliminate abuse from headless browsers doesn't mean we shouldn't work to reduce it. Finding such a modified binary or making it yourself is additional friction that will cause fewer of these bots to exist. Some people may not care whether a website is able to block them, and some may not decide to do the work to read the robots.txt. By implementing these capabilities into the product by default you are making the web ecosystem a better place with less abuse. You are right that someone could make a version without the anti-abuse parts, but surely that fork will be less popular and less used.

Aeolun · on Feb 20, 2023

What about if I want the headless browser to look exactly the same? Why should we make a distinction between humans and machines?

sagebird · on Feb 22, 2023

If I run a soup kitchen, and Google is sending robots to my establishment which are indistinguishable from humans, I should have the right to ask if the client is a robot.

I would hope that Google's robots would not be programmed to lie to me, but would be honest.

If robots are required to be honest, then I have a choice to serve them or not. If they are not honest, I do not have a choice.

charcircuit · on Feb 20, 2023

Then don't add code to your site to make it work different?

>Why should we make a distinction between humans and machines?

Because machines can be used to abuse a site at a scale that humans can't. Site owners want to protect their site against abuse.

Aeolun · on Feb 20, 2023

By modifying the browser. It feels like DRM by a different name to me.

charcircuit · on Feb 20, 2023

Okay? I don't care what you call it. It will reduce the amount of abuse in the world and that is a good thing.

sagebird · on Feb 22, 2023

While I appreciate your answer from a technical point of view - indeed it is trivial to modify/spoof - there is an ethical dimension.

Should bots have the legal right to say they are human?

For example - if Google Inc is visiting a web page to collect information about it using a headless browser, and the server asks - are you a bot - should Google be legally or ethically allowed to answer no? (Declarations in headers could remove the need for question/answer chatter.)

(I want to pre-empt dismissing this line of questioning via "what if Google wants to know how the site will be served to a human, for better search results?", because Google could include a specific header for that, e.g. "I am a bot, but request that you serve the version of this page served to humans". It would be up to the server to honor or reject that request.)
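A minimal sketch of how a server might act on such declarations, assuming two header names that are purely hypothetical (no standard defines them) and a simple per-site policy. It only illustrates the "server decides whether to honor the request" idea from the comment, not anything Chrome or Google actually sends.

```typescript
// Hypothetical header names, invented for illustration only.
const BOT_HEADER = "x-client-is-bot";
const HUMAN_VIEW_HEADER = "x-request-human-version";

type Variant = "human" | "bot" | "rejected";

interface ServerPolicy {
  allowBots: boolean;             // serve self-declared bots at all?
  honorHumanViewRequest: boolean; // give bots the human version when asked?
}

// Decide which page variant to serve from the declared headers.
// Header keys are expected in lower case.
function chooseVariant(
  headers: Record<string, string | undefined>,
  policy: ServerPolicy
): Variant {
  const declaredBot = headers[BOT_HEADER] === "1";
  if (!declaredBot) return "human"; // no declaration: treated as a human
  if (!policy.allowBots) return "rejected";
  const wantsHumanView = headers[HUMAN_VIEW_HEADER] === "1";
  return wantsHumanView && policy.honorHumanViewRequest ? "human" : "bot";
}
```

As the rest of the thread points out, this only works for clients that declare themselves honestly; a modified client simply omits the header and falls into the first branch.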

The defaults Google choose have compounding effects in our society. If you make it "normal" for bots to pretend to be human, the industry has minimal pressure to hold any standard above what you do, and better norms may never appear, or be delayed by a decade. The alternative is to be thoughtful today to try to create a better world.

paulirish · on Feb 19, 2023

https://github.com/paulirish/headless-cat-n-mouse was this basic idea, but open sourced.

runlevel1 · on Feb 20, 2023

The destination of that escalation is DRM.

mike_hearn · on Feb 19, 2023

Do you guys ever think about abusive automation at all, or do you just consider that other people's problem?

lupire · on Feb 19, 2023

Abusive how? Headed chrome can be automated, as can wget.

It's bizarre to ask a client-side program to implement server-side controls for users you want to allow on your site but throttle.

parker_mountain · on Feb 19, 2023

Headed chrome adds a huge amount of overhead, and can also be fingerprinted more easily. This is a lot more declarative and makes it easier to run an abuse farm. Although, per my other comment, I don't see Headless as a tool that will particularly move the needle on abuse cases.

squeaky-clean · on Feb 20, 2023

Isn't headed chrome usually fingerprinted by variables inserted by the chromedriver? You can rename these variables and be undetectable (you don't even have to recompile chromedriver, you can use a hex editor or a perl replacement).

At least I've never gotten detected.

runlevel1 · on Feb 20, 2023

There are even Puppeteer plugins that will do it for you. [1]

The best detection I've come across so far (i.e. before this release) has just required I run headless Chrome in headed mode. Granted, I don't do a ton of scraping -- mostly just pulling data out of websites so that I can play with it in aggregate using more civilized tools.

[1]: https://github.com/berstend/puppeteer-extra/tree/master/pack...

scotty79 · on Feb 19, 2023

You call it abuse. Other people might call it use.

mike_hearn · on Feb 19, 2023

I've not yet encountered anyone who doesn't consider spam to be a form of abuse.

account42 · on Feb 22, 2023

Spam can be an effective way around censorship. What is and isn't abuse often isn't as objective as some people want to pretend.

aabbcc11 · on Feb 20, 2023

I am that anyone you mentioned. For example, autoposting on 4chan works very well for me. I spam goods on 4chan to buy or create opinions that I force.

scotty79 · on Feb 19, 2023

[flagged]

dang · on Feb 20, 2023

Would you please stop posting in the flamewar style? We've had to ask you this in the past as well. It's not what this site is for, and destroys what it is for.

https://news.ycombinator.com/newsguidelines.html

scotty79 · on Feb 20, 2023

I'm sorry. I'll try to bite my tongue more often when I'm in combative mood. Thanks for putting up with me so far.

hackernewds · on Feb 19, 2023

You call it use. Other people might call it abuse.

scotty79 · on Feb 19, 2023

That's my point exactly.

_moof · on Feb 19, 2023

Are we just misquoting the Eurythmics now?

pdntspa · on Feb 19, 2023

The implications of your question are beyond dystopian

DangitBobby · on Feb 19, 2023

Please elaborate.

pdntspa · on Feb 19, 2023

Because it suggests adding usage controls, possibly enforced via cloud connectivity, to add restrictions that will inevitably make legitimate usage more difficult, frustrating, and most importantly, subject to outside control. Extend this far enough and the world starts to look like Doctorow's "Unauthorized Bread".

This is an awful world, one designed to reinforce class divide and protect the entrenched and the rich by deliberately handicapping easily-accessible tools, because of a few bad actors. It creates a world where the code for literally everything is the most hideously complex version of itself because it is riddled with constant checks, phone-homes, and arbitrary usage limits. It further pushes us towards a disempowering future where our computing is limited exclusively to appliance-like devices whose inner workings are controlled for us. It stands against the very principle of general-purpose computing.

robertlagrant · on Feb 19, 2023

That's not beyond dystopian. It's just dystopian.

And implications of a question aren't either. Just your imagined implications. Questions aren't bad.

supriyo-biswas · on Feb 19, 2023

See my comment[1] on this very thread.

[1] https://news.ycombinator.com/item?id=34858232

aabbcc11 · on Feb 20, 2023

If you are a soy developer who thinks Cloudflare is a god that should solve problems for you, and you use O(n^2) or even worse algorithms in your code so you can't even optimize it, it is only your problem, correct.

In 2000, sites were running code that was written precisely so that a DDoS attack was impossible. Now it is a heckin sauce of JS malware and obfuscated proprietary code.

If your site is like this, you deserved it. Cloudflare and such companies just need your money for solving a 5-minute problem like a WAF that is just a regex, and you have limits even for user agent filtering, lol.

Stop making shitcode and learn HTTP and TCP/IP theory, and you will make antispam filter that is 200% better than any cloudflare shit that is simply malware that runs cryptominer as a "IUAM" mode for their own benefit and you even pay for it.

parker_mountain · on Feb 19, 2023

For what it's worth, the large "players" already seem to have this capability. They've forced pretty much everyone to roll out captchas, waf-level throttling, proof of work interstitials, and behavior-based fingerprinting.

While my immediate response was the same as yours, I think this actually won't really change much in the way of bad actors.

It's unfortunate, but basic controls (such as throttling, etc) are pretty much a floor-required feature - one way to avoid this burden is to do things like use 3rd party idp (aka google login). I'm not happy with the state of things but I don't think headless will particularly contribute to a material increase in abuse cases.
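For readers unfamiliar with what "basic throttling" looks like, here is a minimal token-bucket sketch. The class name, parameters, and the choice of a token bucket are my own assumptions for illustration; real deployments usually do this at the WAF or load-balancer level rather than in application code.

```typescript
// Token bucket: a client may burst up to `capacity` requests,
// then is limited to `refillPerSecond` requests per second.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start with a full bucket
    this.lastRefillMs = nowMs;
  }

  // Returns true if the request is allowed, false if it should be throttled.
  // The current time is passed in so the behavior is deterministic in tests.
  tryConsume(nowMs: number): boolean {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefillMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

In practice you would keep one bucket per client key (IP address, account, API token) and call `tryConsume(Date.now())` on each request. Note that this throttles any client, headless or not, which is the point being made above.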

nobu-mori · on Feb 19, 2023

Now that headless mode is a "real" Chromium instance, is it possible to add extension support to Chrome running in headless mode?

rektide · on Feb 21, 2023

I didn't know this was a restriction before! Interesting. I would have assumed old headless had a profile, that typical command-line efforts[1] would let one load extensions. Are we sure that your question is valid? Are we sure that previous headless Chrome didn't have profiles or couldn't load extensions? I'm not sure this question is valid. I think maybe the assumptions here are incorrect.

The new Chrome headless certainly purports to be "just Chrome" "without actually rendering." One of the notable differences in the new headless mode is that it at least shows the stock/built-in extensions. From the submission:

> Similarly, when it comes to plugins, the old headless Chrome used to return no plugins with navigator.plugins, which is a technique that used to be exploited for detection when Headless Chrome got released 6 years ago, cf this blog post. The new headless Chrome returns the same plugins as a headful Chrome, and that’s the same for the mimeTypes obtained with navigator.mimeTypes:
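The check described in the quote can be sketched roughly like this. The function name and signal labels are made up for illustration, and `NavigatorLike` is a stand-in type so the sketch also runs outside a browser; only `navigator.plugins` and `navigator.mimeTypes` come from the quoted text.

```typescript
// Minimal structural stand-in for the browser's `navigator` object.
interface NavigatorLike {
  plugins: { length: number };
  mimeTypes: { length: number };
}

// Collects the signals the quoted passage attributes to old headless Chrome:
// an empty plugin list and an empty MIME type list.
function oldHeadlessSignals(nav: NavigatorLike): string[] {
  const signals: string[] = [];
  if (nav.plugins.length === 0) signals.push("no-plugins");
  if (nav.mimeTypes.length === 0) signals.push("no-mime-types");
  return signals;
}
```

In a page you would call `oldHeadlessSignals(navigator)`. Per the quote, new headless Chrome reports the same plugins and MIME types as headful Chrome, so this particular check would no longer distinguish the two.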

Maybe perhaps the new headless is faking it, but my impression is that extensions definitely work as normal in the new headless Chrome. How or whether they worked before is another very very interesting question I'd like answers to.

I do wish the AMA dev had actually replied to this. My hope is that this wasn't an issue before (but default plugins just weren't installed, and now they are, just to alter fingerprinting), and that now the situation is unchanged but default plugins are installed.

[1] https://stackoverflow.com/questions/16800696/how-install-crx...

nobu-mori · on Feb 21, 2023

https://bugs.chromium.org/p/chromium/issues/detail?id=706008

It looks like the new headless mode does support extensions.

skybrian · on Feb 19, 2023

Can you talk about your team's motivations for improving headless mode? Any particular use cases in mind?

natorion · on Feb 19, 2023

Here are two of them:

- Test reproducibility
- Automated configuration rollouts in enterprise environments

ccooffee · on Feb 19, 2023

Improving test environments is a huge upside. I haven't worked on browser automation in nearly a decade, but finding ways to work around shortcomings in the headless environment used to burn a lot of time on that team. I know of many small teams which made deliberate decisions NOT to do any browser automation tests (e.g. Selenium) because some issues required testing hooks in production code.

oh_sigh · on Feb 19, 2023

Is it too late to change the name from "new headless"? It won't be new forever, and then there will need to be a new new mode, or a differently named one that people think is older because it isn't the new mode.

dylan604 · on Feb 19, 2023

No, obviously, the next version will be called Newer Headless. Then you get the More Newer or Even Newer release. Or my personal favorite NewV2. /s

Using the word "new" in naming conventions is the most moronic and shortsighted way to name things in something that is quite obviously going to be changing in the somewhat near future.

robertlagrant · on Feb 19, 2023

New College is doing fine even with its name. It's just a name. Doesn't really matter.

dboreham · on Feb 19, 2023

Also New Forest.

oh_sigh · on Feb 19, 2023

It reminds me of the "Pont Neuf" ("new bridge" in French), which is the oldest bridge in Paris crossing the Seine.

int_19h · on Feb 20, 2023

By all rights, it ought to be EvenLessHead. ~

plugin-baby · on Feb 19, 2023