That instruction is super overrated. It has next to no architectural cost: all it does is specify a constant set of rounding and overflow flags instead of using the current FPU state. The only real win is a code-size reduction.
JavaScriptCore didn't even use it when it was first made available, because that operation is relatively uncommon and the cost is dominated by the FPU logic that precedes rounding - so it essentially does not matter for real-world JS. It may be possible to construct a microbenchmark it helps, though I doubt it.
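For the curious, here's a sketch (my own, in plain JS - not engine code) of the conversion semantics the instruction hardwires, namely ECMAScript's ToInt32: truncate toward zero, wrap modulo 2^32, and map NaN/infinities into range instead of trapping:

```javascript
// ECMAScript ToInt32, the double -> int32 conversion JS requires.
function toInt32(x) {
  if (!Number.isFinite(x)) return 0;                 // NaN, ±Infinity -> 0
  const t = Math.trunc(x);                           // round toward zero
  const m = ((t % 2 ** 32) + 2 ** 32) % 2 ** 32;     // wrap modulo 2^32
  return m >= 2 ** 31 ? m - 2 ** 32 : m;             // reinterpret as signed
}
```

This is exactly what `x | 0` computes; without the dedicated instruction, an AArch64 JIT emits a short fallback sequence around the ordinary FCVTZS conversion to get these semantics.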
I remember when Twitter was having a field day thinking this instruction was the reason the newest iPhones benchmarked so well. There is no "magic instruction" that gets you a 40% increase in benchmarks. None. It would literally need to be multiple times faster and account for something like half of all executed instructions to produce that increase, which is clearly not the case for a JavaScript floating-point conversion. It took someone from the JavaScriptCore team replying with "we don't even emit that yet" for people to see sense.
> "The entire infotainment system is a HTML 5 super computer," Milton said.
> "That's the standard language for computer programmers around the world, so using it lets us build our own chips. And HTML 5 is very secure. Every component is linked on the data network, all speaking the same language. It's not a bunch of separate systems that somehow still manage to communicate."
It's a known phenomenon. Whenever you read something written by a reporter (i.e. someone who is not an expert) in an area you yourself are an expert in, it will be weird or just plain wrong. Then the next day you read an interesting article (also written by a reporter, i.e. a non-expert) on a subject you have little experience with, and you happily accept it as the truth.
There are a lot of differences. AES will never change and will be recommended for decades. AES is very compute-intensive so a hardware implementation is much more efficient than software.
HTML/CSS/JS is ever-changing, control-intensive, and memory-intensive so it's probably not a good candidate for hardware acceleration.
DOM Level 3 Core became a recommendation in April 2004. DOM4 went to last call in 2015. The fundamentals, I feel, are quite fixed, although many auxiliary systems do change.
Latency to remote accelerators can be problematic for some control workloads. Ideally the control plane could offload itself onto the accelerator too.
I don't see memory-intensive as a barrier. The 8 vdom+ processors probably come with sizable multi-megabyte caches. Perhaps they could be early processing-on-RAM architectures? After all, it seems they have a fixed-function diff pipeline. I'd also suggest that the hardware representation might be very effective at using low bit-depth encodings, saving gobs of memory. Keep text off-board and encode attribute values via some columnar representation, and this could be a high-throughput HTMLElement slinger and differ!
So? We have ever-growing standards that we do hardware acceleration on. Some of the newer standards don't get hardware accel until new hardware comes out.
E.g. AV1, HEVC, H.264, etc. All of these either have or are about to have hardware acceleration. Why not JS?
The ideal "accelerator" for these kinds of jobs is a CPU with a big cache.
Video encoding has well defined control loops and data paths that don't arbitrarily interfere with each other, so it's a good candidate for custom hardware.
That is, you have a high-bandwidth, highly parallel fast path between framebuffer memory and functional units that compute FFTs and do motion-vector operations, and a control plane that looks at a small handful of variables in order to decide which data-plane operations to schedule and how to glue together the final result.
To run JS, you need a pile of functional units and lots of memory, and data for every operation needs to be able to come from / go to anywhere in memory. That's... just a general purpose computer.
All of those video standards were designed from day 1 to be hardware accelerated: they have a completely fixed pipeline with no control branches and very few data dependencies (stuff like A+B=C, C+A=D).
JS/CSS/HTML5 are not; they essentially have an open-ended, infinite amount of branching and data dependency, and I'm very skeptical a card could achieve much.
And this is before we start talking about latency to the main CPU. CPUs are EXTREMELY fast compared to accesses over buses like RAM and especially PCI-E; I would not be the least bit surprised if, even if some theoretical infinitely fast HTML5 accelerator card existed, it would still not be worth using due to the latency of fetches from the card.
It's already not worth offloading things like cryptography to accelerator cards, and every major crypto algorithm was designed to run fast in hardware. And this is before we start talking about stuff like AES-NI.
Just for the sake of argument, I should point out that the bulk of the work done by JS/CSS/HTML involves primitive operations over a tree data structure. Conceptually, this paves the way to opportunities for hardware acceleration, similar to how the extensive use of polynomials in number-crunching applications led to the addition of fused multiply-add instructions.
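To illustrate (a toy of my own, not taken from any real engine), here's the kind of primitive tree operation in question - diffing two DOM-like trees into a patch list:

```javascript
// Minimal tree diff over { tag, children } nodes, emitting patches
// addressed by child-index paths. A toy: real engines also diff
// attributes, text, and use keyed matching.
function diff(a, b, path = []) {
  const patches = [];
  if (a.tag !== b.tag) {
    patches.push({ path, op: 'replace', node: b });
    return patches;
  }
  const n = Math.max(a.children?.length ?? 0, b.children?.length ?? 0);
  for (let i = 0; i < n; i++) {
    const ca = a.children?.[i];
    const cb = b.children?.[i];
    if (ca && cb) patches.push(...diff(ca, cb, [...path, i]));
    else if (cb) patches.push({ path: [...path, i], op: 'insert', node: cb });
    else patches.push({ path: [...path, i], op: 'remove' });
  }
  return patches;
}
```

The inner loop is branchy and pointer-chasing, which is exactly why (as others note above) it maps poorly onto a fixed-function pipeline.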
None of the compression methods you just listed will ever change. They are static, well defined, so making custom silicon to speed them up makes sense. Javascript is a massive mess of ever changing spaghetti (useful, delicious spaghetti, but still spaghetti), so custom silicon for it does not make sense.
The difference with Lisp machines is that they're the counterpart to von Neumann machines. C is probably the closest abstract representation of a von Neumann architecture.
Aside from all the great little jokes--there should definitely be a NIST standard Minion meme collection--the silliest part of this is probably that it's a dedicated PCI-Express card.
Well, that and the idea of permanently burning the, uh, unique design choices made by the web platform into hardware is a little horrifying.
Is that possible? Aren't there hardware video decoders that deliver, well, full video at high frame rates? I'm not sure how those things work. Maybe they have a direct route to the GPU, and the CPU just decides how things should get composited.
However, most of them worked by using the video-overlay feature on cards where the hardware video decoder injected its output directly into the GPU's output (after the framebuffer) via an internal header - or even injected themselves into the GPU's output VGA signal using a D-Sub input on the back of the card.
For a very brief time in the late-1990s there were partial MPEG-2 decoder cards that hooked themselves into DirectShow to do the bulk operations needed for DCT and/or Motion Compensation but not rendering the entire MPEG scene - they'd feed their results back to the CPU rather than the GPU... IIRC.
I remember a time when Windows Media Player would sometimes do video playback by just drawing a very specific "close-to-black" RGB color in its window and then letting the hardware superimpose the decoded video onto that part of the screen. I could then open up MS Paint and draw my own custom-shaped "window" into my video playback with that color by laying it on top of the WMP window.
Yes, that's Video Overlay: the video is not rendered to the framebuffer but injected into the output signal directly. It was replaced by VMR (Video Mixing Renderer), which allows video to be rendered into each parent window's offscreen DWM buffer.
The funny thing is that overlays are coming back (soon, I hope!) - not for performance reasons, but because a compositing window manager like the DWM introduces an additional frame of latency. If a foreground window is being displayed 1:1 on the desktop, then the GPU can simply overlay it directly onto the output signal and thus eliminate that frame of latency. Some Linux WMs support this already, and Microsoft has said they're working on it.
It depends how often you have to round-trip between the CPU and the accelerator. If it's never or once per frame (as in video decoding) it's not bad, but if the JavaScript core ended up blocking on measurements from the layout core multiple times per frame you could end up losing a lot of performance.
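A toy cost model (the latency and per-measurement numbers below are made-up assumptions, purely illustrative) shows why the round-trip count dominates:

```javascript
// Assumed CPU <-> accelerator round-trip latency, in microseconds.
const ROUND_TRIP_US = 5;
// Assumed cost of one layout measurement on the accelerator.
const MEASURE_US = 0.2;

// JS blocks on every measurement individually: one round trip each.
function interleavedCostUs(measurements) {
  return measurements * (ROUND_TRIP_US + MEASURE_US);
}

// All measurements batched into a single round trip per frame.
function batchedCostUs(measurements) {
  return ROUND_TRIP_US + measurements * MEASURE_US;
}
```

With these made-up numbers, 100 interleaved measurements cost roughly 20x more than a single batched call - the fixed latency, not the work itself, is what kills you.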
The idea is quite realistic (apart from the GCs). But given how fast HTML is changing, the behaviour would become obsolete soon.
It could be great for smaller parts, though. If someone made a super-fast font-rendering accelerator, it could help in general. Alternatively, we could adopt the GPU-accelerated one created for Servo.
You can update FPGAs over the air; they are reconfigurable silicon. You already have this for your CPUs whenever you install an "Intel microcode update".
This is JavaScript adjacent, so I think you can be forgiven for thinking this crazy waste of space might actually be a real thing. I mean, some of the most popular and useful projects in JavaScript started out as such, so there's a long tradition. ;)
I hate to break it to you, but you’re probably using one of these to read this comment.
If you don’t believe me, disable hardware acceleration on your machine (force the video card to VESA or framebuffer mode or something), and try to read the news, use web apps, etc. Compare them to native 2D apps, which should still generally work just fine.
In my experience, a modern, headless 24-core Xeon with 128 GB of RAM and 2x10Gbit NICs can't even run Jenkins and Jira comfortably at the same time.
News sites with JavaScript enabled and no ad blocker are just not usable at all on such a machine.
I'm not sure whether this is sarcasm or not. This comment (and most other web pages) works fine without any hardware acceleration enabled. Test setup: Firefox on a Windows guest OS, with 3D acceleration disabled in the virtual machine settings. AFAIK browsers in the early 2000s didn't even have GPU acceleration for rendering.
This is not sarcasm. I've encountered news sites that do over 15,000 requests and load chains and waterfalls of tracking, invasive garbage, stuff that is illegal in Europe, from over 250+ domains on a single pageview.
> I've encountered news sites that do over 15,000 requests
I'm not sure if you're joking or not, but in the event you were being serious, I should point out that the performance impact of doing a lot of requests is due not to the CPU but to the time wasted waiting for the replies to arrive.
You'd be hard pressed to find a hardware-based strategy for the client-side that would make servers send their replies faster.
And while the client waits for a reply, their CPU just idles.
I'm not joking, but the real problem here is all the hot steaming dogshite that they choose to load, not really the number of requests, IMO.
Many of those requests are also running GPU code to fingerprint clients. Canvas and WebGL. To "prevent clearing cookies to bypass paywall" fraud and ban scrapers.
> I'm not joking, but the real problem here is all the hot steaming dogshite that they choose to load, not really the number of requests, IMO.
Neither making requests nor transferring data around is a CPU-bound activity.
> Many of those requests are also running GPU code to fingerprint clients. Canvas and WebGL. To "prevent clearing cookies to bypass paywall" fraud and ban scrapers.
Even taking these statements at face value, and considering that JavaScript is single-threaded by default, it still sounds like a perceived performance problem (not a real performance problem) resulting from poor software architecture. Expensive tasks are expected to be offloaded off the main thread to avoid blocking it.
> The XML Accelerator XA35 is a highly efficient XML processing engine that makes use of purpose-built features such as optimized caches and dedicated SSL hardware to process XML at near wire speed.
> The appliance can be used inline in the network topology, not as a coprocessor that hangs off a particular server. A popular use for the appliance is to receive XML responses from servers and transform them into HTML before forwarding the response to the client.
This just gives me a "Wordfence" 503 for no apparent reason. Here's an archive.org link for the benefit of anyone else who might want to read the product information despite the site owner's arbitrary restrictions: http://web.archive.org/web/20180829042526/http://soasecure.c...
The problem I run into sometimes is when I use only HTML or CSS or native JavaScript to accomplish something. Someone will always ask, "Why aren't you using <library's code> here?" Answer: "Because it's easier and/or more efficient to accomplish the same thing doing this instead."
I get it: sometimes you want to keep the syntax the same for future maintainability, while other times I just want to use something I already know will work just as well.
It should also add a few dozen GB of RAM for all those Chrome tabs.
I can't tell if that card would be a great thing or a terrible thing if it really existed. On the one hand, encoding decisions in hardware might slow down the pace at which the web shifts around. On the other hand, web page complexity would expand to fill the available processing power, so they would only be fast for the web devs and anyone else who has the expansion card.
Getting data to and from the FPGA, and the workload being significantly pointer-chasing based, would probably nullify any advantage an FPGA could have.
Firefox uses the GPU via WebRender[0]. I don't know what specific things it's used for, but I'm pretty sure it's not any of the DOM parsing. The GPU is used for the actual graphical rendering down the line -- compositing and such[1].
Apple's Mac Pro, AirPods Pro or iPhone 12's webpages, Facebook Feed, etc etc....
High-quality images with animation and jank-free scrolling are still not done right (or not even possible) by any major tech company in 2020. And that's just web pages, not even web apps.
And preferably doing so without my Quad Core MacBook Pro with GPU accelerated Browser ever warming up my lap.
They're called Chromecasts: independent HTML/browser hardware systems.
You'd need a bunch of those $15 HDMI-in (to USB) adapters to use the Chromecasts as accelerators, but the idea is very much there: hardware that does the web.
Or maybe we could just come up with a way to test and catalog web pages that are actually fast and not bloated. Like a browser extension with a test suite and database or something.
Maybe we need a mainstream search engine that would actually put their users’ interests first and heavily penalize websites that are bloated/have ads/autoplay video/etc?
I feel like I'm crazy based on how much I loathe electron apps.
Don't get me wrong it's a great concept, I like the idea of portability everywhere, but I can't get past the fact it's basically just a stripped-down Chrome browser with the "app" effectively being plain-ole HTML/CSS/JS. It just seems pointless when you can run the exact software in the web browser you likely already have running, with less overhead to boot.
The music streaming service Tidal really highlighted some of these issues for me, and started my hate-train. Their desktop application is electron-based, which supports HiFi, as does Chrome. The crazy thing is Chrome is the only browser that supports HiFi, and has been this way since the service launched in 2014, despite countless requests from FF users to add support. If Tidal is going to spend 6 years ignoring everything but Chrome for the sake of their electron app, IMO other companies are going to follow suit and continue the march towards Internet Explorer 2.0
I totally understand. Electron is cancer. I don't want a "native app" to come in an 80-150 MB executable, bundled with not only all its dependencies but a full fucking Chrome engine.
Electron is cancer. I hate it and I'm not backing down off that.
> It just seems pointless when you can run the exact software in the web browser
This isn't necessarily true, though. Electron gives you file-system access (among other things). Most Electron apps I've used do at least something that a browser cannot do. Though definitely not all.
Also, being able to alt/cmd-tab to the application you want is often convenient.
It's not so much the tech, but yes, having to ship an entire browser with each app... This isn't sustainable, as many cheap devices today are sold with only a 64/128 GB SSD...
One alternative is PWAs, but they get no interaction with the OS since they're sandboxed, and of course different platforms mean different browsers with more or less support for PWAs - so not a fit for a Git desktop client, for instance.
however, they're messing up by making it a PCI-e card. They need to make it USB-C or with a lightning connector so that people can use it on laptops or mobile devices. No need to upgrade your phone when you can plug in this accelerator!
I work on Firefox and I cannot believe how seriously people are discussing this. At its core, people outrageously underestimate how complex browsers are. Never mind "how fast the web is moving": even Servo cannot reliably render the wide web correctly, so good luck writing a clean-slate implementation matching even a snapshot of the web specs taken today!
"It's one browser, Michael. What could it cost? 10 engineers?"
A great deal of complexity is due to technical debt, notably HTML/CSS technical debt because "don't break the web". Well maybe we should "break the web" once to allow new browser engines to be easier to craft, after all, the web started rather simple 30 years ago...
It’s not necessary. CSS 2.1 makes up most of the modern web without other modules and it’s not too hard to implement. It’s just that no one writes compositors.
Partial implementations can exist as valid user agents, and most developers don't even seem to realize you can have a compliant browser that doesn't render to the CSS 2.1 box model specification.
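For reference, the CSS 2.1 box-model arithmetic being referred to really is small - here's a sketch (lengths assumed already resolved to pixels; 'auto' margin resolution and min/max-width are omitted):

```javascript
// CSS 2.1 horizontal box model: content-box nested inside padding,
// border, and margin boxes.
function horizontalExtent(style) {
  const contentBox = style.width;
  const paddingBox = contentBox + style.paddingLeft + style.paddingRight;
  const borderBox = paddingBox + style.borderLeftWidth + style.borderRightWidth;
  const marginBox = borderBox + style.marginLeft + style.marginRight;
  return { contentBox, paddingBox, borderBox, marginBox };
}
```

The hard part of a browser isn't this arithmetic; it's resolving 'auto', percentages, floats, and the interactions between them across a whole tree.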
The thing is, if you can make it run the top 10 websites (Google, Facebook, YouTube, et al.) and convince the maintainers of those websites to always test on this thing before each release, that would already be a giant win for most users on the planet. Websites with features outside the scope of this thing would fall back to standard rendering; in such a world (and if it gains momentum), developers would slowly move to avoid unsupported features in order to get the performance benefits it brings.
The sentiment isn't, "hardware acceleration presents genuine technical advantages for HTML and CSS", it's "the web is so slow and bloated it feels like playing Crysis without a video card."
Firefox aside, I think anyone who took that joke as a serious product idea needs to brush up on how computers work, how software works, and what good engineering looks like.
This is satire, but parallel HTML rendering has untapped potential for both speedups and power savings (as you distribute a single-core workload over a larger number of lower-voltage cores).
I wonder if this could be one of ARM Macbook's killer features: a web browser with a Servo-like parallel engine, written for custom silicon with a number of small-ish, low-frequency ARM cores. Or would that be excessive?
Probably not worth it to add dedicated cores for web processing, but the iPhone already takes advantage of a special ARM instruction for faster JavaScript execution [0]. That will almost certainly be used on the new ARM Macs.
That is not true. As it turns out, Safari didn't use that instruction [1] at the time, and even once it did, 99% of the performance gain in Speedometer had nothing to do with that instruction.
I guess it's not really your fault. We have KOLs who put out information on their sites and channels (in this case Daring Fireball and Twitter) and who, for whatever reason, never really correct themselves (even when they knew they were wrong).
[0]: https://stackoverflow.com/questions/50966676/why-do-arm-chip...