Hacker News | mikequinlan's comments

As TFA says, on x86 `sub eax, eax` encodes to the same number of bytes and executes in the same number of cycles.
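For concreteness, a tiny Python sketch of the two standard encodings (these opcode/ModRM bytes are from the x86 instruction set, not from TFA itself):

```python
# Machine encodings of the two common x86 idioms for zeroing EAX.
# xor eax, eax -> opcode 0x31 (XOR r/m32, r32), ModRM 0xC0 (eax, eax)
# sub eax, eax -> opcode 0x29 (SUB r/m32, r32), ModRM 0xC0 (eax, eax)
xor_eax_eax = bytes([0x31, 0xC0])
sub_eax_eax = bytes([0x29, 0xC0])

# Both idioms occupy the same number of bytes.
assert len(xor_eax_eax) == len(sub_eax_eax) == 2
```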

On modern CPUs, at least; x86 has quite a history, and the idiom might carry over from an even older machine.

Edit: Looked at the comments; seems like x86 and the major 8-bit CPUs had the same speed. Pondering if this might be a remnant from the 4-bit ALU times.


> seems like x86 and the major 8-bit CPUs had the same speed, pondering if this might be a remnant from the 4-bit ALU times.

I think that era of CPUs used a single 1-bit circuit capable of doing add, sub, xor, etc. They'd have 8 of them, with the signals propagating through them in a row. I think this page explains the situation on the 6502: https://c74project.com/card-b-alu-cu/

And this one for the ARM 1: https://daveshacks.blogspot.com/2015/12/inside-alu-of-armv1-...

But I'm a software engineer speculating about how hardware works. You might want to ask a hardware engineer instead.
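To make the idea concrete, here's a toy Python sketch of such a shared ripple ALU (my own illustration, not taken from either linked page): one 1-bit slice replicated 8 times, with a control code selecting the operation. Every operation goes through the same chain of slices, which is why XOR would be no faster than SUB in such a design.

```python
# Toy model of a ripple ALU: one 1-bit slice repeated 8 times,
# with the carry propagating from slice to slice.

def alu_slice(a, b, carry_in, op):
    """One 1-bit slice. op: 'add', 'sub', or 'xor'."""
    if op == 'xor':
        return a ^ b, 0            # no carry needed, but still one slice delay
    if op == 'sub':
        b = b ^ 1                  # invert b; carry_in=1 completes two's complement
    total = a + b + carry_in
    return total & 1, total >> 1

def alu_8bit(a, b, op):
    carry = 1 if op == 'sub' else 0    # subtraction via inverted b plus carry-in
    result = 0
    for i in range(8):                 # signals ripple through 8 identical slices
        bit, carry = alu_slice((a >> i) & 1, (b >> i) & 1, carry, op)
        result |= bit << i
    return result

assert alu_8bit(5, 5, 'sub') == 0     # sub x, x zeroes the register...
assert alu_8bit(5, 5, 'xor') == 0     # ...and so does xor x, x
```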


Nope.

In any ALU the speed is determined by the slowest operation, so XOR is never faster. It does not matter what the width of the ALU is; all that matters is that an ALU does many kinds of operations, including XOR and subtraction, with the operation selected by some control bits.

I have explained in another comment that the only CPUs where XOR can be faster than subtraction are the so-called superpipelined CPUs. Superpipelined CPUs have been made only after 1990, and there were very few of them. Even though XOR could in principle be faster than subtraction in a superpipelined CPU, it is very unlikely that this feature was implemented in any of the few superpipelined CPU models ever made, because it would not have been worthwhile.

For general-purpose computers, there have never been "4-bit ALU times".

The first monolithic general-purpose processor was Intel 8008 (i.e. the monolithic version of Datapoint 2200), with an 8-bit ISA.

Intel claims that Intel 4004 was the first "microprocessor" (in order to move its priority earlier by one year), but that was not a processor for a general-purpose computer, but a calculator IC. Its only historical relevance for the history of personal computers is that the Intel team which designed 4004 gained a lot of experience with it and they established a logic design methodology with PMOS transistors, which they used for designing the Intel 8008 processor.

Intel 4004, its successors and similar 4-bit processors introduced later by Rockwell, TI and others, were suitable only for calculators or for industrial controllers, never for general-purpose computers.

The first computers with monolithic processors, a.k.a. microcomputers, used 8-bit processors, and then 16-bit processors, and so on.

For cost reduction, it is possible for an 8-bit ISA to use a 4-bit ALU or even just a serial 1-bit ALU, but this is transparent for the programmer and for general-purpose computers there never were 4-bit instruction sets.


> In any ALU the speed is determined by the slowest operation, so XOR is never faster.

On a 386, a reg/reg ADD is 2 cycles. An r32 IMUL is "9-38" cycles.

If what you stated were true, you'd be locking XOR's speed to that of DIV. (Or you do not consider MUL/DIV "arithmetic", or something.)

https://www2.math.uni-wuppertal.de/~fpf/Uebungen/GdR-SS02/op...

> I have explained in another comment that the only CPUs where XOR can be faster than subtraction are the so-called superpipelined CPUs. Superpipelined CPUs have been made only after 1990 and there were very few such CPUs.

(And I'm choosing 386 to avoid it being "a superpipelined CPU".)


> Or you do not consider MUL/DIV "arithmetic", or something.

Multiplier and divider are usually not considered part of the ALU, yes. Not uncommon for those to be shared between execution threads while there's an ALU for each.


The 386 is a microprogrammed CPU where a multiplication is done by a long sequence of microinstructions, including a loop that is executed a variable number of times, hence its long and variable execution time.

A register-register operation required 2 microinstructions, presumably one for the ALU operation and one for writing back into the register file.

Unlike the later 80486, which had execution pipelines that allowed consecutive ALU operations to be executed back-to-back (a throughput of 1 ALU operation per clock cycle), the 80386 had only some pipelining of the overall instruction execution: instruction fetching and decoding were overlapped with microinstruction execution, but there was no pipelining at a lower level, so ALU operations could not execute back to back. The fastest instructions required 2 clock cycles, and most instructions required more.

In the 80386, the ALU itself required the same 1 clock cycle to execute either XOR or SUB, but completing 1 instruction took a minimum of 2 clock cycles.

Moreover, this time of 2 clock cycles was optimistic: it assumed that the processor had managed to fetch and decode the instruction before the previous instruction completed. This was not always true, so a XOR or a SUB could randomly require more than 2 clock cycles, when it had to finish instruction decoding or fetching before doing the ALU operation.

In very old or very cheap processors there are no dedicated multipliers and dividers, so a multiplication or division is done by a sequence of ALU operations. In any high performance processor, multiplications are done by dedicated multipliers and there are also dedicated division/square root devices with their own sequencers. The dividers may share some circuits with the multipliers, or not. When the dividers share some circuits with the multipliers, divisions and multiplications cannot be done concurrently.

In many CPUs, the dedicated multipliers may share some surrounding circuits with an ALU, i.e. they may be connected to the same buses and they may be fed by the same scheduler port, so while a multiplication is executed the associated ALU cannot be used. Nevertheless the core multiplier and ALU remain distinct, because a multiplier and an ALU have very distinct structures. An ALU is built around an adder by adding a lot of control gates that allow the execution of related arithmetic operations, e.g. subtraction/comparison/increment/decrement and of bitwise operations. In cheaper CPUs the ALU can also do shifts and rotations, while in more performant CPUs there may be a dedicated shifter separated from the ALU.

The term ALU can be used with 2 different senses. The strict sense is that an ALU is a digital adder augmented with control gates that allow the selection of any operation from a small set, typically of 8 or 16 or 32 operations, which are simple arithmetic or bitwise operations. Before the monolithic processors, computers were made using separate ALU circuits, like TI SN74181+SN74182 or circuits combining an ALU with registers, e.g. AMD 2901/2903.

In the wide sense, ALU may be used to designate an execution unit of a processor, which may include many subunits, which may be ALUs in the strict sense, shifters, multipliers, dividers, shufflers etc.

An ALU in the strict sense is the minimal kind of execution unit required by a processor. The modern high-performance processors have much more complex execution units.


Most of mul/div was implemented in hardware since the 80186 (and the more or less compatible NEC V30 too). The microcode only loaded the operands into internal ALU registers, and did some final adjustment at the end. But it was still done as a sequence of single bit shifts with add/sub, taking one clock cycle per bit.
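A sketch of that shift-and-add scheme in Python (simplified; real hardware typically shifts the product register rather than widening the multiplicand, but the cycle count per bit is the same idea):

```python
# Shift-and-add multiplication, one partial-product step per clock,
# examining one bit of the multiplier at a time. A 16-bit multiply
# therefore costs on the order of 16 cycles in the core loop.

def shift_add_mul(a, b, width=16):
    product = 0
    for _ in range(width):      # one iteration ~ one clock cycle
        if b & 1:               # low multiplier bit selects add or skip
            product += a
        a <<= 1                 # shift the multiplicand up...
        b >>= 1                 # ...and the multiplier down
    return product & ((1 << (2 * width)) - 1)

assert shift_add_mul(1234, 5678) == 1234 * 5678
```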

> For general-purpose computers, there have never been "4-bit ALU times".

Well, consider minicomputers made from bit-slices. Those would be 4-bit ALUs with CLA.

What drives me crazy about the 8-bit era is the lack of orthogonality. We're having this whole discussion because they didn't have a ZERO or ONES opcode. In 1972's 74181 chip those were just cases among 48 modes.


The minicomputers made with bit-slices had 16-bit ALUs or 32-bit ALUs.

Those 16-bit or 32-bit ALUs were made from 2-bit, 4-bit or 8-bit slices, but this did not matter for the programmer, and it did not matter even for the micro-programmer who implemented the instruction set architecture by writing microcode.

The size of the slices mattered a little for the schematic designer, who had to draw the corresponding slices and their interconnections, and it mattered a lot for the PCB designer, because each RALU slice (RALU = registers + ALU) was a separate integrated circuit package.

Intel made 2-bit RALU slices (the Intel 3000 series), and AMD made 4-bit RALU slices (the 2900 series), which were the most successful on the market. There were a few other 4-bit RALU slices, e.g. the faster ECL 10800 series from Motorola. Later, there were a few 8-bit RALU slices, e.g. from Fairchild and from TI, but by that time the monolithic processors had quickly become dominant, so the bit-sliced designs were abandoned.

The width of the slices mattered for cost, size and power consumption, but it did not matter for the architecture of the processor, because the slices were made to be chained into ALUs of any width that was a multiple of the slice width.


The prompt doesn't say to pick a random number. I asked to pick a number from 1-1000 and it chose 7,381. Then I asked why it picked that number and it said

Nothing mystical, I’m afraid. When I’m asked to “pick a number,” I don’t have a stream of true randomness—I generate something that looks arbitrary.

In this case, I leaned toward:

• something comfortably away from the edges (not near 1 or 10,000),

• not a round or patterned number (so, not 7,000 or 7,777),

• and with a bit of internal irregularity (7-3-8-2 has no obvious rhythm).

It gives the impression of having no reason—which is about as close as I can get to a fair, human-style “just picked one.”


Not sure why you have been downvoted. While the LLM's introspection can't be trusted, that's indeed what happens: asked to generate a random number, the LLM picks one that feels random enough: not a round one, not too central or extreme, no patterns, not a well-known one. It ends up always being the same.


It doesn't "pick" anything. It produces the most likely number after this question, based on the data it has been trained on! Reasoning models might pick in the sense that they will come up with rules (like the grandparent post shows), but they will still produce the "most likely" number after the reasoning.



The arrow that shows you which side of the car the gas cap is on.


I don't see anything related to your title at that site.

Maybe you meant this

https://archive.is/pIM27

But it is from 2018.


Fair point, thanks for flagging this. The post isn’t meant to be a historical piece on Coca-Cola itself, but a discussion around design intent and long-term visual consistency in marketing. Coca-Cola’s red is often cited as an example of how consistency compounds brand recognition over time.

I’m exploring this question while building an image workflow tool, where the challenge isn’t generating good-looking visuals, but keeping intent and consistency intact across repeated use.


Do you have a better link?


I sure hope that is true.


How is this different from the local Reko markets?


On the front-end it may seem the same, but on the back-end we are focused on delivery route optimization to shorten the supply chain and algorithms for a much more tailored shopping experience.


>Jim Sanborn planned to auction off the solution to Kryptos, the puzzle he sculpted for the intelligence agency’s headquarters. Two fans of the work then discovered the solution.

Gift link https://www.nytimes.com/2025/10/16/science/kryptos-cia-solut...


Plankalkül - https://en.wikipedia.org/wiki/Plankalk%C3%BCl was designed but as far as I know never implemented (until 2000).

ENIAC Short Code - https://en.wikipedia.org/wiki/Short_Code_(computer_language) was implemented but very little used.


As a customer, if I can't do a chargeback in the case where you don't provide the advertised services, I simply won't do business with you and will go to one of your competitors.


With UPI, customers can raise a dispute and get their money back. As far as I know, the fee charged by some payment processors for UPI disputes is much lower than the fee charged for card disputes, or is non-existent, depending on the stage at which the dispute is resolved.

