Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

There are a lot of matrix multiplication algorithms out there with a lot of pluses and minuses. It's always a balance of accuracy, runtime, and scaling. This one probably has bad accuracy in floating point.


For everyone discussing the reduced accuracy/numerical stability of the algorithms in floating-point, this is true. But note that the application of the algorithms in the work is explored for fixed-point MM/quantized integer NN inference, not floating-point MM/inference. Hence, there is no reduction in accuracy for that application of it compared to using conventional fixed-point MM.


"Conventional fixed-point MM" is a large suite of algorithms. It is correct that this is a 2x reduction in MULs compared to naive fixed-point matrix multiply, but there is a large body of literature out there with other circuits. This is a cool trick to add to the group.


Inference world is gradually switching from INT formats to FP formats. FP8 is already supported in modern hardware, and FP4 support is coming. In my experiments I get better perplexity in language models with FP4 than with INT4.


How is FP fundamentally different than integers? I've done FPGA programming and it just seems like the programmer has to decide where/when to do the shifting based on the expected range of the data. I'm not sure how this is "natively supported" in hardware.


If you have designed FPUs you should know that FP computation involves a lot more additional operations than just shifting (e.g. rounding, subnormals, and special value handling). That’s why, for example, CPUs use different hardware blocks for INT vs FP computation.

But that’s not the point. The point is, this particular method to speed up matmul is not suitable for FP.


I'm no expert but I suspect this is wrong. To me, this is like saying you don't need to worry about integer overflow because your operations are only working on fixed integers. Really? You don't care if you multiply or add two large numbers and they spill over?

The more appropriate answer, I suspect, is that the numerical precision and stability sacrifices are more than adequate for normal usage.

If I'm wrong about this, I would certainly like to know.


In hardware, you control your integer widths completely, so if you add two 32-bit ints to a 33-bit int, there is no chance of overflow. The same goes for multiplications, etc.


Yeah with shifts you can guarantee no overflow, but you have to decide under what circumstances is avoiding overflow/clipping worth the loss of precision.

Fixed point typically requires alot more programming, but sometimes its worth it if you know what the data ranges are.


I don't know why this answer is getting downvoted. This is absolutely correct.

W. Miller has a paper discussing, under conditions of numerical stability, O(n^3) multiplications is necessary [0]. Any algorithm that gets sub cubic runtime for matrix multiplication, like Strassen's or Coppersmith's, must sacrifice some amount of precision or stability.

[0] https://epubs.siam.org/doi/10.1137/0204009



The document said it outputs the exact same values as the conventional method. There is no accuracy trade off here.


The paper cited is about hardware, where there is no accuracy tradeoff because you control the numerical precision completely and use fixed point. In a software implementation, neither is true. There is no chance that you will get the exact same values out of this method that you do out of other FP matmuls.


For floating point? Are you sure?


Opening statement of README

    This repository contains the source code for ML hardware architectures that 
    require nearly half the number of multiplier units to achieve the same 
    performance, by executing alternative inner-product algorithms that trade 
    nearly half the multiplications for cheap low-bitwidth additions, while still 
    producing identical output as the conventional inner product.


I just looked at the paper: the answer is no, floating point is not supported.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: