Hacker News | naasking's comments

Small models aren't entirely useless, and the NPU can run LLMs up to around 8B parameters from what I've seen. So one way they could be useful: Qwen3's text-to-speech models are all under 2B parameters, and OpenAI's whisper-small speech-to-text model is under 1B parameters. You could build an AI agent that you can talk to and that talks back, where, in theory, all audio-to-text and text-to-audio processing is offloaded to the low-power NPU, leaving the GPU free for the LLM itself.
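A rough sketch of that split, with everything here a placeholder stub (the function bodies stand in for real model calls like whisper-small and a Qwen3 TTS model; none of these names are real APIs):

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    device: str  # "npu" or "gpu"

def speech_to_text(audio: bytes) -> str:
    # stand-in for whisper-small running on the NPU
    return audio.decode("utf-8")

def llm_reply(prompt: str) -> str:
    # stand-in for the main LLM running on the GPU
    return f"echo: {prompt}"

def text_to_speech(text: str) -> bytes:
    # stand-in for a sub-2B-parameter TTS model on the NPU
    return text.encode("utf-8")

# the proposed device split: speech<->text on the NPU, LLM on the GPU
PIPELINE = [
    Stage("stt", "npu"),
    Stage("llm", "gpu"),
    Stage("tts", "npu"),
]

def voice_agent(audio: bytes) -> bytes:
    text = speech_to_text(audio)   # NPU
    reply = llm_reply(text)        # GPU
    return text_to_speech(reply)   # NPU
```

The point of the sketch is just the device assignment: the GPU only ever sees text in and text out.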

You could always offload some layers to the NPU for lower power use and leave the rest to the GPU. If the latter is power-throttled (common during prefill, not during decode), that would also be a performance improvement.
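The layer split itself is simple bookkeeping. A toy sketch with made-up numbers, assuming we cap the NPU's share by a compute budget:

```python
def split_layers(n_layers: int, npu_budget_flops: float,
                 flops_per_layer: float) -> tuple[range, range]:
    """Assign the first k layers to the NPU, the rest to the GPU,
    where k is how many layers fit in the NPU's compute budget."""
    k = min(n_layers, int(npu_budget_flops // flops_per_layer))
    return range(0, k), range(k, n_layers)

# e.g. a 32-layer model where the NPU budget covers 4 layers' worth of work
npu_layers, gpu_layers = split_layers(32, npu_budget_flops=10e12,
                                      flops_per_layer=2.5e12)
```

In practice the split would be tuned per device, but the mechanism is just this partition.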

Routing in a MoE model might fit.
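MoE routing is a tiny computation on its own — a softmax over router logits followed by a top-k pick — which is plausibly the kind of thing a low-power unit could handle. A minimal sketch, not tied to any particular framework:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(logits, k=2):
    """Top-k MoE gating: pick the k experts with the highest router
    logits and renormalize their softmax weights to sum to 1."""
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    z = sum(probs[i] for i in top)
    return [(i, probs[i] / z) for i in top]

# route([2.0, 0.5, 1.0, -1.0], k=2) selects experts 0 and 2
chosen = route([2.0, 0.5, 1.0, -1.0], k=2)
```

The expensive part of a MoE layer is the experts themselves, which would stay on the GPU; only this gate would move.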

That seems like a really niche use case, and probably not worth the surface area? The power savings would have to be truly astonishing to justify it, given what a small fraction of compute time your average device spends processing voice input. I'd wager the 90th percentile siri/ok google/whatever user issues less than 10 voice queries per day. How much power can they use running on normal hardware and how much could it possibly matter?

It's just an example where it fits perfectly, and it's exactly what something like Alexa or Google Home needs for low-power machine learning, e.g. when sitting idle it needs to consume as little power as possible while waiting for a trigger word.

Any context that needs some limited intelligence while consuming little power would benefit from this.


Yes, Vulkan is currently faster due to some ROCm regressions: https://github.com/ROCm/ROCm/issues/5805#issuecomment-414161...

ROCm should be faster in the end, if they ever fix those issues.


From what I understand, ROCm is a lot buggier and has some performance regressions on a lot of GPUs in the 7.x series. Vulkan performance for LLMs is apparently not far behind ROCm and is far more stable and predictable at this time.

Great! I hope the era of 1-bit LLMs really gets going.

Similar in spirit but different in execution as far as I can tell.

> Thiel's book has influenced so many entrepreneurs into believing Monopolies are Good.

Haven't read his book, but the idea that monopolies are good isn't typically made in a vacuum, it's made relative to alternatives, most often "ham-fisted government intervention". It's easier to take down a badly behaving monopoly than to change government, so believing monopolies are better than the alternatives seems like a decent heuristic.


>Haven't read his book, but the idea that monopolies are good isn't typically made in a vacuum, it's made relative to alternatives, most often "ham-fisted government intervention". It's easier to take down a badly behaving monopoly than to change government, so believing monopolies are better than the alternatives seems like a decent heuristic.

What? How is the first alternative poor government rather than multiple competing companies? When was the last time a monopoly was actually broken up in the US? AT&T/Bell, 50 years ago? lol


How would a bad monopoly be likely to be taken down if not by government intervention?

It eventually becomes so big and inefficient that it gets overtaken by new competitors.

A monopoly implies an organization powerful enough to stop competition, so a solution that relies on competitors seems fatally flawed. If there are enough competitors to meaningfully compete, then there isn't a monopoly.

You can only truly stop competition by government intervention.

When an organization gets big enough it is indistinguishable from government.

No organization can ever rival a real government like the US due to the latter's monopoly on the use of force.

Unfortunately there also isn't a requirement to not be a complete idiotic psycho.

A monopoly on force is meaningless if you shoot your own head off with it, which is what's happening with the US atm...

Criminally stupid is the trump-all card, pun intended.


Insert better horse/car analogy here

A monopoly comes with serious moats, otherwise it wouldn’t be one. It can stay big and inefficient for decades.

Not if they hire goons to go and literally kill the competition.

Open source vs. Microsoft is a great example.

Good for whom, exactly?

This seems like a classic straw man argument. Plutocratic oligarchs have been making the argument that private monopolies are better than representative democracy at basically any societal function for decades without any actual data.


> and they will spend infinite amounts of circular fake money to ensure hardware remains prohibitively expensive forever.

That's ridiculous, "infinite money" isn't a thing. They will spend as much as they can not because they want to keep local solutions out, but because it enables them to provide cheaper services and capture more of the market. We all eventually benefit from that.


> That's ridiculous, "infinite money" isn't a thing.

My reading of GP is that he was being sarcastic - "infinite amounts of circular fake money" is probably a reference to these circular deals going on.

If A hands B an investment of $100, and B then hands A $100 for a hardware purchase, then A's equity in B is, on paper, $100, plus A has $100 of revenue (from B), giving A total assets of $200.

Obviously it has to be shuffled more thoroughly, but that's the basic idea that I thought GP was referring to.
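The arithmetic in that example can be made concrete. A toy ledger with made-up numbers, showing how a single $100 round trip leaves A's cash unchanged while inflating its reported assets:

```python
def circular_deal(amount: float) -> dict:
    # A invests `amount` in B; B immediately spends it buying hardware from A.
    a = {"cash": 0.0, "equity_in_b": 0.0, "revenue": 0.0}
    a["cash"] -= amount          # A wires the investment to B
    a["equity_in_b"] += amount   # ...and books equity in B at that value
    a["cash"] += amount          # B pays the same money back as a purchase
    a["revenue"] += amount       # ...which A books as revenue
    a["paper_assets"] = a["equity_in_b"] + a["revenue"]
    return a

books = circular_deal(100.0)
# net cash movement is zero, yet A reports $200 of equity + revenue
```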


Cheaper for whom? For them maybe, but certainly not for you or me.

> it's not clear to me based on the description how this could all be done efficiently.

Depends how you define efficiency. The power use of this rig is a lot less than the large data centers that serve trillion parameter models. The page suggests that the final dollar cost per request is an order of magnitude lower than the frontier models charge.


> But none of this helps you solve harder problems, or distinguish between a simple solution which is wrong, and a more complex solution which is correct.

It does, because hallucinations and low confidence share characteristics in the embedding vector which the small neural network learns to recognize. And the fact that it continuously learns based on the feedback loop is pretty slick.
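A minimal sketch of that idea, with everything here hypothetical and not the actual system: a tiny logistic-regression "verifier" over embedding features, updated online as feedback arrives about which answers turned out to be hallucinated:

```python
import math
import random

class OnlineHallucinationScorer:
    """Tiny logistic regression over an embedding vector, trained
    online (one SGD step per feedback signal): 1 = hallucination, 0 = ok."""

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def score(self, emb):
        z = sum(w * x for w, x in zip(self.w, emb)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, emb, label):
        # one SGD step on the logistic loss
        err = self.score(emb) - label
        self.w = [w - self.lr * err * x for w, x in zip(self.w, emb)]
        self.b -= self.lr * err

# synthetic feedback loop: only the first embedding feature is
# (artificially) correlated with hallucination
random.seed(0)
scorer = OnlineHallucinationScorer(dim=4)
for _ in range(2000):
    label = random.randint(0, 1)
    emb = [label + random.gauss(0, 0.3)] + [random.gauss(0, 1) for _ in range(3)]
    scorer.update(emb, label)
```

After the loop, the scorer assigns high hallucination probability to embeddings with a high first feature and low probability otherwise, illustrating the continuous-learning feedback loop on toy data.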


Agents need the ability to code but also to objectively and accurately evaluate whether changes resulted in real improvements. This requires skills with metrics and statistics. If they can make those reliable then self-improvement is basically assured, on a long enough timeline.

This is how hyperagents work. They have the ability to measure improvement in both the meta-agent and the task agents. Their approach requires task agents to tackle tasks that can be empirically evaluated.
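A sketch of the "objectively evaluate whether changes helped" part: run a baseline agent and a candidate on the same task suite and only accept the change if the pass-rate improvement clears a simple one-sided two-proportion z-test. All names and numbers here are illustrative, not from any real system:

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """z-statistic for H0: the two pass rates are equal (pooled variance)."""
    p = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (successes_b / n_b - successes_a / n_a) / se

def accept_candidate(baseline_passes, candidate_passes, n_tasks, z_crit=1.645):
    """Accept the change only if the candidate's pass rate is
    significantly higher (one-sided test at ~5%)."""
    z = two_proportion_z(baseline_passes, n_tasks, candidate_passes, n_tasks)
    return z > z_crit

# 60/100 -> 75/100 clears the bar; 60/100 -> 63/100 is within noise
```

Without this kind of gate, a self-improvement loop will happily "accept" changes that are just run-to-run variance.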
