Hacker News | naasking's comments

Small models aren't entirely useless, and the NPU can run LLMs up to around 8B parameters from what I've seen. So one way they could be useful: Qwen3's text-to-speech models are all under 2B parameters, and OpenAI's whisper-small speech-to-text model is under 1B parameters. You could build an AI agent that you can talk to and that talks back, where, in theory, all audio-to-text and text-to-audio processing is offloaded to the low-power NPU, leaving the GPU free for the LLM itself.
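A rough sketch of that split, with everything here a placeholder stub (the function bodies stand in for real model calls like whisper-small and a Qwen3 TTS model; none of these names are real APIs):

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    device: str  # "npu" or "gpu"

def speech_to_text(audio: bytes) -> str:
    # stand-in for whisper-small running on the NPU
    return audio.decode("utf-8")

def llm_reply(prompt: str) -> str:
    # stand-in for the main LLM running on the GPU
    return f"echo: {prompt}"

def text_to_speech(text: str) -> bytes:
    # stand-in for a sub-2B-parameter TTS model on the NPU
    return text.encode("utf-8")

# the proposed device split: speech<->text on the NPU, LLM on the GPU
PIPELINE = [
    Stage("stt", "npu"),
    Stage("llm", "gpu"),
    Stage("tts", "npu"),
]

def voice_agent(audio: bytes) -> bytes:
    text = speech_to_text(audio)   # NPU
    reply = llm_reply(text)        # GPU
    return text_to_speech(reply)   # NPU
```

The point of the sketch is just the device assignment: the GPU only ever sees text in and text out.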

You could always offload some layers to the NPU for lower power use and leave the rest to the GPU. If the latter is power-throttled (common during prefill, not during decode), that would also be a performance improvement.
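The layer split itself is simple bookkeeping. A toy sketch with made-up numbers, assuming we cap the NPU's share by a compute budget:

```python
def split_layers(n_layers: int, npu_budget_flops: float,
                 flops_per_layer: float) -> tuple[range, range]:
    """Assign the first k layers to the NPU, the rest to the GPU,
    where k is how many layers fit in the NPU's compute budget."""
    k = min(n_layers, int(npu_budget_flops // flops_per_layer))
    return range(0, k), range(k, n_layers)

# e.g. a 32-layer model where the NPU budget covers 4 layers' worth of work
npu_layers, gpu_layers = split_layers(32, npu_budget_flops=10e12,
                                      flops_per_layer=2.5e12)
```

In practice the split would be tuned per device, but the mechanism is just this partition.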

Routing in a MoE model might fit.
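MoE routing is a tiny computation on its own — a softmax over router logits followed by a top-k pick — which is plausibly the kind of thing a low-power unit could handle. A minimal sketch, not tied to any particular framework:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(logits, k=2):
    """Top-k MoE gating: pick the k experts with the highest router
    logits and renormalize their softmax weights to sum to 1."""
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    z = sum(probs[i] for i in top)
    return [(i, probs[i] / z) for i in top]

# route([2.0, 0.5, 1.0, -1.0], k=2) selects experts 0 and 2
chosen = route([2.0, 0.5, 1.0, -1.0], k=2)
```

The expensive part of a MoE layer is the experts themselves, which would stay on the GPU; only this gate would move.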

That seems like a really niche use case, and probably not worth the surface area? The power savings would have to be truly astonishing to justify it, given what a small fraction of compute time your average device spends processing voice input. I'd wager the 90th percentile siri/ok google/whatever user issues less than 10 voice queries per day. How much power can they use running on normal hardware and how much could it possibly matter?

It's just an example where it fits perfectly, and it's exactly what something like Alexa or Google Home needs for low-power machine learning, e.g. when sitting idle it needs to consume as little power as possible while waiting for a trigger word.

Any context that needs some limited intelligence while consuming little power would benefit from this.


Yes, Vulkan is currently faster due to some ROCm regressions: https://github.com/ROCm/ROCm/issues/5805#issuecomment-414161...

ROCm should be faster in the end, if they ever fix those issues.


From what I understand, ROCm is a lot buggier and has some performance regressions on a lot of GPUs in the 7.x series. Vulkan performance for LLMs is apparently not far behind ROCm and is far more stable and predictable at this time.

Great! I hope the era of 1-bit LLMs really gets going.

Similar in spirit but different in execution as far as I can tell.

> Thiel's book has influenced so many entrepreneurs into believing Monopolies are Good.

Haven't read his book, but the idea that monopolies are good isn't typically made in a vacuum, it's made relative to alternatives, most often "ham-fisted government intervention". It's easier to take down a badly behaving monopoly than to change government, so believing monopolies are better than the alternatives seems like a decent heuristic.


>Haven't read his book, but the idea that monopolies are good isn't typically made in a vacuum, it's made relative to alternatives, most often "ham-fisted government intervention". It's easier to take down a badly behaving monopoly than to change government, so believing monopolies are better than the alternatives seems like a decent heuristic.

What? How is the first alternative poor government rather than multiple competing companies? When was the last time a monopoly was actually broken up in the US? AT&T/Bell, 50 years ago? lol


How would a bad monopoly be likely to be taken down if not by government intervention?

It eventually becomes so big and inefficient that it gets overtaken by new competitors.

A monopoly implies an organization powerful enough to stop competition, so a solution that relies on competitors seems fatally flawed. If there are enough competitors to meaningfully compete, then there isn't a monopoly.

You can only truly stop competition by government intervention.

When an organization gets big enough it is indistinguishable from government.

No organization can ever rival a real government like the US due to the latter's monopoly on the use of force.

Unfortunately there also isn't a requirement to not be a complete idiotic psycho.

A monopoly on force is meaningless if you shoot your own head off with it, which is what's happening with the US atm...

Criminally stupid is the trump-all card, pun intended.


Insert better horse/car analogy here

A monopoly comes with serious moats, otherwise it wouldn’t be one. It can stay big and inefficient for decades.

Not if they hire goons to go and literally kill the competition.

Open source vs. Microsoft is a great example.

Good for whom, exactly?

This seems like a classic straw man argument. Plutocratic oligarchs have been making the argument that private monopolies are better than representative democracy at basically any societal function for decades without any actual data.


> and they will spend infinite amounts of circular fake money to ensure hardware remains prohibitively expensive forever.

That's ridiculous, "infinite money" isn't a thing. They will spend as much as they can not because they want to keep local solutions out, but because it enables them to provide cheaper services and capture more of the market. We all eventually benefit from that.


> That's ridiculous, "infinite money" isn't a thing.

My reading of GP is that he was being sarcastic - "infinite amounts of circular fake money" is probably a reference to these circular deals going on.

If A hands B an investment of $100, and B then hands A $100 for a hardware purchase, then A's equity in B is, on paper, $100, plus A has $100 of revenue (from B), giving A total assets of $200.

Obviously it has to be shuffled more thoroughly, but that's the basic idea that I thought GP was referring to.
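The arithmetic in that example can be made concrete. A toy ledger with made-up numbers, showing how a single $100 round trip leaves A's cash unchanged while inflating its reported assets:

```python
def circular_deal(amount: float) -> dict:
    # A invests `amount` in B; B immediately spends it buying hardware from A.
    a = {"cash": 0.0, "equity_in_b": 0.0, "revenue": 0.0}
    a["cash"] -= amount          # A wires the investment to B
    a["equity_in_b"] += amount   # ...and books equity in B at that value
    a["cash"] += amount          # B pays the same money back as a purchase
    a["revenue"] += amount       # ...which A books as revenue
    a["paper_assets"] = a["equity_in_b"] + a["revenue"]
    return a

books = circular_deal(100.0)
# net cash movement is zero, yet A reports $200 of equity + revenue
```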


Cheaper for whom? For them maybe, but certainly not for you or me.

> it's not clear to me based on the description how this could all be done efficiently.

Depends how you define efficiency. The power use of this rig is a lot less than the large data centers that serve trillion parameter models. The page suggests that the final dollar cost per request is an order of magnitude lower than the frontier models charge.


> But none of this helps you solve harder problems, or distinguish between a simple solution which is wrong, and a more complex solution which is correct.

It does, because hallucinations and low confidence share characteristics in the embedding vector which the small neural network learns to recognize. And the fact that it continuously learns based on the feedback loop is pretty slick.
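A minimal sketch of that idea, with everything here hypothetical and not the actual system: a tiny logistic-regression "verifier" over embedding features, updated online as feedback arrives about which answers turned out to be hallucinated:

```python
import math
import random

class OnlineHallucinationScorer:
    """Tiny logistic regression over an embedding vector, trained
    online (one SGD step per feedback signal): 1 = hallucination, 0 = ok."""

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def score(self, emb):
        z = sum(w * x for w, x in zip(self.w, emb)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, emb, label):
        # one SGD step on the logistic loss
        err = self.score(emb) - label
        self.w = [w - self.lr * err * x for w, x in zip(self.w, emb)]
        self.b -= self.lr * err

# synthetic feedback loop: only the first embedding feature is
# (artificially) correlated with hallucination
random.seed(0)
scorer = OnlineHallucinationScorer(dim=4)
for _ in range(2000):
    label = random.randint(0, 1)
    emb = [label + random.gauss(0, 0.3)] + [random.gauss(0, 1) for _ in range(3)]
    scorer.update(emb, label)
```

After the loop, the scorer assigns high hallucination probability to embeddings with a high first feature and low probability otherwise, illustrating the continuous-learning feedback loop on toy data.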


Agents need the ability to code but also to objectively and accurately evaluate whether changes resulted in real improvements. This requires skills with metrics and statistics. If they can make those reliable then self-improvement is basically assured, on a long enough timeline.

This is how hyperagents work. They have the ability to measure improvement in both the meta-agent and the task agents. Their approach requires task agents to tackle tasks that can be empirically evaluated.
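A sketch of the "objectively evaluate whether changes helped" part: run a baseline agent and a candidate on the same task suite and only accept the change if the pass-rate improvement clears a simple one-sided two-proportion z-test. All names and numbers here are illustrative, not from any real system:

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """z-statistic for H0: the two pass rates are equal (pooled variance)."""
    p = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (successes_b / n_b - successes_a / n_a) / se

def accept_candidate(baseline_passes, candidate_passes, n_tasks, z_crit=1.645):
    """Accept the change only if the candidate's pass rate is
    significantly higher (one-sided test at ~5%)."""
    z = two_proportion_z(baseline_passes, n_tasks, candidate_passes, n_tasks)
    return z > z_crit

# 60/100 -> 75/100 clears the bar; 60/100 -> 63/100 is within noise
```

Without this kind of gate, a self-improvement loop will happily "accept" changes that are just run-to-run variance.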
