I was looking into self-hosting DeepSeek v4 pro, since frankly cache reads are an absolute scam and they're 90% of the cost. But then I looked at the ROI, and it will never pay off fast enough: the hardware will become obsolete before it pays for itself, even if you were running 10 token generation streams 24/7.
The napkin math came out to renting being around 27 times cheaper than owning (not including power). I think we're really screwed when it comes to owned access to AI unless Intel comes out swinging with a C-series card that has 128GB of VRAM, so we could run these models in a 4x128GB configuration, but that seems unlikely since Nvidia has a large stake in them.
This was calculated assuming around 30 tok/s; of course you can get 2-5 tok/s much, much cheaper, but that's unusable for my workflow.
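The rent-vs-own comparison can be sketched as below. Every number here is a placeholder assumption for illustration (hardware cost, lifespan, API price), not the commenter's actual inputs, which reportedly came out to ~27x:

```python
# Napkin math: amortize hardware over its useful life and compare to API pricing.
# All inputs are assumed placeholders except the 30 tok/s and 10 streams above.
hardware_cost_usd = 300_000          # assumed rig able to serve ~30 tok/s per stream
useful_life_years = 3                # assumed lifespan before obsolescence
tok_per_sec_per_stream = 30
streams = 10                         # 10 generation streams running 24/7
api_price_per_mtok = 1.00            # assumed blended API price, $ / M tok

lifetime_seconds = useful_life_years * 365 * 24 * 3600
lifetime_mtok = tok_per_sec_per_stream * streams * lifetime_seconds / 1e6
owned_price_per_mtok = hardware_cost_usd / lifetime_mtok  # excludes power

print(f"owned ${owned_price_per_mtok:.2f}/M tok vs rented ${api_price_per_mtok:.2f}/M tok")
```

Even with generous assumptions, fully-saturated self-hosting stays an order of magnitude more expensive per token than renting, which is the shape of the argument above.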
Ironically, the one provider not scamming you on cache reads is DeepSeek.
Everyone else charges a ridiculous amount, but DeepSeek's API is $0.003625 / M tok.
I'm surprised no one talks about this given how significant it is. GPT 5.5, for example, costs a ridiculous $0.50 / M tok cached. DeepSeek is literally almost 140 times cheaper, which matters a lot for tool calls.
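The ~140x figure checks out directly from the two cache-read prices quoted above:

```python
# Ratio of the two cache-read prices quoted above, $ / M tok.
deepseek_cached = 0.003625
gpt_cached = 0.50
ratio = gpt_cached / deepseek_cached
print(round(ratio))  # ~138, i.e. "almost 140 times cheaper"
```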
I used to use this but have opted for good ol' hand-rolled deployment scripts instead: you will run into some niche use case that k3sup can't express and end up having to write a post-install script anyway.
The photo depicts "Tank Man", taken on June 5, 1989, during the Tiananmen Square protests. v4-pro and v4-flash answer roughly the same way on OpenRouter.
It's just a 92kWh battery. There are many cars with 100kWh or more on the market already. And that's only a fraction of the energy stored in an average gas tank (upwards of 500kWh). A combustion car just loses most of that energy to heat from actual explosions. From a physics perspective, a normal car is a much bigger bomb than even the longest range EV.
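A rough check of the tank-vs-pack comparison above; the tank size is an assumption, and the energy density figure is approximate:

```python
# Chemical energy in a gas tank vs. the 92 kWh pack mentioned above.
GASOLINE_KWH_PER_LITRE = 9.5   # ~34 MJ/L, approximate energy density of gasoline
tank_litres = 55               # assumed mid-size tank
tank_kwh = GASOLINE_KWH_PER_LITRE * tank_litres
battery_kwh = 92
print(tank_kwh, round(tank_kwh / battery_kwh, 1))  # a tank stores several times more energy
```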
Batteries are much less powerful bombs than fuel tanks, because they cannot produce nearly as great a volume of gas.
Batteries are dangerous mainly as sources of fire that is difficult to extinguish. For instance, extinguishing with water may actually cause an explosion from gas produced by the decomposition of water.
Most lithium-based batteries are more dangerous than other batteries not because they are batteries, but because they use an organic electrolyte instead of a water-based electrolyte. So their electrolyte is a fuel, which may explode when the battery catches fire.
However, there is much less electrolyte in a battery than fuel in a fuel tank, so the volume of expanding gas during an explosion is much less.
The technology OpenAI sells is actually not that good for kill-bots; we have Boston Dynamics for that. I mean, to be real here, they're already better than human soldiers: deploying 100 of the doggies and letting them run loose could wipe out any fortified group.
Especially if you include things that are not normally acceptable such as suicide bombers, poison gas, etc.
Also, real modern warfare has shown that cheap drones dominate. So we'd need a kill-bot that can withstand explosives while staying lightweight and operable with a good KD (drones trade at 1.0 or less); since a kill-bot would cost on the order of 100x a cheap drone, it would need a KD of around 100 just to break even.
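The break-even claim rests on a cost-exchange assumption; a minimal sketch with made-up prices (both figures are invented for illustration, not sourced):

```python
# Cost-exchange math: how many kills per loss a robot needs to match cheap drones.
drone_cost = 1_000        # assumed cheap FPV drone, $
killbot_cost = 100_000    # assumed armed ground robot, $
breakeven_kd = killbot_cost / drone_cost  # kills per loss to match a drone at KD 1.0
print(breakeven_kd)
```

At a 100:1 cost ratio, a KD of 100 is exactly the break-even point claimed above; any cheaper robot lowers the bar proportionally.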
Seems completely backwards to me. This is like judging Formula 1 just by the raw power of the engine. The rest of the car has just as much engineering, if not more.
Maybe there is a complex relationship between harness, model, and the emergent perceived intelligence that we just can't access by isolating the model alone to evaluate "raw intelligence". I don't think it's absurd to imagine a model that by itself wouldn't be that impressive but would outperform other models given the right harness. It's also not absurd to think of a model that has incredible raw intelligence but would not scale much across different harnesses. Model performance in different scenarios depends a LOT on the dataset and training strategies, so we need to account for these complex relationships; otherwise, measuring "raw intelligence" would become the next AI benchmark that is purely for show.
It can; it just has to be within the same 'session'. But it's mostly limited to scratch notes AFAIK, since there's no Python or bash. Yeah, if there's no way to execute code, there's no real way to build a harness.
I run agents en masse and they've deleted my database at least a dozen times. I just don't really care, since I always run agents on a snapshot basis: agents work on a snapshot of the database that later needs to be reconciled, which often makes the agent realize "wait, that would delete all of the data".
Telling the agent what a (sensitive) action will result in is how you avoid such issues, but you shouldn't be running agents against production data anyway.
But because people will continue to do so, explaining to the agent what the command will do is the way forward.
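The snapshot workflow described above can be sketched in a few lines; the file names and toy schema here are made up for illustration, and the reconciliation step is reduced to a comment:

```python
import os
import shutil
import sqlite3

# Start clean so the sketch is repeatable.
for f in ("prod.db", "agent_snapshot.db"):
    if os.path.exists(f):
        os.remove(f)

# Stand-in "production" database.
prod = sqlite3.connect("prod.db")
prod.execute("CREATE TABLE users (id INTEGER, name TEXT)")
prod.execute("INSERT INTO users VALUES (1, 'alice')")
prod.commit()
prod.close()

# The agent only ever sees a snapshot copy.
shutil.copy("prod.db", "agent_snapshot.db")
snap = sqlite3.connect("agent_snapshot.db")
snap.execute("DELETE FROM users")   # destructive action hits the snapshot only
snap.commit()
snap.close()

# Reconciliation: production is untouched; the diff between snapshot and prod
# gets reviewed before anything is replayed against the real database.
prod = sqlite3.connect("prod.db")
rows = prod.execute("SELECT COUNT(*) FROM users").fetchone()[0]
prod.close()
print(rows)  # production row count is unchanged
```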