Hacker News | new | past | comments | ask | show | jobs | submit | Imanari's comments

Just tested it via OpenRouter in the Pi coding agent and it regularly fails to use the read and write tools correctly, very disappointing. Anyone know a fix besides prompting "always use the provided tools instead of writing your own call"?

If you have access to any other model, it can create a Pi extension that fixes the problem. At least that worked for me.

Like a special parser? Would you mind elaborating?


They have just released it; give it some time. They probably haven't pretested it with Pi.

How can they fix it after the release? They would have to retrain/finetune it further, no?

It's only in preview right now. And anyway, yes, models regularly get updated training.

But in this case, it's more likely just to be a tooling issue.


Yeah, hope they fix this for Pi

Why do the Xiaomi releases never get any attention? MiMo-V2-Pro was pretty good, excited to try V2.5

I am curious about this myself, as it's a major company that I would think is worth taking seriously. But this and the previous release got suspiciously few comments.

> Listening often means not jumping to a solution; but absorbing and processing someone’s pain

> When in actuality, they should [...] finding a way to solve the pain points

Honest question: how do I 'absorb someone's pain'? And how do I transition from that into eventually formulating the feature/ticket?


absolutely stunning

How well do the Gemma 4 models perform on agentic coding? What are your impressions?


I will also leave this here

https://github.com/shareAI-lab/learn-claude-code/tree/main/a...

I found it excellent in explaining a CC-like coding agent in layers.


Isn’t this just kicking the can down the road?

> but the LLM is rediscovering knowledge from scratch on every question

Unless the wiki stays fully in context, the LLM now has to re-read the wiki instead of re-reading the source files. Also, this will introduce and accumulate subtle errors as we start to regurgitate second-order information.

I totally get the idea but I think next gen models with 10M context and/or 1000tps will make this obsolete.


> I totally get the idea but I think next gen models with 10M context and/or 1000tps will make this obsolete.

We've already got 1M context, 800k context, and they still start "forgetting" things around the 200k - 300k mark.

What use is 10M context if degradation starts at 200k - 300k?


I use a home-baked system based on Obsidian that is essentially just “Obsidian, but with a structured format on top with schemas”, and I deploy this in multiple places with a range of end users. It is more valuable than you think. The intermediary layer is great for capturing the intent of a design and determining when the implementation diverges from it. There will always be a divergence between the intent of a system and how it actually behaves, and the code itself doesn’t capture that. The intermediate layer is lossy, it’s messy, it goes out of date, but it’s highly effective.

It’s not what this person is describing though. A self referential layer like this that’s entirely autonomous does feel completely valueless - because what is it actually solving? Making itself more efficient? The frontier model providers will be here in 3 weeks doing it better than you on that front. The real value is having a system that supports a human coming in and saying “this is how the system should actually behave”, and having the system be reasonably responsive to that.

I feel like a lot of exercises like the OP's are interesting but ultimately futile. You will not have the money these frontier providers do, and you do not have remotely the amount of information that they do on how to squeeze the most efficiency out of how they work. Best bet is to just stick with the vanilla shit until the firehose of innovation slows down to something manageable, because otherwise the abstraction you build is gonna be completely irrelevant in two months.


Interesting, I'd love to know more. Are parts of it public?


Indeed, I have it open source, but want to preserve my anonymity here. The main gist of it is Quartz as a static-site frontend bundle, backed by Decap as an editor, so that non-technical users can edit documents. The validation is twofold: frontmatter is validated by a typical YAML validator library, and I built markdown body validation using some popular markdown AST libraries. So there are two sets of schemas, one for the frontmatter and one for the body, and documents must conform via CI. I ship it with a basic CLI that essentially does validation and has a few other utilities. Not really that much magic, maybe 500 lines of code or so in the CLI and another few hundred lines doing validation and the other utilities. It's all in TypeScript, so I use the same validation in Decap when people make edits.
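To make the two-stage idea concrete, here is a minimal dependency-free sketch of that kind of CI check. The real system uses a YAML validator and markdown AST libraries; everything here (the `REQUIRED_KEYS` schema, the heading rule) is a hand-rolled, hypothetical stand-in:

```typescript
interface ValidationResult {
  ok: boolean;
  errors: string[];
}

// Hypothetical frontmatter schema: just a list of required keys.
const REQUIRED_KEYS = ["title", "status"];

function validateDoc(source: string): ValidationResult {
  const errors: string[] = [];

  // Split the document into a frontmatter block and a markdown body.
  const match = source.match(/^---\n([\s\S]*?)\n---\n([\s\S]*)$/);
  if (!match) return { ok: false, errors: ["missing frontmatter block"] };
  const [, frontmatter, body] = match;

  // Stage 1: frontmatter check (stand-in for a real YAML schema validator).
  const keys = frontmatter
    .split("\n")
    .map((line) => line.split(":")[0].trim())
    .filter(Boolean);
  for (const key of REQUIRED_KEYS) {
    if (!keys.includes(key)) errors.push(`frontmatter missing key: ${key}`);
  }

  // Stage 2: body check (stand-in for markdown AST schema validation).
  if (!/^# /m.test(body)) errors.push("body missing top-level heading");

  return { ok: errors.length === 0, errors };
}
```

A CI job would run this over every document and fail the build on a non-empty `errors` list.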


The “next gen of models” argument is a valid one, and one I think about often, but if you truly bought it, you would stop creating anything, since the next gen of models could make it obsolete.


The goal isn’t to keep the context every time, it’s to make the memory queryable. Like a data lake but for your ideas and decisions


This solves for now, and it solves for the future.

Now, you get to condense the findings that interest you from a handful of papers.

In the future, it solves for condensing your interests in a whole field down to a handful of papers or less.


It is how I feel when I do it. And it certainly shows over time.


Maybe to be better able to restart the process and not lose track.


Fascinating! I wonder if new training techniques could emerge from this. If we say layer 1 = translator, layers 2-5 = reasoner, layer 6 = retranslator, could we train small 6-layer models but evaluate their performance in a 1 > n*(2-5) > 6 setup to directly train towards optimal middle layers that can be looped? You'd only have to train 6 layers but get the duplication benefit of the middle layers for free.
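The wiring being proposed can be sketched with plain functions standing in for layers. All names here (`runLooped`, the translator/reasoner/retranslator split) are illustrative only, not an actual training setup:

```typescript
// A "layer" is just a function on a hidden-state vector.
type Layer = (h: number[]) => number[];

// Run: translator -> (middle layers repeated `loops` times) -> retranslator.
// The middle layers share weights across repeats, which is where the
// "duplication benefit for free" would come from.
function runLooped(
  translator: Layer,
  middle: Layer[],
  retranslator: Layer,
  loops: number,
  input: number[],
): number[] {
  let h = translator(input);
  for (let i = 0; i < loops; i++) {
    for (const layer of middle) h = layer(h);
  }
  return retranslator(h);
}

// Toy usage: identity translator/retranslator, one doubling middle layer.
const identity: Layer = (h) => h;
const double: Layer = (h) => h.map((x) => x * 2);
const out = runLooped(identity, [double], identity, 2, [1, 3]); // [4, 12]
```

Training would then evaluate the same 6 layers at several values of `loops`, so the middle block is optimized to be loop-stable.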


Yes, training directly for a diverse mix of "looped" inference procedures makes a lot of sense as a way of allowing for increased inference-time compute. It would likely be complementary to the usual thinking approach, which essentially runs the "loop" LLM-wide - and, critically, yields interpretable output which lets us see what the LLM is thinking about.


I don't know who you are or how you are so sure about 'what top labs are actually doing', but I have a similar feeling about the issue. The models don't have to 'actually learn'; the setup has to approximate 'actual learning' just well enough to be useful.

> AND it can inherit all the accumulated memories/docs from its predecessor.

So we are talking about a whole system, not just the model? Reminds me of something I heard a while back 'AGI will be a product, not a model'


It reminds me of the standard counter to the Chinese Room thought experiment: the person inside doesn’t understand Chinese, but the system _does_. The person, the rules, and the lookup tables together form the thing doing the understanding.

