Hacker News | v3ss0n's comments

Qwen 3.6 burns it to the ground; it wasn't even a challenge. Gemma 4 seriously fails at tool calls and agentic work. It got all messed up after 2-3 turns of vibe coding.

How do you run it? vllm? llama.cpp?

Can you share the parameters you use to enable tool calling and agentic usage?

Or, higher level, some philosophies on what approaches you are using for tuning to get better tool calling and/or agentic usage?

I'm having surprisingly good success with unsloth/Qwen3.6-27B-GGUF:Q4_K_M (love unsloth guys) on my RTX3090/24GB using opencode as the orchestrator.

It concocts some misleading paths, but the code often compiles, and I consider that a victory.

You have to watch it like you would watch a 14 year old boy who says he is doing his homework but you hear the sound effects of explosions.


I run it with Llama.cpp on my RTX 3090. Also using the same Unsloth model.

My config is similar to: https://github.com/noonghunna/club-3090/blob/master/docs/eng...

I need to try out some of the other set ups mentioned in this repo for increased TPS.


naw, i mean i prefer Qwen 3.6 to Gemma 90% of the time, especially the MoE with a light tune to make its tone more claude-like, but Gemma 4 is definitely better in some cases and I think they're pretty close in general.

The difference basically boils down to Gemma 4 making more assumptions and Qwen 3.6 sticking closer to the prompt. If your prompt is bad or leaves things up to the imagination, Gemma will do a better job; if you need strict prompt adherence, Qwen is better. Since local models are "dumb", I think it makes sense to prefer prompt adherence, but there are complex tasks that Gemma will complete much, much faster than Qwen because it makes the right assumptions the first time and, as a result, requires way fewer turns even with slower inference.

My speculation is that this comes from Google having a much better strategy for filtering their training data. I think this also shows up in the shape of the models' world knowledge: Gemma's world knowledge seems deeper even though the models are of roughly equivalent size to their Qwen counterparts, so it's most likely just concentrated in places that are more relevant to my queries.

Most notably in my testing, Gemma 4 31b is the ONLY local model that will tell me the significance of 1738 correctly. Even most flagship/cloud models answer with some hallucinatory nonsense.


Counter-point: I built an agent that can only interface with Kakoune, a much less common and more challenging situation for an LLM to find itself in, and Gemma4-A4B 8-bit quantized does remarkably better at actually figuring out how to get text into buffers than Qwen3.6-35B-A3B, which is in a similar class to Gemma4 A4B.

Now, is this the usual use case? No, it's a benchmark I created specifically in order to put LLMs in situations where they can't just blast out their bash commands without having to interface with something else and adapt.


Fellow kakoune user here. I'm curious about your use case/ what you're doing with it!

I'm just messing around with building agents, that's all. I'm not super interested in making ones that just sit in a terminal executing shell scripts because, truth be told, they're absolutely trivial to make and don't show any interesting parts of LLMs. Telling an agent that it is sitting in Kakoune is a whole lot more interesting and really shows a lot of what LLMs aren't great at, and how they'll have to fight their urge to spit out overwrought bash invocations, or at the very least find a way to fit those into something new.

So far the only tools the agent has access to are `evaluate_commands(commands=["...", "..."])` and `get_buffer_contents()`, which really makes them have to work for doing things. I could make it super easy for them but then it wouldn't be an interesting experiment.
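
For the curious, those two tools are easy to sketch. Here's a minimal Python version, assuming a session started with `kak -s agent-session`; the session name, the `format_commands` helper, and the temp-file buffer dump are my own inventions for illustration (my real setup goes through FIFOs instead):

```python
import subprocess
import tempfile

# Hypothetical session name; start Kakoune with `kak -s agent-session`.
KAK_SESSION = "agent-session"

def format_commands(commands):
    """Join a list of Kakoune commands into the newline-separated
    text that `kak -p` expects on stdin."""
    return "\n".join(commands) + "\n"

def evaluate_commands(commands):
    """Tool 1: pipe commands into the running session via `kak -p`,
    Kakoune's documented mechanism for sending commands to a session."""
    subprocess.run(
        ["kak", "-p", KAK_SESSION],
        input=format_commands(commands).encode(),
        check=True,
    )

def get_buffer_contents():
    """Tool 2: ask Kakoune to write the current buffer to a temp file,
    then read it back. Depending on context you may need to wrap the
    write in `evaluate-commands -client ...` to target a buffer."""
    with tempfile.NamedTemporaryFile(suffix=".kak-buf", delete=False) as f:
        path = f.name
    evaluate_commands([f"write -force {path}"])
    with open(path) as f:
        return f.read()
```

The FIFO version is the same shape, just with the subprocess call replaced by a write to the session's pipe.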


As an addendum to this:

If I were to try to make something more useful out of this, I'd probably add the ability for LLMs to list buffers, give them an easier out for executing shell scripts in the way they prefer, give them an easier way to list docs, and a few other things like that.

The tools and the interaction with Kakoune are really trivial to write; I already do this by having the agent write to the session FIFO (a very simple binary format), and I extract information via my own FIFO that Kakoune writes to (this is used for the buffer data only right now).

I think once you start using it more as a tool and not a pseudo-benchmark like I am, you'd probably think of even more things to add, but a lot of it comes down to just making Kakoune's state visible and making shell spam (which the LLMs love) easier.


I agree but would add that gemma 4 is really nice at vibing though in ways qwen 3.6 could never.

Maybe it could be fun to hook them up via a2a protocol as left and right brain agents operating in tandem.


Gemma4 is definitely not usable for vibe/agentic coding. Not even worth trying. But it's in a different weight class.

Gemma 4 31b was working OK for me, but it was consuming tons of memory on SWA checkpoints (I had to turn them way down), and as a 31b dense model it's fairly slow on a Strix Halo. I did have a lot of tool-calling issues on 26b-a4b, though.

The Qwen models are quite solid though.


What are you using to run it: vllm, llama.cpp, or something else?

Can you share your switches and approach for using tools?


llama.cpp

My setup is a bit of a mess as I experiment with different ways of configuring and hosting local models. At some point I was experimenting with the router server but stopped; some of my settings are still in models.ini while some are on the command line.

  podman run --env "HF_TOKEN=$HF_TOKEN" \
    --env "LLAMA_SERVER_SLOTS_DEBUG=1" \
    -p 8080:8080 \
    --device /dev/kfd --device /dev/dri \
    --security-opt seccomp=unconfined --security-opt label=disable \
    --rm -it \
    -v ~/.cache/huggingface/:/root/.cache/huggingface/ \
    -v ./unsloth:/app/unsloth \
    -v ./models.ini:/app/models.ini \
    llama.cpp-rocm7.2 \
    -hf unsloth/gemma-4-31B-it-GGUF:UD-Q8_K_XL \
    --chat-template-file /root/.cache/huggingface/gemma-4-31B-it-chat_template.jinja \
    -ctxcp 8 --port 8080 --host 0.0.0.0 -dio \
    --models-preset models.ini

With the following as the relevant settings in models.ini (I actually have no idea if these settings are applied when not using the router server; it's been hard for me to figure out what settings are actually applied when using both the command line and models.ini):

  [*]
  jinja = true
  seed = 3407
  flash-attn = on

  [unsloth/gemma-4-31B-it-GGUF:UD-Q8_K_XL]
  temperature = 1.0
  top_p = 0.95
  top_k = 64
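
For what it's worth, with jinja enabled, llama-server exposes an OpenAI-compatible /v1/chat/completions endpoint, and tool calling per request is just a matter of declaring tools in the body. A minimal Python sketch, assuming the server above is listening on localhost:8080; the `read_file` tool is a made-up example, not part of any particular harness:

```python
import json
import urllib.request

def build_tool_call_request(prompt):
    """Build an OpenAI-style chat completion body declaring one tool.
    Sampling settings mirror the models.ini above; `read_file` is a
    hypothetical tool for illustration."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "read_file",
                "description": "Read a file from the workspace",
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            },
        }],
        "temperature": 1.0,
        "top_p": 0.95,
        "top_k": 64,
    }

def send(body, url="http://localhost:8080/v1/chat/completions"):
    """POST the request to the server (needs llama-server running)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

If the model decides to call the tool, the response's message carries a `tool_calls` array instead of plain content, and the harness loops from there.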
And it looks like the chat_template.jinja I have is actually out of date by now; there was a new one pushed just a couple of days ago that seems to have some further tool-calling fixes: https://huggingface.co/google/gemma-4-31B-it/blob/main/chat_...

As my harness, I'm using pi, with a pretty vanilla config.

Anyhow, Gemma 4 31b worked in this config, but it was slow and RAM hungry. Since then, I've mostly moved to Qwen 3.6 35b-a3b because it's a lot faster.

I'm not actually doing anything useful with these yet, but I've used them for some experiments and Qwen 3.6 35b-a3b was capable of doing some pretty long mostly unsupervised agentic loops in my experimentation.


It breaks the whole idea of Markdown - to be as natural as possible. This is like Markdown with CSS slapped in; such a disgrace. It is not simple, and it's as ugly as CSS.

most are GDDR5 and 6


With LLM do we actually need new programming languages?


Even though this just showed up on HN (for the 10th time?), the Lobster project started in ~2010. No LLMs were on the horizon back then.

And while Lobster is a niche language LLMs don't know as well, they do surprisingly well coding in it, especially in the context of a larger codebase as context. It occasionally slips in Python-isms but nothing that can't be fixed easily.

Not suitable for larger autonomous coding projects, though.


Yes.

In fact, LLMs have shown that we really, really need new programming languages.

1. They have shown that the information density in our existing languages is extremely low: small prompts can generate very large programs.

2. But the only way to get that high information density now (with LLMs) is to give up any hope of predictability. I want both.


> They have shown that the information density in our existing languages is extremely low: small prompts can generate very large programs.

"Write a book about a small person who happens upon a magical ring which turns out to be the repository of an evil entities power. The small person needs to destroy the ring somehow, probably using the same means it was created"

...wait a few minutes...

THE LORD OF THE RINGS

http://lotrproject.com/statistics/books/wordscount


Small prompts leading to large programs has absolutely nothing to do with programming languages and everything to do with the design of the word generators used to produce the programs — which ingest millions of programs in training and can spit out almost entire examples based on them.


APL?


Sadly, probably not. I fear new languages will struggle from here on out. As a language guy, very few things in this new AI world make me more sad than this.


I don't get the feeling this will happen. LLMs are extremely good at learning new languages because that's basically their whole point. If your new language has a standard library, and the LLM can see its source code, I am sure you can give it to any latest-generation AI and it will happily spit out perfectly correct new code in it. If you give it access to the reference docs, it can even do quite a good job of never generating syntactically incorrect code. As long as your error messages are enough to understand what a problem's root cause is, the LLM will iterate and explore until it gets it right.

Not sure if this is a good example, but I used ChatGPT (not even Codex) to fix some Common Lisp code for me, and it absolutely nailed it. Sure, Common Lisp has been around for a long time, but there's not that much Common Lisp code around for LLMs to train on... but OTOH it has a HyperSpec which defines the language and much of the standard library, so I believe the LLM can produce perfect Common Lisp based mostly on that.


I think it would be cool if a language specifically for LLMs came about. It should have something like required preconditions and postconditions so that a deterministic compiler can verify the assumptions the LLM is claiming. Something like a theorem prover, but targeted specifically for programming and efficient compilation/runtime. And it doesn't need all the niceties human programmers tend to prefer (implicit conversions comes to mind).
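
The checker idea can be sketched today in plain Python: the model emits a function together with machine-checkable pre- and postconditions, and a deterministic layer (here just a decorator with assertions, standing in for a real prover) verifies the claims. Everything below (`contract`, `sort_ints`) is a made-up illustration, not an existing library:

```python
import functools

def contract(pre=None, post=None):
    """Decorator enforcing a precondition on the arguments and a
    postcondition on the result. The LLM would emit pre/post alongside
    the function body; the runtime (or a compiler) checks them."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            if pre is not None:
                assert pre(*args, **kwargs), f"precondition failed for {fn.__name__}"
            result = fn(*args, **kwargs)
            if post is not None:
                assert post(result, *args, **kwargs), f"postcondition failed for {fn.__name__}"
            return result
        return inner
    return wrap

# Example: the model claims "given ints, returns the same ints, sorted",
# and the checker holds it to that claim.
@contract(
    pre=lambda xs: all(isinstance(x, int) for x in xs),
    post=lambda out, xs: out == sorted(out) and len(out) == len(xs),
)
def sort_ints(xs):
    return sorted(xs)
```

A real LLM-targeted language would verify these conditions statically rather than at runtime, but the shape of the contract is the same.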


If you're that confident in the LLM's output, just train it to output some kind of intermediate language, or even machine code.

And if you're not that confident, shouldn't you still be optimising for humans, because humans have to check the LLM's output?


At least in programming, humans have to check the product of the LLM's output rather than the output itself.


I'm working on this now.

It's a Profile Guided Optimization language - with memory safety like Rust.

It's extremely easy to optimize assuming you either 1) profile it in production (obviously has costs) or 2) can generate realistic workloads to test against.

It's like Rust, in that it makes expressing common illegal states just outright impossible. Though it goes much further than Rust.

And it's easier to read than Swift or Go.

There's a lot of magic that happens with defaults that languages like Zig or Rust don't want, because they want every cost signal to be as visible as possible, so you can understand the cost of a line and a function.

LLMs with tests can - I hope - do this without that noise.

We shall see.


Do you have a repo?


Yes.

I'm almost ready to launch v0.1 - but the documentation is especially a mess right now, so I don't want to share yet.

I'll update this comment in a week or so [=


Appreciate it!


Possibly. First, I think there's still low hanging fruit in creating a programming language designed to be as easy as possible for agents to work with that we won't try to unlock until people writing code is a curiosity. Second, agents don't care about verbosity of code, so we can do verbosity/correctness/tooling tradeoffs that wouldn't have made sense when humans were the sole consumers of the code.


Yes, very much so. Programming languages are tools for thought, if you want your LLMs to think better, then they'll need better tools for thinking than the ones we have today. That they are mostly thinking in and writing Python is incidental to when they were born and a limitation of current AI technology, not its final evolution.


A more concise language is more efficient for an LLM to produce and easier for a human to verify.

I don't think LLMs have solved the problem of wanting code that's concise and also performant.


But a less concise language is (theoretically, if you're doing useful stuff with the verbosity) easier for machines to verify.


Yes, languages that talk to LLMs, that is.


imho no


Novel programming languages still have educational value for those building them, and yes, we still need programming languages. I don't see any reason we would not need them. Even if AI is going to write the code for you, how is it going to write it with no programming language? With raw binary? Absolutely not.


Eventually it won't need to write any code at all. The end goal for AI is "The Final Software": no more software needs to be written; you just tell the AI what you actually want done and it does it, with no need for it to generate a program.


But how do you know AI can generate programs without writing code? It can't today -- in fact the best thinking models work by writing code as part of the process. Natural intelligence requires it as well, as all our jobs are about expressing problem domains formally. So why would we expect an artificial intelligence should be able to reason about the kinds of programming problems we want them to without a formal language?

Maybe they will not be called programming languages anymore because that name was mostly incidental to the fact they were used to write programs. But formal, constructed languages are very much still needed because without them, you will struggle with abstraction, specification, setting constraints and boundaries, etc. if all you use is natural language. We invented these things for a reason!

Also the AI will have to communicate output to you, and you'll have to be sure of what it means, which is hard to do with natural language. Thus you'll still have to know how to read some form of code -- unless you're willing to be fooled by the AI through its imprecise use of natural language.


how did you get to that 'no programming language' conclusion? There are so many well-established languages, more than we need already; the market has picked the winners too, and AI is well trained on them. These are facts. If a new language is needed down the road for AI coders, most likely it will be created using AI itself. For the moment, a human-created niche language is too late to the party; move on.


Will this be buried like the rest of the cancer cures?


How is Zulip worse? 700K lines of high-quality code is something to be proud of. It's backward compatible and has a very powerful threading system; Slack can't compare to that.


The name they gave their first electric car sounds discouraging. Why would someone name it Luce, which sounds like "lose"?


It doesn't sound like that in Italian. It's loo-che.


Very narrow view of how a company can fit into code. Try integrating an ERP system in a non-tech business. You will see how resistant people are to using software that streamlines their business: they demand customization to fit their operating procedures, or resist outright because it would make things a lot easier and make them irrelevant.

Even on a successful integration, all they would use the ERP system for is signing in, chatting, and producing invoices; the rest would still be done manually, if you're lucky in Excel files.


Yeah, if you want init.d support, work on it on your own; it's open source anyway (and init.d is too basic to do any of those advanced features of systemd).


…or just continue to use SDDM like before.

It’s a new piece of software that’s tied to systemd. Existing software is unchanged.


They are ethnic Chinese who were operating scam centers in collaboration with the junta in the northern Laukkai area.

There are more in the Shwe Kokko area.


were they Chinese citizens?


They are usually Chinese triad gangsters who operate scam businesses and illegal casinos in Southeast Asia.

https://en.wikipedia.org/wiki/Triad_(organised_crime)


China claims jurisdiction because 20 Chinese citizens were murdered, if that’s what you’re wondering.


They are Han Chinese. Most of those triad families came from the Yunnan border a few decades ago. In Myanmar, thanks to the military junta, they can easily buy citizenship by bribing authorities. One of them is even a senator in a military-aligned party.

