Hacker News

I'd rather focus on building on top of LLMs than going lower level.

Ollama makes that super easy. I tried llama.cpp first and hit build issues; Ollama worked out of the box.
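For reference, this is roughly all it takes to talk to Ollama's local HTTP API, which is part of why it feels so easy. A minimal sketch, assuming a default install serving on port 11434 and a model you've already pulled (the model name here is a placeholder):

```python
import json

# Build the request body for Ollama's /api/generate endpoint.
payload = {
    "model": "llama3",              # placeholder: any model pulled via `ollama pull`
    "prompt": "Why is the sky blue?",
    "stream": False,                # one JSON object back instead of a stream
}
body = json.dumps(payload)

# To actually send it (requires a running Ollama server):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(json.loads(urllib.request.urlopen(req).read())["response"])
```

The trade-off the replies below get at: this interface is just text in, text out.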



Sure.

Just be aware that there’s a lot of expressive difference between building on top of an HTTP API vs on top of a direct interface to the token sampler and model state.


I'm aware; I don't need that amount of sophistication yet.

Python seems to be the way to go deeper, though. Is there a good reason I should be aware of to pick llama.cpp over Python?


Python’s as good a choice as any for the application layer. You’re either going to be using PyTorch or llama-cpp-python to get the CUDA stuff working; both rely on natively compiled C/C++ code to access GPUs and manage memory at the scale needed for LLMs. I’m not actually up to speed on the current state of the game there, but my understanding is that llama.cpp’s less generic approach has allowed it to focus on specifically optimizing performance of llama-style LLMs.
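If it helps, the llama-cpp-python route looks roughly like this. A sketch, not a definitive recipe: the model path is a placeholder, and n_gpu_layers only does anything if the package was built with GPU support:

```python
# Configuration for llama-cpp-python's Llama class (sketch).
kwargs = {
    "model_path": "models/llama-3-8b.Q4_K_M.gguf",  # placeholder path to a GGUF file
    "n_gpu_layers": -1,   # -1 asks llama.cpp to offload every layer it can to the GPU
    "n_ctx": 4096,        # context window size
}

# Actual load and call (requires llama-cpp-python installed and the weights on disk):
# from llama_cpp import Llama
# llm = Llama(**kwargs)
# out = llm("Q: What is the capital of France?\nA:", max_tokens=8)
# print(out["choices"][0]["text"])
```

The same object also exposes lower-level hooks (logits, tokenization), which is where the extra expressiveness over an HTTP API comes from.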


I've seen more of the model fiddling, like logit restrictions and layer dropping, implemented in Python, which is why I ask.

Most of the AI ecosystem has centralized around Python, and I see more of my code moving that way. For example, I'm now using LlamaIndex as my primary interface; it supports Ollama and many other model loaders / APIs.
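That backend-swapping is the main draw of going through LlamaIndex: the application code stays the same while the model behind it changes. A minimal sketch, assuming the llama-index-llms-ollama package and a local Ollama server with a llama3 model pulled (both assumptions, not requirements of this thread):

```python
def make_llm(backend: str):
    """Return a LlamaIndex LLM object for the chosen backend (sketch)."""
    if backend == "ollama":
        # Import lazily so the function is harmless without the package installed.
        from llama_index.llms.ollama import Ollama
        return Ollama(model="llama3")   # talks to the default localhost:11434 server
    raise ValueError(f"unknown backend: {backend}")

# Usage (needs a running Ollama server):
#   llm = make_llm("ollama")
#   print(llm.complete("One word for 'happy'?").text)
```

Swapping in another loader is then just another branch in make_llm, with the rest of the pipeline untouched.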




