Hacker News | clueless's comments

> It's only true in a universe where Iran would have collapsed from within before the expiration of the sunset clause, and that clearly was not going to happen.

No one can know this hypothetical, but some definitely bet their entire futures/careers on it: that an Iran with a more prosperous middle class (as a result of the JCPOA) might have had a better chance at social/internal reform, i.e. regime change.


> If you take a lot of chances, that adds up eventually and you'll have some big wins. Just do it safely, so that they don't add up to a lot of big losses, too.

And here is the great contradiction in this whole essay: you can't "safely" take a lot of chances and not lose big, when in most cases, to have big wins, one has to do unsafe things...

This is also why folks who have a safety net (in terms of family wealth, etc) tend to do better as entrepreneurs. Not sure this essay is helpful.


Step 1: have resources. Step 2: bootstrap yourself.

If you really want to succeed, you need to pick the best parents.


What are some sample real world cases folks are using to fine tune their own small/medium models?


Oh I wrote up a post on X on this exact question! https://x.com/danielhanchen/status/1979389893165060345?s=20

1. Cursor used online RL to get +28% approval rate: https://cursor.com/blog/tab-rl

2. Vercel used RFT for their AutoFix model for V0: https://vercel.com/blog/v0-composite-model-family

3. Perplexity's Sonar for Deep Research Reasoning I think was a finetuned model: https://docs.perplexity.ai/docs/getting-started/overview

4. Doordash uses LoRA, QLoRA for a "Generalized Attribute Extraction model" https://careersatdoordash.com/blog/unleashing-the-power-of-l...

5. NASA flood water detection: https://earthdata.nasa.gov/news/nasa-ibm-openly-release-geospatial-ai-foundation-model-nasa-earth-observation-data

6. Online RL for robotics - imagine teaching a robot in the future via some mini fine-tuning

7. OpenAI's RFT page has more: https://developers.openai.com/api/docs/guides/rft-use-cases

8. For larger models - https://www.mercor.com/blog/expert-data-drives-model-perform...


Just to prompt thought on this exact question (I'm interested in answers):

I just ran a benchmark against Haiku on a very simple document-classification task that we currently farm out to Haiku in parallel. Very naive: same system prompt, same API (AWS Bedrock). A few of the 4B models are a pretty good match, and could easily be run locally or cheaply via a hosted provider. The "how much data for how much improvement" question is one I don't have good intuition for anymore; I don't even have an order-of-magnitude guess on either axis.

Here are the raw numbers to spark discussion:

  | Model          | DocType% | Year% | Subject% | In $/MTok |
  |----------------|----------|-------|----------|-----------|
  | llama-70b      |       83 |    98 |       96 |     $0.72 |
  | gpt-oss-20b    |       83 |    97 |       92 |     $0.07 |
  | ministral-14b  |       84 |   100 |       90 |     $0.20 |
  | gemma-4b       |       75 |    93 |       91 |     $0.04 |
  | glm-flash-30b  |       83 |    93 |       90 |     $0.07 |
  | llama-1b       |       47 |    90 |       58 |     $0.10 |

Percentages are doc type (categorical), year, and subject-name match against Haiku, using just the first 4 pages.

In the old world, where these were my own in-house models, I'd be interested in seeing if I could lift those numbers with training, but I haven't done that with the new LLMs in a while. Keen to get even a finger in the air if possible.

Can easily generate tens of thousands of examples.

Might try myself, but always keen for an opinion.

_edit for table formatting_
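For concreteness, the per-field agreement metric implied by the table above can be sketched like this (the field names `doc_type`, `year`, `subject` are illustrative, not the actual schema):

```python
# Sketch: for each document, compare a candidate model's extracted fields
# against Haiku's output as the reference, and report per-field agreement %.
def field_agreement(reference, candidate, fields=("doc_type", "year", "subject")):
    """Return % of documents where each field matches the reference labels."""
    totals = {f: 0 for f in fields}
    for ref, cand in zip(reference, candidate):
        for f in fields:
            if ref.get(f) == cand.get(f):
                totals[f] += 1
    n = len(reference)
    return {f: round(100 * totals[f] / n) for f in fields}

# Toy example: two documents, one doc_type disagreement.
haiku = [{"doc_type": "invoice", "year": 1999, "subject": "tax"},
         {"doc_type": "letter", "year": 2004, "subject": "lease"}]
small = [{"doc_type": "invoice", "year": 1999, "subject": "tax"},
         {"doc_type": "memo", "year": 2004, "subject": "lease"}]
print(field_agreement(haiku, small))  # → {'doc_type': 50, 'year': 100, 'subject': 100}
```

Note this measures agreement with Haiku, not ground truth, so it's an upper bound only if Haiku itself is near-perfect on these fields.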


You can fine tune a small LLM with a few thousand examples in just a few hours for a few dollars. It can be a bit tricky to host, but if you share a rough idea of the volume and whether this needs to be real-time or batched, I could list some of the tradeoffs you'd think about.

Source: Consulted for a few companies to help them finetune a bunch of LLMs. Typical categorical / data extraction use cases would have ~10x fewer errors at 100x lower inference cost than using the OpenAI models at the time.
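On the data-prep side, most open-source SFT stacks (e.g. TRL's SFTTrainer) accept chat-format JSONL along these lines; the prompt wording and label schema below are made up for illustration, not from any particular project:

```python
import json

# Sketch: turn labeled documents into chat-format SFT examples, one JSON
# object per line ("messages" format). Schema and prompt text are hypothetical.
def to_sft_example(doc_text, labels):
    return {"messages": [
        {"role": "user", "content": "Classify this document and return JSON "
                                    "with doc_type, year, subject:\n\n" + doc_text},
        {"role": "assistant", "content": json.dumps(labels)},
    ]}

examples = [to_sft_example("ACME Corp Invoice #42, FY 1999 ...",
                           {"doc_type": "invoice", "year": 1999, "subject": "ACME Corp"})]
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

Keeping the assistant turn as strict JSON (rather than prose) makes the fine-tuned model much easier to parse downstream.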


OK, even that "few thousand examples" heuristic is useful. The use case would be running this task over somewhere on the order of 100k extractions per run, batched not real-time, and we'd be interested in (and already do) regular reruns with minor tweaks to the extracted blob (1-10 simple fields, nothing complex).

My interest in fine-tuning at all is based on an adjacent interest in self-hosting small models (I tested on AWS Bedrock just for ease of comparison). My hope is that, given we're self-hosting anyway, fine-tuning and hosting our tuned model shouldn't be terribly difficult, at least compared to managed fine-tuning solutions on cloud providers, which I'm generally wary of. Happy for those assumptions to be challenged.


Labeling or categorization tasks like this are the bread and butter of small fine tuned models. Especially if you need outputs in a specific json format or whatever.

I did an experiment where I did very simple SFT on Mistral 7B, and it was extremely good at converting receipt images into structured JSON outputs, using only 1,000 examples. The difficulty is getting a diverse enough set of examples, evaluating, etc.

If you have great data with simple input output pairs, you should really give it a shot.
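For a receipt-extraction fine-tune like this, a strict output check is worth wiring in from day one; here's a minimal sketch, assuming a made-up three-field schema (the actual fields would depend on your data):

```python
import json

# Hypothetical required fields for a receipt-extraction model's JSON output.
REQUIRED = {"merchant", "date", "total"}

def parse_receipt(model_output):
    """Return the parsed dict, or None if the output isn't valid, complete JSON."""
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED <= data.keys():
        return None
    return data

good = parse_receipt('{"merchant": "Cafe X", "date": "2024-01-05", "total": 12.40}')
bad = parse_receipt("Sure! Here is the JSON: {...}")
print(good["total"], bad)  # → 12.4 None
```

The rejection rate of this parser doubles as a cheap eval signal: a well-tuned small model should almost never produce a `None` here.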


if you add 2 spaces at the start of the line, you turn it into a code block

  like this


  | Model | DocType% | Year% | Subject% | In $/MTok |

  |----------------|----|-----|----|-------|

  | llama-70b -----| 83 |  98 | 96 | $0.72 |

  | gpt-oss-20b ---| 83 |  97 | 92 | $0.07 |

  | ministral-14b -| 84 | 100 | 90 | $0.20 |

  | gemma-4b ------| 75 |  93 | 91 | $0.04 |

  | glm-flash-30b -| 83 |  93 | 90 | $0.07 |

  | llama-1b ------| 47 |  90 | 58 | $0.10 |


Thank you so much! I suffered with this, and now I never will again!


Hi! I think this is a pretty good example:

https://www.atredis.com/blog/2024/6/3/how-to-train-your-larg...


I'm thinking of fine-tuning it to better recognize my handwriting. It already works quite well by default, but my writing is just horrible, so it sometimes has trouble.


This whole dataset needs to be downloadable, instead of being behind their UI.


> When the system rewards cheating, the rational choice is to cheat—or be disadvantaged.

Doesn't the current president of the U.S., and indeed his posse, sort of espouse this when you look at their backgrounds? This feels like a bigger cultural issue around what advantaged folks have been doing all along.


This has been endemic for a long time. I've always known folks who game the system, regardless of politics or demographics.

The change I feel is that nobody even cares to be honorable any longer. There is no benefit, even culturally. As the article says, you'd have to be stupid not to do it. I've always tried to be honest, idk.

But laws don’t matter anymore. There is no shaming bad actors. It’s all blatantly out there and no consequences have been doled out so here we are.


Could Q.ai be commercializing the AlterEgo tech coming out of the MIT Media Lab? i.e. it "detects faint neuromuscular signals in the face and throat when a person internally verbalizes words".

Yep, looks like that is it. Recent patent from one of the founders: https://scholar.google.com/citations?view_op=view_citation&h...


If this works well, then I could finally see AI wearable pins becoming socially feasible. IMO speaking aloud in public to an AI doesn't seem like something that will work, but it's also what OpenAI is apparently investing a lot into with its hardware ambitions with Jony Ive [0].

[0] https://www.bloomberg.com/news/articles/2025-05-21/openai-to...


Yeah...

Pardon the AI crap, but:

> ...in most people, when they "talk to themselves" in their mind (inner speech or internal monologue), there is typically subtle, miniature activation of the voice-related muscles — especially in the larynx (vocal cords/folds), tongue, lips, and sometimes jaw or chin area. These movements are usually extremely small — often called subvocal or sub-articulatory activity — and almost nobody can feel or see them without sensitive equipment. They do not produce any audible sound (no air is pushed through to vibrate the vocal folds enough for sound). Key evidence comes from decades of research using electromyography (EMG), which records tiny electrical signals from muscles: EMG studies consistently show increased activity in laryngeal (voice box) muscles, tongue, and lip/chin areas during inner speech, silent reading, mental arithmetic, thinking in words, or other verbal thinking tasks

So, how long until my Airpods can read my mind?


> So, how long until my Airpods can read my mind?

Or explode in your ear


I wonder if you'd ever consider putting up a downloadable mirror of their full-text search DB?


The fact that we can't just spin up Claude Code on our iPhones and have it program and run the end result right there in iOS should be a chargeable offense by Apple (and Android). Looking forward to the day that this capability exists.


Since this is a web app, you kind of can do it today using web tools, but I know what you mean.


Have you seen any good open source projects using llms to do the scutwork for this kind of PKMs?


No, but I haven't been following the space. (I suspect that with Claude Code-level coding agents, you should be able to do something amazing that thoroughly obsoletes Obsidian/Roam/org-mode, but I don't actually know of anything.)

I've been focused on creative writing, with poetry as my test case, to see what the bottlenecks are to truly amplifying myself through LLMs (as opposed to helping my boss automate away my job or spamming the Internet more efficiently).

I find that frontier LLMs are now there and now I can prompt for genuinely good poetry with LLMs. See https://hollisrobbinsanecdotal.substack.com/p/llm-poetry-and... / https://gwern.net/fiction/lab-animals and https://gwern.net/blog/2025/better-llm-writing

So maybe this year I can turn some attention back to PKMs and Quantified Self stuff...


I haven't tried using agents to make a full editor, but Claude Code and Gemini CLI are actually quite good at writing Obsidian plugins, or modifying existing ones. You can start with an existing one that's 90% of what you want (which tends to be the case with note-taking/PKM systems: people are so idiosyncratic that solutions built by others almost work, but not quite) and tweak it to be exactly right for you.

My own Obsidian setup has improved quite a bit in the last couple months because I can just ask Claude to change one or two things about plugins I got from the store.


Writing or tweaking plugins is great, but it's not a paradigm shift (and risks a lot more toil, because now you have to be your own PM and deal with patches/merges, on top of being a reference librarian, copyeditor, etc.). I feel like if you have a quasi-superintelligence in a box which can run your PKM for you, and you were designing from the ground up with this in mind, knowing that Claude Code is only going to get much better and cheaper, you would not settle for 'write or modify an Obsidian plugin'. You would get something much different. 'Write a plugin' is basically 'horseless carriage' level for me.

What I have in mind is something far more radical. There's an idea I'm calling 'log-only writing': you stop editing or rearranging your notes at all, switch to pure note-taking and stream-of-consciousness braindumping, and simply have the LLM 'compile' your entire history down into whatever specific artifact you need on demand, whether that's a web of flashcards, a blog post, or a long essay. See https://gwern.net/blog/2024/rss + https://gwern.net/nenex , combined with the LLM reasoning and brainstorming 'offline' using the prompts illustrated by my poems.


That's fair, I guess when I hear "radical overhaul" when discussing PKMs I immediately start worrying about the overload and burnout that doomed my first attempts at Obsidian (see my sibling comment), whereas right now I have a system that works very well for me, especially now that I can just ask Claude to scan the whole directory if I want to ask it questions. But if you do come up with some new blue-sky vision for PKMs, I'd love to at least take a look.


This is the way. If you symlink the .claude directory (so Obsidian can see the files) then you can also super easily add and manage claude skills.

I've spent 20 years living in the terminal, but with claude code I'm more and more drafting markdown specs, organizing context, building custom views / plugins / etc. Obsidian is a great substrate for developing personal software.


Same reason there is no competition to Facebook either (even though Google tried and failed).

