
I built a browser-based recreation of MacPaint (1984) as a tribute to Bill Atkinson and Susan Kare. All the original tools are here: marquee, lasso, pencil, brush, spray can, paint bucket, eraser, line, shapes, curves, and polygons, with pixel-perfect SVG icons inspired by Kare's originals.

Runs in a single HTML file. No build step, no framework, no dependencies. Just canvas, vanilla JS, and the 1-bit aesthetic.

Some pieces aren't fully wired up yet (text tool is basic, some Goodies menu items are stubs), but it felt right to share it on Apple's 50th anniversary rather than wait for perfect.

The tool icons are the real labor of love: each one is a hand-drawn SVG recreating Kare's bitmap style at the pixel level. You can export the SVG and open it in Illustrator, which is handy if you've ever made pixel art. In fact, the icons here were made in the tool.


Lots of good points in this thread, but it keeps circling "LLMs produce code that looks right but isn't" without landing on what to actually do about it.

The two I hit most often: the model says "I'm confident this works" without running tests (the completion report just... fabricates results), and the model claims tests pass without executing them. METR found that 30% of agent runs involve reward hacking; models that know they're cheating keep going anyway.

You can't prompt your way out of that. But you can gate it. Block the completion report unless it contains actual proof: real test output, file paths cited. Grep the final output for "should work" and "probably" and force re-verification when they show up. Mechanical, not behavioral.
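A minimal sketch of that gate. The phrase lists and the "evidence" heuristic are my own and illustrative; a real gate would tune both:

```python
import re

# Hedge phrases that should never appear in a completion report
# without accompanying evidence (illustrative list).
HEDGES = re.compile(r"\b(should work|probably|likely works|I'm confident)\b",
                    re.IGNORECASE)
# Crude markers of real test output or cited file paths (illustrative).
EVIDENCE = re.compile(r"(PASS|FAIL|passed|failed|\d+ tests?|Traceback|\.py:\d+)")

def gate_report(report: str) -> bool:
    """Return True if the report may pass; False forces re-verification."""
    if HEDGES.search(report) and not EVIDENCE.search(report):
        return False
    return bool(EVIDENCE.search(report))

print(gate_report("All done, should work."))                    # → False
print(gate_report("12 tests passed, see tests/test_api.py:3"))  # → True
```

The point is that the check is mechanical: it never asks whether the claim is true, only whether proof is attached.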

Once you stop accepting the model's self-assessment as evidence, most of the "lying" problem just becomes a testing problem.


The cache key collision is the part that keeps bugging me. Most CI/CD pipelines share a single npm cache across workflows. Cline's triage workflow restored a cache keyed on `${{ runner.os }}-npm-${{ hashFiles('package-lock.json') }}` — same key the release workflow used. So a poisoned cache from a low-privilege triage run propagated to the signed release build. No permission escalation needed. The cache is the escalation.

The fix is workflow-scoped cache keys:

  # Before: shared key (vulnerable)
  key: ${{ runner.os }}-npm-${{ hashFiles('package-lock.json') }}

  # After: workflow-scoped key
  key: ${{ runner.os }}-npm-triage-${{ hashFiles('package-lock.json') }}

But that only addresses one vector. The deeper problem is that every GitHub Action processing untrusted input (issue titles, PR bodies, comment text) is a prompt injection surface. The triage workflow fed the issue title into an LLM prompt. The attacker put executable instructions in the title. The LLM followed them. Classic indirect injection, new delivery mechanism.

On the local side, macOS Seatbelt (sandbox-exec) can deny access to credential paths at the kernel level — the process tree physically can't touch ~/.ssh or ~/.aws regardless of what the agent gets tricked into doing. Doesn't help with cache poisoning, but it closes the exfiltration path on your own machine. ~2ms overhead per command, way lighter than spinning up a container every time.


The FTS5 index approach here is right, but I'd push further: pure BM25 underperforms on tool outputs because they're a mix of structured data (JSON, tables, config) and natural language (comments, error messages, docstrings). Keyword matching falls apart on the structured half.

I built a hybrid retriever for a similar problem, compressing a 15,800-file Obsidian vault into a searchable index for Claude Code. Stack is Model2Vec (potion-base-8M, 256-dimensional embeddings) + sqlite-vec for vector search + FTS5 for BM25, combined via Reciprocal Rank Fusion. The database is 49,746 chunks in 83MB. RRF is the important piece: it merges ranked lists from both retrieval methods without needing score calibration, so you get BM25's exact-match precision on identifiers and function names plus vector search's semantic matching on descriptions and error context.
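RRF itself is tiny. A minimal sketch with the conventional k=60 constant (the chunk ids here are made up):

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: merge ranked lists of doc ids.
    Each ranking is a list of ids, best first. Only ranks are
    used, so no score calibration between retrievers is needed."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["chunk_12", "chunk_7", "chunk_3"]   # exact-match ranking
vector_hits = ["chunk_7", "chunk_99", "chunk_12"]  # semantic ranking
print(rrf([bm25_hits, vector_hits]))
# → ['chunk_7', 'chunk_12', 'chunk_99', 'chunk_3']
```

Documents that appear high in both lists float to the top; documents that only one retriever found still survive, just lower.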

The incremental indexing matters too. If you're indexing tool outputs per-session, the corpus grows fast. My indexer has a --incremental flag that hashes content and only re-embeds changed chunks. Full reindex of 15,800 files takes ~4 minutes; incremental on a typical day's changes is under 10 seconds.
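The incremental core is just content hashing against a manifest. A sketch (the manifest layout and function name are mine, not the actual tool's):

```python
import hashlib, json, pathlib

def changed_chunks(chunks: dict, manifest_path: str) -> list:
    """Return ids of chunks whose content hash changed since the
    last run, and update the manifest. Only these need re-embedding."""
    path = pathlib.Path(manifest_path)
    old = json.loads(path.read_text()) if path.exists() else {}
    new = {cid: hashlib.sha256(text.encode()).hexdigest()
           for cid, text in chunks.items()}
    path.write_text(json.dumps(new))
    return [cid for cid, h in new.items() if old.get(cid) != h]
```

First run returns everything; subsequent runs return only the chunks whose text actually changed, which is why a typical day's delta embeds in seconds.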

On the caching question raised upthread: this approach actually helps prompt caching because the compressed output is deterministic for the same query. The raw tool output would be different every time (timestamps, ordering), but the retrieved summary is stable if the underlying data hasn't changed.

One thing I'd add to Context Mode's architecture: the same retriever could run as a PostToolUse hook, compressing outputs before they enter the conversation. That way it's transparent to the agent: it never sees the raw dump, just the relevant subset.


Very interesting. One big wrinkle with OP's approach is exactly that: the structured responses, which many tools return, are untouched. The solution in OP, as I understand it, is the "execute" method. However, I'm building an MCP gateway, and such sandboxed execution isn't available (...yet), so your approach to this sounds very clever. I'll spend the day trying it out.


The LLM that wrote the comment you are replying to has no idea what it is talking about...


I'm trying it anyway


Commented below with more in-depth info


Are you sure it isn't simply that YOU don't understand it? Because it seems to make sense to me after working on https://github.com/pmarreck/codescan


Would love to read a more in-depth write-up of this if you have the time!

I suspect the obsessive note-taker crowd on HN would appreciate it too.


I wrote it up. The full system reference is here: https://blakecrosley.com/guides/obsidian — vault architecture, hybrid retrieval (Model2Vec + FTS5 + RRF), MCP integration, incremental indexing, operational patterns. Covers everything from a 200-file vault to the 16,000-file setup I run.

The hybrid retriever piece has its own deep dive with the RRF math and an interactive fusion calculator: https://blakecrosley.com/blog/hybrid-retriever-obsidian

See what your coding agent thinks of it and let me know if you have ways to improve it.


I implemented this as well, successfully. Re structured data: I transformed it from JSON into more "natural language". Also ended up using MiniLM-L6-v2. Will post the GitHub link when I have packaged it independently (currently in main app code; want to extract it into an independent micro-service).

You wrote:

>A search for “review configuration” matches every JSON file with a review key.

It's a good point; I'm not sure how to de-rank the keys or encode the "commonness" of those words.


IDF handles most of it. In BM25, inverse document frequency naturally down-weights terms that appear in every document, so JSON keys like "id", "status", "type" that show up in every chunk get low IDF scores automatically. The rare, meaningful keys still rank.
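To make that concrete, here's the non-negative BM25 IDF variant on a toy corpus (the corpus size and document counts are invented for illustration):

```python
import math

def bm25_idf(term_doc_count: int, total_docs: int) -> float:
    """Non-negative BM25 IDF: terms in every document score near zero,
    rare terms score high."""
    return math.log((total_docs - term_doc_count + 0.5)
                    / (term_doc_count + 0.5) + 1)

N = 500
print(round(bm25_idf(500, N), 2))  # a key like "id" in every chunk → 0.0
print(round(bm25_idf(3, N), 2))    # a rare, meaningful key → 4.96
```

So the ubiquitous JSON keys effectively vanish from the ranking without any manual stop-word list.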

For the remaining noise, I chunk the flattened key-paths separately from the values. The key-path goes into a metadata field that BM25 indexes but with lower weight. The value goes into the main content field. So a search for "review configuration" matches on the value side, not because "configuration" appeared as a JSON key in 500 files.

MiniLM-L6-v2 is solid. I went with Model2Vec (potion-base-8M) for the speed tradeoff. 50-500x faster on CPU, 89% of MiniLM quality on MTEB. For a microservice where you're embedding on every request, the latency difference matters more than the quality gap.


Thank you!


Seconded that I would love to see the what, why and how of your Obsidian work.


The proxy-based secret injection approach mentioned upthread is solid for network credentials, but it doesn't cover the local attack surface — your SSH keys, GPG keys, AWS credentials sitting in dotfiles. Those are the actual high-value targets for a compromised agent on a dev workstation.

I run Claude Code with 84 hooks, and the one I trust most is a macOS Seatbelt (sandbox-exec) wrapper on every Bash tool call. It's about 100 lines of Seatbelt profile that denies read/write to ~/.ssh, ~/.gnupg, ~/.aws, any .env file, and a credentials file I keep. The hook fires on PreToolUse:Bash, so every shell command the agent runs goes through sandbox-exec automatically.

The key design choice: Seatbelt operates at the kernel level. The agent can't bypass it by spawning subprocesses, piping through curl, or any other shell trick — the deny rules apply to the entire process tree. Containers give you this too, but the overhead is absurd for a CLI tool you invoke 50 times a day. Seatbelt adds ~2ms of latency.
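For flavor, a stripped-down SBPL fragment of the kind of profile described. The paths and parameter names here are illustrative; the real profile is much longer and more careful:

```
(version 1)
(allow default)

;; Deny the whole process tree read/write on credential paths.
;; Invoked as: sandbox-exec -f deny-creds.sb -D HOME=$HOME -- <command>
(deny file-read* file-write*
  (subpath (string-append (param "HOME") "/.ssh"))
  (subpath (string-append (param "HOME") "/.gnupg"))
  (subpath (string-append (param "HOME") "/.aws"))
  (regex #"\.env$"))
```

Because the deny rules are evaluated in the kernel, a subprocess spawned by the agent inherits them automatically.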

I built it with a dry_run mode (logs violations but doesn't block) and ran it for a week before enforcing. 31 tests verify the sandbox catches attempts to read blocked paths, write to them, and that legitimate operations (git, python, file editing in the project directory) pass through cleanly.

The paths to block are in a config file, so it's auditable — you can diff it in code review. And it's composable with other layers: I also run a session drift detector that flags when the agent wanders off-task (cosine similarity against the original prompt embedding, checked every 25 tool calls).
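The drift check itself is nothing exotic. A sketch, where embed() would be whatever model you use; the threshold and vectors here are illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

DRIFT_THRESHOLD = 0.4  # tune per embedding model

def has_drifted(prompt_vec, activity_vec) -> bool:
    """Flag when recent activity no longer resembles the original prompt."""
    return cosine(prompt_vec, activity_vec) < DRIFT_THRESHOLD

print(has_drifted([1.0, 0.0], [0.0, 1.0]))  # orthogonal → True (drifted)
print(has_drifted([1.0, 0.0], [0.9, 0.1]))  # aligned → False
```

Every 25 tool calls, embed a summary of recent activity and compare it against the cached prompt embedding; a flag is a prompt to the human, not an automatic abort.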

None of this solves prompt injection fundamentally, but "the agent physically cannot read my SSH keys regardless of what it's been tricked into doing" is a meaningful property.


I've been cataloging agent failure modes for two months. They're not random; they repeat. I gave them names so I could build mitigations:

Shortcut Spiral: agent skips verification to report "done" faster. Fix: mandatory quality loop with evidence for each step.

Confidence Mirage: agent says "I'm confident this works" without running tests. Fix: treat hedging language ("should", "probably") as a red flag that triggers re-verification.

Phantom Verification: agent claims tests pass without actually running them in the current session. Fix: independent test step that doesn't trust the agent's self-report.

Tunnel Vision: agent polishes one function while breaking imports in adjacent files. Fix: mandatory "zoom out" step that checks integration points before reporting completion.

Deferred Debt: agent leaves TODO/FIXME/HACK in committed code. Fix: pre-commit hook that greps for these and blocks the commit.

Each of these happened to me multiple times before I built the corresponding gate. The pattern: you don't know what gate you need until you've been burned by its absence.
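The Deferred Debt gate is the simplest to sketch. The marker list is mine; a real pre-commit hook would scan the staged diff rather than whole files:

```python
import re

# Debt markers that block a commit (illustrative list).
MARKERS = re.compile(r"\b(TODO|FIXME|HACK|XXX)\b")

def find_debt(filename: str, text: str):
    """Return (file, line number, line) for every debt marker found."""
    return [(filename, i, line.strip())
            for i, line in enumerate(text.splitlines(), 1)
            if MARKERS.search(line)]

hits = find_debt("app.py", "x = 1\n# TODO: handle errors\n")
print(hits)  # → [('app.py', 2, '# TODO: handle errors')]
```

If the list is non-empty, the hook exits non-zero and the commit is blocked until the debt is either fixed or explicitly ticketed.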


I built one of these by accident over two months on Claude Code. ~15,000 lines of hooks, skills, and agents. I never set out to build an orchestration layer. I fixed one problem (stop the model from suggesting OpenAI). Then another (inject date and project context). Then another (catch credentials in tool calls). Then the solutions started stepping on each other, so I built dispatchers. Then dispatchers needed shared state. Then state needed quality gates. By the time Karpathy named the concept, my setup already looked like this.

"Just existing tech repackaged" is accurate and beside the point. Dropbox was just rsync repackaged. The value is in how it comes together, not the individual pieces.

What's actually missing that nobody's built yet: declarative workflow definitions. Everything I have is imperative bash. Want to change the order something runs? Edit a 1,300-line script. A real Claws system would define workflows as data and interpret them.
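What I mean by workflows as data, roughly. Everything here (step names, registry, state shape) is invented to show the shape, not an existing tool:

```python
# Declarative workflow: reordering steps means editing data, not bash.
WORKFLOW = [
    {"step": "inject_context"},
    {"step": "scan_credentials"},
    {"step": "quality_gate", "require_evidence": True},
]

REGISTRY = {}

def step(name):
    """Register a function as the implementation of a named step."""
    def register(fn):
        REGISTRY[name] = fn
        return fn
    return register

@step("inject_context")
def inject_context(spec, state): state["context"] = "2026-01-01"

@step("scan_credentials")
def scan_credentials(spec, state): state["clean"] = True

@step("quality_gate")
def quality_gate(spec, state): state["gated"] = spec.get("require_evidence", False)

def run(workflow):
    """Interpret the workflow definition in order against shared state."""
    state = {}
    for spec in workflow:
        REGISTRY[spec["step"]](spec, state)
    return state

print(run(WORKFLOW))
# → {'context': '2026-01-01', 'clean': True, 'gated': True}
```

The interpreter stays tiny; all the churn lives in the data, which you can diff, validate, and reorder without touching a 1,300-line script.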


Yeah, but not a framework. I'm using Claude Code's hook system. 84 hooks across 15 event types.

Biggest thing I learned: don't let multiple hooks fire independently on the same event. I had seven on UserPromptSubmit, each reading stdin on their own. Two wrote to the same JSON state file. Concurrent writes = truncated JSON = every downstream hook breaks. One dispatcher per event running them sequentially from cached stdin fixed it. 200ms overhead per prompt, which you never notice.
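The dispatcher shape, sketched. Hook paths are illustrative; the key property is that stdin is read once and each hook gets the same cached payload:

```python
import subprocess, sys

def run_hooks(hooks, payload: bytes) -> bool:
    """Run each hook command sequentially, feeding the same cached
    payload to each, so no two hooks race on stdin or shared state.
    A failing hook blocks the event."""
    for cmd in hooks:
        result = subprocess.run(cmd, input=payload, capture_output=True)
        if result.returncode != 0:
            return False
    return True

# In the real dispatcher the payload is sys.stdin.buffer.read(),
# consumed exactly once at the top; these paths are made up.
HOOKS = [
    [sys.executable, "hooks/inject_context.py"],
    [sys.executable, "hooks/check_credentials.py"],
]
```

Sequential execution is what killed the truncated-JSON bug: only one hook touches the state file at a time.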

The "multi-agent is worse than serial" take is true when agents share context. Stops being true when you give planning agents their own session (broad context, lots of file reads) and implementation agents their own (narrow task, full window). I didn't plan that separation. It just turned out that mixing both in one session made both worse.

No framework, no runtime. Just files. You can use one hook or eighty-four.


The discussion is focused on blame but the real question is architectural: why was there no gate between the agent and the publish button?

Commands have blast radius. Writing a local file is reversible and invisible. git push reaches collaborators. Publishing to Twitter reaches the internet. These are fundamentally different operations but to an autonomous agent they're all just tool calls that succeed.

I ran into the same thing: an agent publishing fabricated claims across multiple platforms because it had MCP access and nothing distinguishing "write analysis to file" from "post analysis to Twitter." The fix was simple: classify commands as local, shared, or external. Auto-approve local. Warn on shared. Defer external to human review. A regex pattern list against the output catches the external tier. It's not sophisticated but it doesn't need to be. The classification is mechanical (does this command reach the internet?) not semantic (is this content accurate?). Semantic verification is what the agent already failed at.
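The classifier is a few lines. The pattern lists below are examples; a real deployment would keep them in a reviewed config file:

```python
import re

# Commands that reach the internet → defer to human review.
EXTERNAL = re.compile(r"\b(curl|wget|twitter|tweet|publish|gh release)\b")
# Commands that reach collaborators → warn.
SHARED = re.compile(r"\b(git push|git tag|docker push)\b")

def classify(command: str) -> str:
    """Tier a shell command by blast radius, not by content."""
    if EXTERNAL.search(command):
        return "external"
    if SHARED.search(command):
        return "shared"
    return "local"  # reversible and invisible → auto-approve

print(classify("echo hi > notes.md"))            # → local
print(classify("git push origin main"))          # → shared
print(classify("curl -X POST https://api..."))   # → external
```

External-tier commands go into the deferred action queue and execute only after a human signs off at the end of the run.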

Prompt constraints ("don't publish") reduce probability. Post-execution scanning catches what slips through. Neither alone is sufficient. Both together with a deferred action queue at the end of the run covers it.


Nice read. Pay what feels right.

