Hacker News | new | past | comments | ask | show | jobs | submit | techcam's comments

Happy to explain how the scoring works since that’s the obvious first question.

The core idea is:

Safety Score = 100 − riskScore

The risk score is based on structural prompt properties that tend to correlate with failures in production systems:

- instruction hierarchy ambiguity
- conflicting directives (system vs. user)
- missing output constraints
- unconstrained response scope
- token cost / context pressure

Each factor contributes a weighted amount to the total risk score.

It’s not trying to predict exact model behavior — that’s not possible statically.

The goal is closer to a linter: flagging prompt structures that are more likely to break (injection, hallucination drift, ignored constraints, etc.).

There’s also a lightweight pattern registry. If a prompt matches structural patterns seen in real jailbreak/injection cases (e.g. authority ambiguity), the score increases.

One thing that surprised me while building it: instruction hierarchy ambiguity caused more real-world failures than obvious injection patterns.

The CLI runs locally — no prompts are sent anywhere.

If you want to try it:

  npm install -g @camj78/costguardai
  costguardai analyze your-prompt.txt

Curious what failure modes others here have seen in production prompts.


The tricky part is that prompts can look “correct” but still behave unpredictably depending on phrasing.


We ran into something similar with API costs — small changes in behavior can have surprisingly large downstream effects.


This resonates — most of the hard problems show up after you ship, not before.


Feels like we have great tooling for code, but prompts are still mostly trial-and-error. Curious how people are validating them today.


I’ve been noticing the same — a lot of failures aren’t obvious “jailbreaks,” they’re just subtle prompt structure issues that only show up in production.

