Hacker News | folex's comments

> The executables in our benchmark often have hundreds or thousands of functions — while the backdoors are tiny, often just a dozen lines buried deep within. Finding them requires strategic thinking: identifying critical paths like network parsers or user input handlers and ignoring the noise.

Perhaps it would make sense to provide LLMs with some strategy guides written in .md files.


Depends what your research question is, but it's very easy to spoil your experiment.

Let's say you tell it that there might be small backdoors. You've now primed the LLM to search that way (even using "may"). You passed information about the test to the test taker!

So we have a new variable! Is the success only due to the hint? How robust is that prompt? Does subtle wording dramatically change output? Does "may", "does", "can", "might" work but "May", "cann", or anything else fail? Have you, the prompter, unintentionally conveyed something important about the test?

I'm sure you can prompt engineer your way to greater success, but by doing so you also greatly expand the complexity of the experiment and consequently make your results far less robust.

Experimental design is incredibly difficult due to all the subtleties. It's a thing most people frequently fail at (including scientists), and even more frequently they fool themselves into believing stronger claims than the experiment can support.

And before anyone says "but humans", yeah, same complexity applies. It's actually why human experimentation is harder than a lot of other things. There's just far more noise in the system.

But could you get success? Certainly. I mean you could tell it exactly where the backdoors are. But that's not useful. So now you've got to decide where that line is, and certainly others won't agree.


That's what I thought of too. Given their task formulation (they basically said - "check these binaries with these tools at your disposal" - and that's it!) their results are already super impressive. With proper guidance and professional oversight it's a tremendous force multiplier.

We are in this super weird space where the comparable tasks are one-shot, e.g. "make me a to-do app" or "check these binaries", but any real work is multi-turn and dynamically structured.

But when we're trying to share results, "a talented engineer sat with the thread and wrote tests/docs/harnesses to guide the model" is less impressive than "we asked it and it figured it out," even though the latter is how real work will happen.

It creates this perverse scenario (which is no one's fault!) where we talk about one-shot performance but one-shot performance is useful in exactly 0 interesting cases.


Something I found useful is to let it "just figure it out" for the first part (usually discovery, or library testing, new cli testing, repo understanding, etc.) and then distill it into "learnings" that I can place in agents.md or relevant skills. So you get the speed of "just prompt it" and the repeatability of it having already worked in this area. You also get more insight into what tasks work today, and at what effort level.

Sometimes it feels not dissimilar to spending 4 hours to automate a 10 minute task that I thought I'd need forever but ended up using just once in the past 5 months. But sometimes I unlock something that saves a huge amount of time, and can be reused in many steps of other projects.


That’s hard. Sometimes you will do that and find it prompts the model into “strategy talk” where it deploys the words and frame you use in your .md files but doesn’t actually do the strategy.

Even where it works, it is quite hard to specify human strategic thinking in a way that an AI will follow.


this is exactly how I work with cursor

except that I put my notes on the plan document in a single message like:

   > plan quote
   my note
   > plan quote
   my note
otherwise, I'm not sure how to guarantee that ai won't confuse my notes with its own plan.

one new thing for me is to review the todo list; I was always relying on the auto-generated todo list


> static types often reduce to a bunch of optionals, forcing you to null check every field

On one end, you write / generate / assume a deserialiser that checks whether incoming data satisfies all required invariants, e.g. all fields are present. On the other end, you specify a type that has all the required fields in the required format.

If deserialisation fails to satisfy the type requirements, it produces an error which you can handle by e.g. falling back to a different type, rejecting the operation or re-requesting the data.

If deserialisation doesn't fail – hooray, now you don't have to worry about uncertainty.

The important thing here is that uncertainty is contained in a very specific place. It's an uncertainty barrier, if you wish: before it there's raw data, after it there's either an error or valid data.

If you don't have a strict barrier like that – every place in the program has to deal with uncertainty.

So it's not necessarily about dynamic / static. It's about being able to set barriers that narrow down uncertainty and grow the number of assumptions you can rely on. The good thing about an ergonomic type system is that it allows you to offload these assumptions from your mind by encoding them in the types and letting the compiler worry about them.

It's basically automation of assumption bookkeeping.
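
A minimal sketch of such a barrier in Python, assuming a hypothetical record with `name` and `age` fields. The names `User`, `ParseError`, `parse_user` and `greeting` are made up for illustration and do not come from any library:

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class User:
    # After the barrier, both fields are guaranteed present and well-typed.
    name: str
    age: int


@dataclass(frozen=True)
class ParseError:
    reason: str


def parse_user(raw: Any) -> Union[User, ParseError]:
    """The barrier: raw data goes in, a valid User or a ParseError comes out."""
    if not isinstance(raw, dict):
        return ParseError("expected an object")
    name = raw.get("name")
    age = raw.get("age")
    if not isinstance(name, str) or not name:
        return ParseError("missing or invalid 'name'")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(age, int) or isinstance(age, bool) or age < 0:
        return ParseError("missing or invalid 'age'")
    return User(name=name, age=age)


def greeting(user: User) -> str:
    # No null checks here: the User type already carries the assumptions.
    return f"{user.name} is {user.age}"
```

Everything downstream of `parse_user` takes a `User` and never re-checks the fields; the caller decides once, at the barrier, whether to fall back, reject or re-request on `ParseError`. With a type checker such as mypy, passing unparsed data to `greeting` would be flagged before the program runs.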


> ends up being the same checks you would be doing with a dynamic language

Sure thing. Unless dev forgets to do (some of) these checks, or some code downstream changes and upstream checks become gibberish or insufficient.


I know everyone says that this is a huge issue, and I am sure you can point to an example, but I haven’t found that types prevented a lot of issues like this any better than something like Erlang’s assertion-based system.


When you say "any better than" are you referring to the runtime vs comptime difference?


Where does the stereotype 'thesaurus = synonyms + antonyms' come from?

I'm not a native English speaker, and I've never heard that idea except in, I'd guess, the TV show Friends.

I've used thesauruses since my childhood for exactly the task of looking up meanings, explanations, perhaps some etymology baked in.

For English, I always use WordNet, it is quite good and works offline on Android.

For my basic level of Chinese, Outliers dictionaries are so far the best I have found, but that's mainly due to my heavy reliance on the etymology provided there.

Well, I guess I got carried away a bit. Back to my question, where does thesaurus=synonyms+antonyms come from?


The usage of "thesaurus" in English for a kind of book dates back to the first one by Peter Mark Roget in 1852, which was indeed synonyms and antonyms: https://en.wikipedia.org/wiki/Roget%27s_Thesaurus see the Project Gutenberg link mentioned in another comment: https://www.gutenberg.org/cache/epub/10681/pg10681-images.ht... (or indeed, just read the posted article here).

This is still the primary meaning of "thesaurus" in English, and contrasted with "dictionary": https://en.wikipedia.org/wiki/Thesaurus

It's very unusual for a thesaurus to contain meanings (beyond the category head/name) and etymology, let alone explanation. Such things are usually found in a dictionary instead.

So it's more a question for you: where did your unusual idea of "thesaurus" come from? As one of your examples you mention dictionaries, so that's especially confusing.


The word comes from Greek, θησαυρός - "a store, treasure, storehouse, treasury".

> The usage of "thesaurus" in English for a kind of book dates back to the first one by Peter Mark Roget in 1852

To nitpick, though of interest: The usage meaning a book of words organized by their senses indeed dates to Roget's use in 1852, as the parent comment says. An earlier usage is more generally a 'treasury' of knowledge (in book form): there was a Thesaurus Linguæ Romanæ et Britannicæ ... in 1565, a Thesaurus Linguæ Latinæ compendiarius ... in 1736, and John Stuart Mill in 1840 wrote about "A thesaurus of commonplaces for the discussion of questions."

Source: Oxford English Dictionary


I did in fact look at Etymonline for "thesaurus" before I made my comment and therefore chose my words carefully, but not clearly enough. Yes there were books before Roget's that contained the word “thesaurus” (treasury) in their title, including at least one dictionary. But none of these earlier books caused the word “thesaurus” to start to mean a particular kind of book, which it does in English following Roget — and that kind of book is precisely one that contains synonyms and antonyms, and usually does not contain meanings or etymology. So the comment by folex remains weird — for one thing, it uses “stereotype” where “meaning” would be more appropriate.


> Yes there were books before Roget's that contained the word “thesaurus” (treasury) in their title, including at least one dictionary. But none of these earlier books caused the word “thesaurus” to start to mean a particular kind of book ...

FWIW, OED has a separate sense for, "A ‘treasury’ or ‘storehouse’ of knowledge, as a dictionary, encyclopædia, or the like." Is that a particular kind of book or a general concept? I don't know.

Its usage extends past Roget - e.g., (1862) "In a complete thesaurus of any language, the etymology of every word should exhibit both its philology and its linguistics." and (1906) "This work is one of five thesauri published under the auspices of Kang Hsi, the second Emperor of the present dynasty."

And to be complete, a newer usage dates to 1957, "A classified list of terms, esp. key-words, in a particular field, for use in indexing and information retrieval.".

> I did in fact look at Etymonline

Etymonline is the admirable work of one person. If you can get access to the OED (and if you love etymology, etc., it's essential), you'll generally find much more, and much more reliable, work done by teams of professionals over ~150 years.

Etymonline's About page is incredible - I'm going to submit it:

https://www.etymonline.com/columns/post/bio

> the comment by folex remains weird — for one thing, it uses “stereotype” where “meaning” would be more appropriate.

Not weird at all - folex says they don't speak English natively. Stereotype makes sense in a way, but is not the word a native speaker would choose. I'm the same in some languages.

folex's question is maybe the most interesting part of the discussion.


I'd assume from the earlier meaning of "thesaurus" which comes from "treasury," or as it exists in my mind, treasure chest.

> The meaning "encyclopedia filled with information" is from 1840, but it existed earlier as thesaurarie (1590s), used as a title by some early dictionary compilers, on the notion of thesaurus verborum "a treasury of words." The meaning "collection of words arranged according to sense" is attested from 1852 in Roget's title.

from: https://www.etymonline.com/word/thesaurus


I'm not sure stereotype is the correct word here. But even setting that aside, a thesaurus being a reference work containing words grouped by similarity is the CONVENTIONAL definition.

Every one of my friends and family had one growing up. It wasn't completely uncommon as a young child to glaze your eyes "beautiful mind style" to suss out repetitious or excessive duplicate word usage in your hastily prepared 5-paragraph MLA format essay and then run it through the nearest Merriam-Webster thesaurus.

https://youtu.be/XAD13c3UkS0?t=49


From the writer's need to find a more suitable word than the ones he knows.



I like that the URL is basically the answer to the question.


I wouldn't describe it as a comedy either.

However, I'd recommend the Murderbot series: it is full of humour and shares the atmosphere of Bobiverse and its personal approach to characters. Highly recommend.


I am not a native speaker of English so maybe that's why. Typically anything that has lots of humor and makes the reader laugh or be amused frequently -- I think of as comedy. Is there a more nuanced distinction to what is usually called a comedy in literature / movies?

I tried to think of some other examples. Trevor Noah's biography 'Born A Crime' came to mind. I would not explicitly describe it as a comedy myself - because a 'biography' is descriptive enough as well as non-fiction by definition so any humor is mostly not made up. If it were not a biography though -- it would probably go into the comedy bucket in my mind. Maybe I am just misapplying terms here.


Yes, Bobiverse and Murderbot are very close in spirit, and if you like one you are very likely to enjoy the other. Also both have great audio narration.




I can't really wait for tooling to visualise complex technical concepts into nice animations and static images.

Imagine describing how some system works and what it consists of, and getting architecture images + process animations.


This is what I am missing as well. I'm a pre-sales technical architect that keeps looking for use-cases where I am able to leverage this technology. 75% of my work is in draw.io and PowerPoint.

