Hacker News | new | past | comments | ask | show | jobs | submit | unignorant's comments

I had the same reaction, but the article is not AI-generated according to pangram, which I've generally found reliable. I wonder if LLM turns of phrase and even thought patterns are creeping into normal human thought.

Or, stay with me here, the LLMs were trained on how we, statistically, write.

There are typical LLM voices and styles, just like human writers have differentiated voices and styles. And some common elements of the typical LLM style are distinct from humans I've previously read.

I recognize this. It's also the case that I suspect I've read more complaints about how annoying suspected LLM output is than I have read actual LLM output. The slop is, to me, an incredibly unwelcome contribution from humans who don't enjoy the craft, but complaining about it is equally stuck in the froth, further exacerbating it rather than distilling down to the substance. That is, it keeps the focus on the surface rather than on what the core content is and whether it has value.

LLM writing doesn’t have substance; it’s statistically likely text generated from some bullet points, without intention or style.

When you say "we" you're talking about Twitter, right?

I used that once, during a conference about 6 years ago, and never again since. My use of "we" references humanity.

In that case, I feel that "we" needs some correction. Because these slopcannons get their ammunition from scraping the gargantuan septic tanks of the Internet, like Twitter and Facebook and whatever 600M Chinese people are using that I've never heard of.

Comparatively very little of that "humanity" corpus is coming from Shakespeare or Swift or Douglas Adams, much as we might prefer if it did.


I never claimed it was trained on the notables of humanity. On the other hand, to your adjacent point (to twist it) that humanity needs some correction, I wholeheartedly agree.

Anytime I see “this is not just x, it’s y” I can say with a high degree of confidence that slop was used.

As someone from outside the Anglophone cultural sphere, when I first learned to write in English, the kind of writing that AI now often produces was taught to me as “formal” writing.

But these days, when I write in that formal style, people sometimes say it sounds like AI. That has been a difficult and frustrating point for me.

I still find the subtle difference hard to understand.


I was raised and educated well inside the Anglosphere (USA) and was also taught to write formally in that way.

Do the people who say you sound like AI give you any specifics?

Also, if you don't mind, what was your English education like? I understand that quite a few Americans work in South Korea as teachers but I have no details about how that manifests.


That used to happen to me more often. When I first came to HN, and even now if I am not careful, my comments can get flagged. Also, when I translate from Korean using DeepL and paste the result, people often say it sounds awkward or unnatural.

I studied English more seriously in graduate school, although I dropped out. In Korea, there are quite a few Americans who teach English. Public schools often have native English-speaking instructors, but in my case I learned English more seriously in graduate school, and universities also make students study English almost semi-compulsorily.

In Seoul, there are probably many teachers who mainly teach middle and high school students, but a lot of that is through private education rather than the public school system.


I'm still pissed that I had to practice removing that from my writing habits. I liked that device, dammit!

It's not just AI-generated, it's also slop!

It's worth mentioning that pangram is more confident in its positive detections than its negative ones, as stated by the founder in an interview on the most recent ThursdAI episode.

I think it's bidirectional. We change our writing based on what we see (AI-generated content on the internet), and AI will learn based on what we write.

This isn't my project, but I shared it here because it has a few important ideas I've been thinking about in my own work. Effect type systems in particular are a really good fit for LLMs because they allow you to reason very precisely about a program's capabilities before runtime (basically, using the type system for capability proofs). This helps you trust agent-created code (for example, you know it can't do IO), or, if the code does require certain capabilities, run it in a sandbox (e.g., mock network or filesystem). This kind of language design also provides a safer foundation for complex meta-systems of agents-that-create-agents, depending on how the runtime is implemented, though Vera may be somewhat limited in that particular respect.
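To make the sandboxing idea concrete, here's a toy dynamic approximation in Python. An effect type system like the one described would give you this guarantee statically, before running anything; this sketch only enforces it at runtime, and the `run_sandboxed` helper and `CapabilityError` names are my own illustrative inventions, not part of Vera:

```python
import builtins
import socket

class CapabilityError(Exception):
    """Raised when sandboxed code attempts a capability it wasn't granted."""

def run_sandboxed(code, allow_fs=False, allow_net=False):
    """Execute `code` with filesystem/network access stubbed out unless granted.

    A crude runtime analogue of an effect system's capability proof: code that
    only computes runs fine; code that reaches for IO fails fast.
    """
    real_open, real_socket = builtins.open, socket.socket

    def deny(name):
        def _denied(*args, **kwargs):
            raise CapabilityError(f"{name} capability not granted")
        return _denied

    if not allow_fs:
        builtins.open = deny("filesystem")
    if not allow_net:
        socket.socket = deny("network")
    try:
        namespace = {}
        exec(code, namespace)
        return namespace
    finally:
        # Always restore the real IO entry points.
        builtins.open, socket.socket = real_open, real_socket

# Pure agent-generated code runs; IO without the capability is blocked.
ns = run_sandboxed("x = sum(range(10))")
try:
    run_sandboxed("f = open('/etc/passwd')")
except CapabilityError as e:
    print("blocked:", e)
```

The static version is strictly better (you learn the capability set without executing anything), which is exactly why the effect-typing approach is appealing for agent-written code.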

The major design decision I'm a little skeptical about is removing variable names; it would be interesting to see empirical data on that as it seems a bit unintuitive. I would expect almost the opposite, that variable names give LLMs some useful local semantics.


You're looking for Scala… ;-)

https://news.ycombinator.com/item?id=47957121


I agree, the more likely psychology of the Greg character is that he doesn't understand that the way he presents himself in the pictures damns his surface-level framing. You can really go quite far with more sophisticated versions of this technique in fiction -- Ishiguro's Remains of the Day is my favorite example!


Like half the posters on AmIAnAsshole-type forums, Greg doesn't realize he's an ass. But he's also a kid. Adults should know better.


These days it's almost trivial to design a binder against a target of interest with computation alone (tools like boltzgen, many others). While that's not the main bottleneck to drug development (imo you are correct about the main bottlenecks), it's still a huge change from the state of technology even 1 or 2 years ago, where finding that same binder could take months or years, and generally with a lot more resources thrown at the problem. These kinds of computational tools only started working really well quite recently (e.g., high enough hit rates for small scale screening where you just order a few designs, good Kd, target specificity out of the box).

So both things can be true: the more important bottlenecks remain, but progress on discovery work has been very exciting.


As noted, I agree on the great strides made in the protein space. However, the oversaturation and redundancy of tools and products in this space should make it pretty obvious that selling API calls and compute time for protein binding, and related tasks, isn't a viable business beyond the short term.


Here are my notes and guesses on the stories in case people here find it interesting. Like some others in the blog post comments I got 6/8 right:

1.) probably human, low on style but a solid twist (CORRECT)
2.) interesting imagery but some continuity issues, maybe AI (INCORRECT)
3.) more a scene than a story, highly confident it is AI given style (CORRECT)
4.) style could go either way, maybe human given some successful characterization (INCORRECT)
5.) I like the style but it's probably AI, the metaphors are too dense and there are very minor continuity errors (CORRECT)
6.) some genuinely funny stuff and good world building, almost certainly human (CORRECT)
7.) probably AI prompted to go for humor, some minor continuity issues (CORRECT)
8.) nicely subverted expectations, probably human (CORRECT)

My personal ranking for scores (again blind to author) was:

6 (human); 8 (human); 4 (AI); 1 (human) and 5 (AI) -- tied; 2 (human); 3 and 7 (AI) -- tied

So for me the two best stories were human and the two worst were AI. That said, I read a lot of flash fiction, and none of these stories really approached good flash imo. I've also done some of my own experiments, and AI can do much better than what is posted above for flash if given more sophisticated prompting.


I was surprised at the result, and even more surprised when I read that one of the authors who did the test got 4 out of 5 wrong, and rated 2 of the AI stories highly.

Looking at my notes, I got one wrong: story 5, where I didn't know what the "name" was supposed to be, so I assumed it was something widely known in culture that brings about the end times, something I simply didn't recognize, and marked it as human because of that supposed reference to shared cultural knowledge. All the AI-written stories I rated at either 1 or 2 points, while the lowest human-written story got 3 and the highest (story 1) got 5.

It makes me wonder if we are over-estimating the skill an author has when reading based on their demonstrated skill when writing.

IOW, according to my notes/performance, the AI stories were easy to spot and correlated with low scores anyway, while the author(s), who actually produced high-rated stuff for me, rated my low-rated stuff as high.


The only one I was fairly sure was human was #6, and that was the only one I kinda enjoyed. In any case, as someone who reads a good deal, I agree. I didn't think any of the stories was particularly great (not enough to bother ranking them, beyond favourite) so I don't care all that much about the result.

> AI can do much better than what is posted above for flash if given more sophisticated prompting.

How sophisticated, compared to just writing the thing yourself?


In another reply I gave an example of something you can do: https://news.ycombinator.com/item?id=44937774

I enjoy writing so a system like this would never replace that for me. But for someone who doesn't enjoy writing (or maybe can't generate work that meets their bar in the Ira Glass sense of taste) I think this kind of setup works okay for generating flash even with today's models.


Could you expand on your point re more sophisticated prompting?

I have found it hard to replicate high quality human-written prose and was a bit surprised by the results of this test. To me, AI fiction (and most AI writing in general) has a certain “smell” that becomes obvious after enough exposure to it. And yet I scored worse than you did on the test, so what do I know…


For flash you can get much better results by asking the system to first generate a detailed scaffold. Here's an example of some metadata you might try to generate before actually writing the story:

- genres the story should fit into
- POV of the story
- high-level structure of the story
- list of characters in the story along with significant details
- themes and topics present in the story
- detailed style notes

From there you have a second prompt generate a story that follows those details. You can also generate many candidates and have another model instance rate the stories on both general literary criteria and how well they fit the prompt, then read only the best.
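A minimal sketch of that two-stage setup in Python: the `llm` callable stands in for whatever model API you're using, and every name and prompt here is my own illustration, not any particular library:

```python
def write_flash(llm, premise, n_candidates=8):
    """Scaffold-first flash fiction: generate metadata, then candidate drafts,
    then have the model score the drafts and keep the best one.

    `llm` is any callable that takes a prompt string and returns model text.
    """
    # Stage 1: detailed scaffold before any prose is written.
    scaffold = llm(
        "Before writing, produce a scaffold for a flash story about "
        f"{premise!r}: genres, POV, high-level structure, characters with "
        "significant details, themes, and detailed style notes."
    )
    # Stage 2: many candidate drafts that follow the scaffold.
    drafts = [
        llm(f"Write a flash story that follows this scaffold exactly:\n{scaffold}")
        for _ in range(n_candidates)
    ]
    # Stage 3: rate each draft on literary quality and scaffold fidelity.
    scores = [
        float(llm(
            "Rate this story 0-10 on literary quality and fidelity to the "
            f"scaffold. Reply with only a number.\n{scaffold}\n---\n{draft}"
        ))
        for draft in drafts
    ]
    # Only the top-scoring draft ever reaches a human reader.
    return drafts[max(range(n_candidates), key=scores.__getitem__)]
```

The key design choice is that the human only reads the winner; the generate-many, judge, and filter steps absorb most of the model's variance.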

This has produced some work I've been reasonably impressed by, though it's not at the level of the best human flash writers.

Also, one easy way to get output that completely avoids the "smell" you're talking about is to give specific guidance on style and perspective (e.g., GPT-5 Thinking can do "literary stream-of-consciousness first-person teenage perspective" reasonably well and will not sound at all like typical model writing).


I had similar results, and story 4 is so trope heavy I wonder if it’s just an amalgamation of similar stories. The human stories all felt original, where none of the AI ones did.


I'm not sure I agree that the human stories felt original. I was pretty unimpressed with all of the stories except maybe 6, and even that one dealt in some common tropes. 5 had fewer tropes than 6 (and maybe as a result received the highest average scores from readers), but I could tell from the style that it was AI.


I really enjoyed this article but the claim of no literary fiction making the Publishers Weekly yearly top 10 lists since 2001 isn't really true:

https://en.wikipedia.org/wiki/Publishers_Weekly_list_of_best...

https://en.wikipedia.org/wiki/Publishers_Weekly_list_of_best...

It is true that there isn't that much literary stuff that breaks through, and the stuff that does is usually somewhat crossover (e.g., All the Light We Cannot See in 2015 or Song of Achilles in 2021) but it exists. These two books are shelved under literary codes (though also historical). Song of Achilles in particular is beautifully written and a personal favorite of mine, at least among books published in recent years.

Then there are other works like Little Fires Everywhere and The Midnight Library that I might not consider super literary but nonetheless are also often considered so by book shops or libraries (e.g., https://lightsailed.com/catalog/book/the-midnight-library-a-...; the lit fic code is FIC019000).

I was really surprised that Ferrante's Neapolitan series, the best example (I would have thought) of recent work with both high literary acclaim and popular appeal, did not actually make the top 10 list for any year.


Yeah, and looking through the lists makes one suspect that there's a problem of incommensurate measurements... There's a lot of 'very hungry caterpillar' in the recent lists, but I'm unsure whether children's books were even in the running in the 1960's. Or else there's been a revolution in buying books for children since the 60's, which, honestly, I wouldn't be sad about...


Yeah, it seems likely the underlying task here (one reasoning step away) was: replace as many fp32 operations as possible in this kernel with fp16. I'm not sure exactly how challenging a port like that is, but intuitively it seems a bit less impressive.

Maybe this intuition is wrong, but it would be great for the work to address it explicitly if so!


It only seems to have done that in a couple of places, like the MatMul. The softmax kernel (https://github.com/ScalingIntelligence/good-kernels/blob/mai...) seems to be entirely bog-standard, and the layernorm kernels are only slightly more interesting.


I looked at the softmax kernel and the cast that it does from a float* to a float4* is extremely brittle -- it's trivial to break by offsetting the input slightly.

Very likely a kernel for a standard library could not employ such a trick that relies on alignment of input pointers. Certainly not without a fallback.
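The guard a robust launcher would need is simple to state: take the vectorized path only when the pointer is actually 16-byte aligned (a float4 load's requirement) and fall back to a scalar path otherwise. Here's a toy Python sketch of that dispatch logic; the function names are mine, and the real check would of course live in CUDA host code:

```python
import ctypes

VEC_BYTES = 16  # a float4 load requires the pointer be 16-byte aligned

def addr_of(buf):
    """Raw address of a writable buffer's first byte."""
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))

def softmax_dispatch(buf):
    """Toy launcher: take the vectorized float4 path only when it is safe.

    A kernel that casts float* to float4* without this guard breaks on any
    input offset by a non-multiple of 16 bytes.
    """
    n_floats = len(buf) // 4
    if addr_of(buf) % VEC_BYTES == 0 and n_floats % 4 == 0:
        return "float4"
    return "scalar"

# Demonstrate with the same allocation sliced at aligned and shifted offsets.
backing = bytearray(64 + 2 * VEC_BYTES)
pad = (-addr_of(backing)) % VEC_BYTES
aligned = memoryview(backing)[pad:pad + 64]          # 16-byte aligned, 16 floats
shifted = memoryview(backing)[pad + 4:pad + 4 + 64]  # offset by one float
print(softmax_dispatch(aligned), softmax_dispatch(shifted))  # float4 scalar
```

Offsetting the input by a single float is exactly the "trivial to break" case: same data, same length, but the vectorized cast is no longer legal.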


I do a lot of ML work too and recently gave NixOS a try. It's actually not too hard to use conda/miniconda/micromamba to manage Python environments as you would on any other Linux system, with just a few lines of configuration: pretty much just add micromamba to your configuration.nix, plus a few lines of config for nix-ld. Many other Python/ML projects are set up to use Docker, and that's another easy option.

I don't have the time or desire to switch all my python/ML work to more conventional Nix, and haven't really had any issues so far.


This technique doesn't actually use RL at all! There’s no policy-gradient training, value function, or self-play RL loop like in AlphaZero/AlphaTensor/AlphaDev.

As far as I can tell, the weights of the LLM are not modified. They do some kind of candidate selection via evolutionary algorithms over the LLM prompt, which the LLM then remixes. This process then iterates like a typical evolutionary algorithm.
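That loop has roughly the following shape. This is a generic sketch under my own naming, with the LLM's remixing replaced by a toy mutation operator so it runs standalone; the point is that there is no gradient or value function anywhere, only scoring, selection, and variation:

```python
import random

def evolve(seed_candidates, mutate, fitness, generations=500, population=8, rng=None):
    """Generic evolutionary loop: score candidates, keep the best, and ask a
    mutation operator (in the paper's case, an LLM remixing candidates) for
    variations. No model weights are ever updated.

    `mutate(candidate, rng)` returns a new candidate;
    `fitness(candidate)` returns a score to maximize.
    """
    rng = rng or random.Random(0)
    pool = list(seed_candidates)
    for _ in range(generations):
        pool.sort(key=fitness, reverse=True)
        parents = pool[: max(2, population // 4)]  # elitist selection
        pool = parents + [
            mutate(rng.choice(parents), rng)
            for _ in range(population - len(parents))
        ]
    return max(pool, key=fitness)

# Toy stand-in for "the LLM remixes the candidate": random single-char edits,
# with fitness = closeness to a target string.
TARGET = "fast kernel"
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def mutate(s, rng):
    i = rng.randrange(len(s))
    return s[:i] + rng.choice(ALPHABET) + s[i + 1:]

def fitness(s):
    return sum(a == b for a, b in zip(s, TARGET))

best = evolve(["x" * len(TARGET)], mutate, fitness)
```

Swap the toy `mutate` for "prompt an LLM with the parent candidates and ask for a better variant" and you get the AlphaEvolve-style setup: the LLM is a frozen mutation operator, not something being trained.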


Thanks for sharing this! I occasionally use Google Translate and/or GPT-4 for similar purposes, but your tool makes the workflow a bit simpler.

I've found creative writing in a target language is great for learning.


Happy to share, thanks!

I also use GPT-4 for explaining the meaning of sentences in more detail (as in JimDabell’s comment). Often my questions are like “how would a native speaker say this colloquially” - I’ve found it really valuable to be able to have a back-and-forth on why something works the way it does

