Normally the reason to do a conservative GC is that it makes the VM's implementation (and potentially the FFI) simpler, not that it's faster. Also for fun, of course. (There are reasons a fully conservative design might be faster, like not having to have a ton of scanning functions when you have a ton of disparate types, but in general my perception is that production GCs are normally moving to at least some degree, which a fully conservative one can't be.)
Yeah. I implemented the conservative collector because it simplified development of the language's primitive functions: it allowed me to put lisp values on the C stack without worrying about them being garbage collected.
The current version of the collector also compacts the heap and moves values around. All conservatively discovered values get pinned.
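The pinning scheme described above can be sketched roughly like this. This is a minimal illustration, not lone's actual code; the names `heap_objects`, `pinned`, and `conservative_scan` are all hypothetical:

```cpp
#include <cstdint>
#include <unordered_set>

// Hypothetical heap bookkeeping: addresses of live lisp objects.
std::unordered_set<uintptr_t> heap_objects;
// Objects conservatively discovered on the stack; the compactor must not move these.
std::unordered_set<uintptr_t> pinned;

// Scan a word range (e.g. the C stack) conservatively: any word that
// happens to equal a live object's address is treated as a root and
// pinned, because we can't prove whether it's really a pointer.
void conservative_scan(const uintptr_t* lo, const uintptr_t* hi) {
    for (const uintptr_t* p = lo; p < hi; ++p)
        if (heap_objects.count(*p))
            pinned.insert(*p);
}
```

Everything not in `pinned` remains free to move during compaction.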
One recent interesting partly-conservative, moving design I remember reading about is Whippet[1] by Andy Wingo (Guile maintainer, SpiderMonkey contributor). He’s been partly forced into it, though, by the fact that Guile’s existing external API (unlike, say, Lua’s) exposes the fact that it uses a non-moving collector (by allowing the user to keep hold of raw pointers to Scheme objects), so I’m not sure if this should serve as an inspiration or a cautionary tale.
I really like his blog! I emailed him when I published my delimited continuations article because it addresses the overlapping native/lisp stacks problem he wrote about. Sadly I don't think he's seen it.
> One way is to inform the garbage collector of the locations of all roots [...] implicitly, in the form of a side table generated by the compiler associating code locations with root locations.
Getting GCC to do things for you is fraught. Probably still possible; just... fraught.
I believe Clang was intended to be able to do this[1] and I remember seeing that stuff even back when it was a particularly spunky research project. The facility doesn’t seem to have really gone anywhere even if it technically works; I wonder why.
In general, the problem with stack maps of any kind (even the very minimal ones you need for stack unwinding) is that they're liable to be large, slow to process, or both. Now that I'm thinking about it, I wonder if you could map the stack at runtime using those same unwinding tables. A C++ compiler does have to know where it put things so it can call their destructors, and perhaps you could make your GC root a word-sized but otherwise opaque, uncopyable thingy so the compiler can't put it in more than one place at once. (I can't be the first to have this idea, even with how stupid it sounds and how utterly miserable Itanium ABI unwinding is.)
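The uncopyable-root part of that idea can be sketched without the unwind-table machinery, using explicit constructor/destructor registration instead (a shadow stack rather than table-driven discovery, so this only demonstrates half the trick). All names here are made up for illustration:

```cpp
#include <vector>

// Hypothetical shadow stack of root slots the GC walks precisely.
std::vector<void**> root_stack;

// A word-sized root handle: the constructor registers its slot, the
// destructor unregisters it in LIFO order (mirroring C++ unwinding).
// Deleting copy and move means the compiler can't duplicate the slot,
// so each root lives in exactly one place at a time.
class Root {
public:
    explicit Root(void* obj) : slot_(obj) { root_stack.push_back(&slot_); }
    ~Root() { root_stack.pop_back(); }
    Root(const Root&) = delete;
    Root& operator=(const Root&) = delete;
    void* get() const { return slot_; }
private:
    void* slot_;  // sole data member: sizeof(Root) == sizeof(void*)
};
```

With unwind tables the registration calls would disappear and the GC would instead walk the same metadata the runtime uses to find destructors; this sketch just shows why uncopyability keeps the root set precise.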
I haven't measured the performance. I would like to. I'm especially interested in comparing it with the current version of the collector. It is now capable of heap compaction which will be the subject of a future article. I'm also interested in knowing how big a problem the conservative part of the collector is. The C stack depth has been minimized significantly since I got rid of the recursive evaluator. I don't think it's a significant problem but I could be wrong.
I need to implement the compiler's profiling functions in lone. Documentation on the matter seems scarce. Not even sure if those instrumentation APIs are stable. I got away with not implementing the ASAN and stack protector since they added flags that make the compiler emit trapping instructions instead of function calls. Profiling is a lot more complex.
The issue with regexes for parsing is that they can't handle balanced parentheses (https://en.wikipedia.org/wiki/Regular_expression); more generally, they can't handle nested structure. Context-free grammars are the most natural extension that can: they add a substitution operator to regexes that makes them powerful enough to recognize nested structure. So regexes would be reinvented if history were rerun, but so would context-free grammars. Part of the complexity in parsing is attaching semantic meaning to the parse; regexes mostly avoid this by not caring how a string matches, just whether it matches or not.
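Concretely, recognizing one balanced bracket pair already needs unbounded counting state, which is exactly what a finite automaton (and hence a true regular expression) lacks; a single counter, a degenerate stack, suffices. A quick sketch:

```cpp
#include <string_view>

// Returns true iff every ')' closes an earlier '(' and all opens are closed.
// The counter `depth` is the unbounded state a finite automaton can't have.
bool balanced(std::string_view s) {
    long depth = 0;
    for (char c : s) {
        if (c == '(') ++depth;
        else if (c == ')' && --depth < 0) return false;  // close before open
    }
    return depth == 0;
}
```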
Now, I do agree that LR grammars are messy. Nowadays, they have mostly fallen from favor. Instead, people use simpler parsers that work for the restricted grammars actual programming languages have.
IIRC there is some research into formalizing the type of unambiguous grammar that always uses () or [] as nesting elements, but can use Regex for lexing.
I understand what a CFG is and why the Dyck language (matching parens) is not a regular language. My point was that CFGs/CFLs are less motivated by a reasonable and uniquely characterising constraint, such as making memory usage independent of the size of an input string, than regexes are.
Then again, you are right that CFGs are very natural. And they do admit a few easy O(n^3) parsing algorithms, like Earley and CYK.
I think your last sentence relates to Visibly Pushdown Grammars. See also Operator Precedence Grammars.
Rust's discussion boards have an idea called "keyword generics" for expressing some of these concepts: a function can be generic over const, async, or some other keyworded effect. I like this description; it shows the benefits without too much theory.
Yours might go a little less into the details, but it's really clear, and I like the diagrams and the explanation around glitch hazards. Please do follow up on your tangents if you have time.
I have commented once or twice on articles being AI generated. I don't comment when I think the writer merely used AI to clean up some text; I do when there are paragraphs of meaningless or incorrect content.
Formats, name collisions or back-button breakage are tangential to the content of the article. Being AI generated isn't. And it does add to the overall HN conversation by making it easier to focus on meaningful content and not AI generated text.
Basically, if the writer didn't do a good job checking and understanding the content we shouldn't bother to either.
The fascinating paradox: there are clearly "tells" (slop-smells, like code-smells?) of LLM-generated text. We're all developing heuristics rapidly, which probably pass a Pepsi challenge 95+% of the time.
And yet: LLMs are writing entirely based on human input. Presumably there exists a great quantity of median representative text, some lowest-common denominator, of humans who write similarly to these heuristics.
(In particular: why are LLMs so fond of em-dashes, when I'm not sure I've ever seen them used in the wilds of the internet?)
An LLM was used to help polish the text, and that section probably got over-polished. The ideas and code are authentic. The article is intentionally detailed for those who want the full reasoning, but you can always skip straight to the source code.
Out of curiosity what particular part of the original text needed to be polished and why couldn’t the writer accomplish said polish without a language model?
When writing the articles in that series, the focus was more on getting the technical ideas and details right than on spelling, grammar, and text flow, areas in which LLMs excel. That specific section, "Why this maps to Genetic Algorithms?", makes the point that the fit of genetic algorithms to state-space exploration is not a coincidence: the evolutionary process itself is a state-space exploration algorithm that allows a given species to advance further down life's timeline. But thanks for the question. I do agree that as LLM-generated text becomes ubiquitous, we all appreciate our own human writing style more, even in highly technical text where the focus is on the ideas rather than the presentation language itself.
By the definition of a cryptographically secure PRNG, no: with overwhelming probability, they produce results indistinguishable from truly random numbers no matter what procedure you use to tell them apart.
I worked on a team deploying a service to European Sovereign Cloud (ESC). Disclaimer - I am a low level SDE and all opinions are my own.
AWS has set up proper boundaries between ESC and global AWS. Since I'm based in the US, I can't see anything going on in ESC, even in the service we develop. To fix an issue there, we have to play telephone with an engineer in ESC, who gives us a summary of the issue or debugs it on their own. All data really is 100% staying within ESC.
My guess is that ESC will be less reliable than other regions, at least for about a year. The isolation really slows down debugging: problems that would normally be fixed in a day or two can take a month, and the engineers in ESC don't have the same level of knowledge about the systems as the teams owning them. The teething issues will eventually resolve, but new features will be delayed within the region.
If you're a current AMZN employee, you may want to delete or heavily edit this post. Go check your employer's "social media policy"; historically, commenting on operational or internal aspects without PR approval was prohibited.
While it's good to remain anonymous to avoid reprisals, once that's done no one should care about upsetting their employer in an open forum. Despite what a corporation says, they don't own you, your thoughts, or your voice.
Still, it sounds like it would be the optimal choice for a redundancy zone in some senses, since it's probably not going to have any accidental dependency on us-east-1.
Have you tested this GC's performance? Sometimes a baby GC can be fast enough.