Invariants: A Better Debugger? (brooker.co.za)
84 points by r4um on Sept 2, 2023 | 48 comments


I find the general point interesting, but want to nitpick:

> "items in the deleted state must have a deleted time"

I much prefer making it impossible to represent bad state, like rephrasing it as "items are deleted if their deleted_time property is set". In this case, that makes the deletion atomic: there's never an instant where it's ambiguous whether the object was deleted.

That doesn't disagree with his premise. It's an additional recommendation.
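A minimal sketch of the "deleted_time as the single source of truth" idea in Python (the `Item` class and its fields are made up for illustration):

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Item:
    name: str
    # One field is the source of truth: there is no separate "deleted"
    # flag that could ever disagree with the timestamp.
    deleted_time: Optional[datetime] = None

    @property
    def is_deleted(self) -> bool:
        return self.deleted_time is not None

    def delete(self) -> None:
        # Setting a single field makes the deletion atomic: no
        # intermediate state where flag and timestamp disagree.
        self.deleted_time = datetime.now(timezone.utc)

item = Item("report.pdf")
assert not item.is_deleted
item.delete()
assert item.is_deleted and item.deleted_time is not None
```

The invariant "items in the deleted state must have a deleted time" holds by construction, so there's nothing left to check at runtime.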


Sometimes it's not easily achievable, e.g. when the code is owned by another team or the data comes from an external source. I guess forbidding invalid states is the best option, and checking invariants is a viable choice when you can't avoid invalid states.


Reminds me of the old blog post with the phrase “Make illegal states unrepresentable” https://blog.janestreet.com/effective-ml-revisited/ (Yaron Minsky)


Not all databases optimize NULL columns, so searching for items that aren't deleted might be a table scan worst case. I speak from hard-earned experience...

In such cases I fall back to computed columns. A deleted flag whose value depends on whether deleted_time is set, for example. This retains your atomic-state goal.


Invariants are also an important part of Design by Contract [0], and more generally as class invariants [1] in OOP.

[0] https://en.wikipedia.org/wiki/Design_by_contract

[1] https://en.wikipedia.org/wiki/Class_invariant


It would be great if C++, Rust, Zig or similar languages could have language features to support invariant assertions. The library solutions are not good enough IMO.


I could see C++ or Rust maybe adding that eventually, but it seems unlikely for Zig unless they relax some of their core design goals. The problem with invariant assertions is figuring out when exactly they should run, and the idea that something might run _every_ time any public method is called definitely conflicts with the design they're going for.

That said, Zig does make it pretty easy to handle a lot of the features you might want. As a library, you can offer the ability for a person to rewrite a type (maybe only in non-optimized modes if you want) and check the invariants before/after every public function call. You can always add explicit pre/post-condition checks to any function you'd like. Invariants in your inputs/outputs can be hoisted to the type checker (I wrote a toy project doing this [0]). Like...at some point you have to describe what you want to check and when you want it to run for the invariant checking to be useful, and Zig allows you to write the checks and use comptime magic to place them where you want them to run, all in the userspace of the language. Which language features would you like to make the process smoother (automatic subtype resolution on a lattice of constraints, a bit of syntax, ...)?

[0] https://github.com/hmusgrave/pdv


> Hoare logic (also known as Floyd–Hoare logic or Hoare rules) is a formal system with a set of logical rules for reasoning rigorously about the correctness of computer programs. It was proposed in 1969 by the British computer scientist and logician Tony Hoare, and subsequently refined by Hoare and other researchers.[1] The original ideas were seeded by the work of Robert W. Floyd, who had published a similar system[2] for flowcharts.

https://en.wikipedia.org/wiki/Hoare_logic


I don't think the two are comparable at all. Invariants are like tests or assertions; debugging deals with issues that aren't handled well by either. You debug more when you use invariants or tests less, but one isn't a replacement for the other. You can't step through code or examine arbitrary registers with an invariant.


Invariants are for sanity. Debuggers are for when the invariants are found to have been violated.


Yeah, that's how I see it.

I try to code small, assertion-like "invariants" into my programs via guard clauses and... assertions. When I get an error, it's usually handled by those, so I pop open the debugger, break on the exception and trace the stack back to see where I should have had a higher-level assertion or guard clause.

Invariants to make debugging fast and easy (and to force me to write better code in early, malleable stages), debugger to show me exactly where I wasn't thinking it through.
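A toy sketch of the guard-clause-plus-assertion style described above, in Python (the `apply_discount` function and its parameters are invented for illustration):

```python
def apply_discount(price: float, percent: float) -> float:
    # Guard clauses: reject bad inputs at the boundary, fail early,
    # and give the debugger a precise place to break.
    if price < 0:
        raise ValueError(f"price must be non-negative, got {price}")
    if not 0 <= percent <= 100:
        raise ValueError(f"percent must be in [0, 100], got {percent}")

    discounted = price * (1 - percent / 100)

    # Assertion-like invariant: the result can never exceed the input.
    assert 0 <= discounted <= price
    return discounted
```

When something blows up, the traceback lands on the guard or assertion closest to the cause, which is exactly the "break on the exception and trace back" workflow above.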


Interesting. That makes sense but in my experience if an assertion is triggered because something went wrong, I tend to add more assertions 'higher up' the call chain to try to catch the cause of the assertion being triggered. Basically, to fail earlier.


I use assertions heavily in my own code and they work extremely well.

I don't buy the distinction between assertions and invariants as made in the blog post though, as if one was a higher level variety of the other. You can assert anything (though it may get expensive!). I consider invariants to be the logical statement of what should be true, and an assertion to be a program statement that actually performs the check.

Terminology, terminology, it doesn't matter. I agree with the author, it's an incredibly valuable tool.


Later in the post he shows invariants which cannot be verified as assertions within the system (e.g. by a process within a distributed application), because they rely on having an omniscient view of the system (e.g. access to the state of all processes in the distributed system at one point in time). I think that's where the distinction makes sense. Of course you'll likely still turn those invariants into assertions, but in some kind of simulator which can view the complete state of the system, rather than in the system itself.


Ah, yes, thanks


No, it doesn't. But to me an invariant is something that might be useful in a static context (that is, something a compiler might enforce, or use to generate more efficient code), and an assertion is a runtime construct.


As an ABAP developer looking to get back into the normal world, I'm a little scared of what debugging will be like.

After playing around with a bit of C and trying gdb, I asked myself if this is really what people use. The display constantly breaks and there's not a lot of stuff to quickly walk through code and visualize what it's doing.

If there's something I don't understand within 15 seconds in ABAP, I fire up the debugger.

ABAP is a ridiculously complicated language that's been getting more and more features tacked on for 30 years and there are now about 1000 instructions in it. I firmly believe that there are some instructions in there that not a single person on earth understands, so it needs a good debugger to make it usable.

I jump into the debugger before even trying to look at the code because it shows me everything in the usual editor component and I can navigate through it as fast as I can read, so it's basically assisted reading.

I couldn't imagine doing anything like that with GDB. Am I just too stupid to go fast and quickly get information in GDB? Is there a better alternative that I completely missed? Is non-ABAP code just more abstract, to a point where debugging in general doesn't make much sense anymore?

Tying that back to the article: I agree that making all important assumptions explicit with assertions helps a lot with not shooting yourself in the foot. I can usually test the result and use my trusty debugger to find problems without assertions as well, but that requires remembering all my assumptions and knowing exactly what I want my code to do. If I write the assumptions down, I don't have to remember them and the next dev will be able to see what I thought. I also don't need to check if they still hold true since the runtime does that for me and hands a short dump to anyone who violates them.


> trying gdb, I asked myself if this is really what people use. The display constantly breaks and there's not a lot of stuff to quickly walk through code and visualize what it's doing

gdb has a tui that can show you multiple panes of information at a time, e.g. the source code, disassembly, and registers.

Other tools like Emacs, Eclipse, and CLion can use gdb as a backend and provide a friendlier interface on top.


and it has a python interface so you can improve some stuff if need be


“You should use a scripting language to make it into something tolerable” is the most damning review of a program you can give.

It’s basically saying: “It’s great software, once you finish writing it yourself.”


You can see things that way, but it could also have shipped with the wrong UI layer that people wouldn't like; here you have freedom.

But I'm an emacs user so obviously that fits my psychology :)



> After playing around with a bit of C and trying gdb, I asked myself if this is really what people use. The display constantly breaks and there's not a lot of stuff to quickly walk through code and visualize what it's doing.

> I couldn't imagine doing anything like that with GDB. Am I just too stupid to go fast and quickly get information in GDB? Is there a better alternative that I completely missed. Is non-ABAP code just more abstract to a point where debugging in general doesn't make much sense anymore?

As someone doing programming on unix since 2001, hell no, that's not what I use (directly), for the reasons you say.

I mean gdb in the backend is okay. And gdb actually specifies Gdb/MI, a protocol to remote-control it.

You can also use gdb directly locally this way, and then it sucks less.

But nowadays I just use the IntelliJ IDEA IDE. I wrote https://github.com/daym/idea-native2-debugger which is a native debugger plugin (that uses gdb in the back) for IntelliJ IDEA Community Edition. You can also use CLion (requires you to subscribe--but it's totally awesome and has a lot more features) and that uses lldb in the back instead of gdb.

Some of my friends use emacs with gdb debug thing. Seems to work OK too.

Earlier in life I used https://www.gnu.org/software/ddd/ a lot--but nowadays I just stay in IDEA.

After you got used to IDEA, using anything else is like riding a kick-scooter after you got used to a motorcycle.

emacs is also good as an IDE--but for me it just doesn't work well enough--and I'm not interested in having to maintain my editor configuration like an extra software project.

You can also extend gdb with python plugins that make your variable printout nicer and/or more complete. It's very good.


Seer is a decent front end for gdb

https://github.com/epasveer/seer


No, gdb is not what people really use. There is probably a correlation between using it and having a grey beard.

Here's an example of modern tools:

https://twitter.com/EskilSteenberg/status/169533726869157102...

https://youtu.be/CxKujAuz2Vw


You have to use gdb unless you are using windows unless there is another debugger for linux that I'm not aware of.

I watched the video, and you pretty much have everything mentioned in it with just an LSP, compiler flags, and a command that runs your program with a debugger. The only example that wasn't caught with just that was the p[3] example, but that was caught by valgrind, which is also something I have a command set up for.

Really the only thing I was actually jealous of was at the very end of the video, and it wasn't the hot-reloading stuff; that's easy if you split your programs from the platform they're running on. The ONLY thing was that the VS debugger lets you explore variables with a visual interface instead of having to write commands to do it. I've used it on Windows, and even though it's usually busted and variables don't always update, they sometimes do. At the very least, just viewing the hierarchy of the structure is useful, so that you can easily add the value you want to the watch window, which usually does work.


> You have to use gdb unless you are using windows unless there is another debugger for linux that I'm not aware of.

lldb does work on Linux.


I've yet to watch the video, but does Visual Studio support macros in immediate or watch windows?

That so far has been my frustration when not remote-debugging a Linux VM, where it becomes a gdb front-end.


Yes, you can do basic stuff directly in VS watch windows, or for more advanced stuff you can use NatVis - https://learn.microsoft.com/en-us/visualstudio/debugger/crea...


It sounds like you'd be much more comfortable in an IDE with a debugger. Any reason you're not using visual studio or clion etc? Even vscode should be able to give you a decent debugging experience if you're not doing anything too complicated.


In your experience, what properties makes a good invariant, and how does one find them in practice for a new system?


Anything that must be true. Taking a sqrt of a float f? Assert f is non-negative.

Can't read from a closed file? Assert the file is open.

The classic, transferring cash from account a to account b? Assert that a+b before = a+b after.

Object should be in a given state after a certain method is invoked? Assert it.

It is possible to assert some invariants at compile time using strong typing, that's pretty useful too and I'm learning to do that myself (nothing clever, it's just thinking of 2 different classes to represent 2 different states, or use phantom types). More complex compile-time assertions are possible using dependent types, but that's out of my pay grade at the moment.

Edit: assertions can come before and/or after the thing is done. Technically, if you're checking before, then it's a pre-condition you want to ensure; if it's after, then it's a post-condition check you're doing. Let's take the square root example, and assume that you're actually implementing a sqrt function, not just calling a library function. The pre-condition is as before, that f must be non-negative, and the post-condition is that for the result r, r*r = f (if you've successfully taken the square root of a thing, squaring it should get you back to the original value, or at least very, very close). HTH
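The sqrt example above can be sketched in Python (a toy wrapper around the library function, just to show where the pre- and post-condition checks sit):

```python
import math

def checked_sqrt(f: float) -> float:
    # Pre-condition: sqrt is only defined for non-negative inputs.
    assert f >= 0, f"pre-condition violated: {f} < 0"
    r = math.sqrt(f)
    # Post-condition: squaring the result recovers the input,
    # up to floating-point error.
    assert math.isclose(r * r, f, rel_tol=1e-9, abs_tol=1e-12), \
        f"post-condition violated: {r}**2 != {f}"
    return r
```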


I have to say, the cases that you raise (e.g. square root function) seem quite a different nature than the systems that the author appears to consider (e.g. distributed garbage collector). Your considerations appear to enumerate some types of checks one would use in unit testing.

As for taking as invariants anything that must be true, in a system very many things are true, and the most part of them would be utterly uninteresting. Understanding the process that one would use to pick insightful ones is rather the art which I'm intrigued by.


Yes. I was trying to give you an easy introduction to the idea.

And which are you more likely to be writing, a distributed garbage collector or a single-core library/application? Realistically, very much the latter. And if the former, much of that system will be made up of stuff running strictly on single cores, to which plain bog-standard assertions will be just as useful.

> one would use to pick insightful ones is rather the art which I'm intrigued by.

Well, it's not really an art, it's more straightforward than that (edit: think of them as basic sanity checks, not anything deep or profound), but if my overview of the idea isn't helpful you need to provide a more concrete example to discuss. I think your question as it stands is too ill-defined.


Your examples are more postconditions than invariants. Invariants are usually assertions for a given state object that always hold for that object throughout its lifetime. For example, a mutable list implementation based on a resizable array might have the invariant that the value of the length field is always less than or equal the current array size. Or for a loop invariant, that a certain predicate on the state mutated in the loop is true for all iterations of the loop.
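The array-backed list example above might look like this in Python (a toy `ArrayList`, invented for illustration, since Python lists already resize themselves):

```python
class ArrayList:
    """Toy list backed by a fixed-capacity array, to show a class invariant."""

    def __init__(self) -> None:
        self._array = [None] * 4   # backing storage, capacity 4 to start
        self._length = 0
        self._check_invariant()

    def _check_invariant(self) -> None:
        # Class invariant: length never exceeds the backing array's size.
        # This must hold at every point in the object's lifetime that is
        # observable from outside.
        assert 0 <= self._length <= len(self._array)

    def append(self, value) -> None:
        if self._length == len(self._array):
            # Double the capacity; the invariant may be "in flux" here,
            # but is restored before the method returns.
            self._array.extend([None] * len(self._array))
        self._array[self._length] = value
        self._length += 1
        self._check_invariant()  # re-established after every public method
```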


A postcondition is an invariant[1]. Invariants must hold at the end of the operation however the intervening inconsistent state must be invisible (effectively, be atomic). They often are (maybe always are?) violated within the operation but must be patched up by the end.

[1] Edit: no, a postcondition expresses an invariant in code - but that's a terminology thing. We may be disagreeing just on terminology here.


precondition is something which must be true before some code executes

postcondition is something which must be true after some code executes

invariant is both: something which must be true both before and after some code executes.

So it's the other way around. An invariant is a postcondition, but a postcondition is not necessarily an invariant.

(you can check all three with assertions, but they will generally exist whether you check them or not)
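All three can be shown on one operation. A sketch in Python, using a made-up `Account` class (the bank example comes up earlier in the thread):

```python
class Account:
    def __init__(self, balance: int) -> None:
        self.balance = balance

    def withdraw(self, amount: int) -> None:
        assert self.balance >= 0            # invariant: holds before
        assert 0 < amount <= self.balance   # pre-condition: holds before only
        old = self.balance
        self.balance -= amount
        assert self.balance == old - amount # post-condition: holds after only
        assert self.balance >= 0            # invariant: holds after too
```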


A reasonable distinction. I accept it.


I think of unit tests and the debugger going together like peanut butter and jelly. That is, a language like Java lacks the interactive REPL of, say, Python, but you can get similar interactivity writing tests in an IDE like IntelliJ IDEA and breakpointing, and when you are done you've written some tests.


Java has a REPL, jshell, which was released with JDK 9 in 2017.


Yeah, and there are REPLs for C but hardly anyone uses them. Back in the day I would have used Groovy or Clojure if I wanted a REPL in Java.


I need to make better use of invariants.

Part of that is the brain damage from C where asserts disappear in release builds. This relegates them to second-class, despite the fact that it is trivial to write the equivalent with an if statement and an early return.

Swift’s guard keyword is really nice. We could really use Eiffel-style invariants in our PLs now, IMO.
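The same caveat exists in Python: `assert` statements are stripped under `python -O`, much like C's `assert` under NDEBUG. A sketch of the "trivial equivalent with an if statement" mentioned above (the `require` helper name is my own invention):

```python
def require(condition: bool, message: str) -> None:
    """A check that survives optimized/release builds.

    Unlike a bare `assert`, a plain `if` + raise is never stripped,
    so the invariant keeps firing in production.
    """
    if not condition:
        raise AssertionError(message)

def set_ratio(value: float) -> float:
    require(0.0 <= value <= 1.0, f"ratio out of range: {value}")
    return value
```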


Disabling some of them in release builds makes sense. Like, say you have a binary search routine. It's totally fucked if the inputs aren't sorted, so you add that as an invariant. Now your log(n) performance sinks to linear because you have to check the invariant. Invariants you can't express with your type system should have the option to have their checks disabled in release builds.
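The binary search case above might be sketched like this in Python, with the expensive check behind a debug flag (the `DEBUG` gate is just one way to do it):

```python
DEBUG = True  # flip off in release builds to keep the search O(log n)

def binary_search(xs: list, target) -> int:
    """Return the index of target in sorted xs, or -1 if absent."""
    if DEBUG:
        # O(n) invariant check: binary search is only correct on sorted
        # input. Far too expensive to leave enabled in release builds.
        assert all(a <= b for a, b in zip(xs, xs[1:])), "input not sorted"
    lo, hi = 0, len(xs)
    while lo < hi:
        mid = (lo + hi) // 2
        if xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo if lo < len(xs) and xs[lo] == target else -1
```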


And use invariants to provide some formal guarantees.

See e.g. https://dafny.org


Fortunately assert is hardly part of the language itself, you can provide your own macro for it and you probably should.

Among other things, you probably DO want to get rid of them when doing coverage analysis -- unless you like seeing a code base that truly has 100% branch coverage reported as only having 30% coverage, due to all the invariant checks that can't be triggered because the code isn't buggy.


I don't think asserts are really the best representation for constraints, but disabling them for release builds is entirely a question of your policy, and nothing fundamental at all - leave them on if you like. The real issue with your early return rather than a panic is what the semantics are for the caller.


Basic invariants are very helpful for developing data structures and algorithms if you can come up with good ones. The more the better, as long as they can be checked reasonably quickly.

For example, I was implementing a dynamic spatial data structure a while ago, so one of the first things I did is implement a validation function that checks the entire structure for consistency errors. Then as I implemented new features, I added calls to the validation after every change, and it caught way more bugs and edge cases than I expected. Easy to accidentally swap a <= for a <, and have it complete successfully, but produce subtly incorrect results that you'd never spot by hand.
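The "validate the whole structure after every change" pattern can be sketched with a toy binary search tree in Python (not the poster's spatial structure; just an illustration of the same idea):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

class BST:
    def __init__(self):
        self.root = None

    def insert(self, key):
        def _insert(node, key):
            if node is None:
                return Node(key)
            if key < node.key:
                node.left = _insert(node.left, key)
            elif key > node.key:
                node.right = _insert(node.right, key)
            return node
        self.root = _insert(self.root, key)
        self.validate()  # run the full consistency check after every mutation

    def validate(self):
        # Walk the entire tree, checking the ordering invariant at every
        # node. A single swapped < vs <= in insert() fails loudly here,
        # instead of producing subtly wrong query results later.
        def _check(node, lo, hi):
            if node is None:
                return
            assert lo < node.key < hi, f"ordering violated at {node.key}"
            _check(node.left, lo, node.key)
            _check(node.right, node.key, hi)
        _check(self.root, float("-inf"), float("inf"))
```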


>"Invariants, like assertions, are things that must be true during or after the execution of a piece of code. "This array is sorted", "the first item is the smallest", "items in the deleted state must have a deleted time", that kind of thing. Invariants are broader than assertions: they can assert properties of a piece of data, properties of an entire data structure, or even properties of collections of data structures spread across multiple machines."

Yes!

In theory it would be possible to assert the correctness of an entire codebase with only a single unit test(!) run on only a single piece of data(!) -- but that's IF, if and only if -- the code is known to be tightly and invariably bound to that key piece of data, such that its entire functionality can be asserted via the state of that piece of data!

There is a broader pattern here, too...

Let's suppose we have a cup which is half-full, half-empty.

If we call the fullness of the cup "X" and the emptiness of the cup "Y" -- then if we know one of these things, one of these aspects of the underlying system -- then we also know the other one! (X's relationship to Y is: X = 1 - Y; correspondingly, Y's relationship to X is: Y = 1 - X (a complementary relationship...))

But the point is: If you know the state of one, you also know the state of the other...

In the case of the theoretical codebase whose code is tightly, invariably, deterministically coupled to one key variable, thus requiring only one unit test of that variable to assert the correctness of the entire codebase -- "X" is that variable, and "Y" is "the codebase"!

These relationships do not need to be complementary as with the half-full, half-empty cup of water example -- they can be asymmetric, as in the case of the theoretical codebase with the single unit test!

The broader broader pattern is "if thing A (correctly and invariably!) asserts thing B -- then if you have thing A -- then you also have thing B (without needing to check thing B independently!)".

In Logic, this can be thought of as "If A implies B, and you have A (A is True), then you also have B (B exists and is True -- as a result of merely having A)"

In Business, one might compare the concept of KPI's (Key Performance Indicators) to this -- if you have a KPI (measurement, observation) on the one hand, then you also have an asserted state of the business on the other, but remember, that's if and only if the two things are tightly and invariably bound -- that is, aspects of the same underlying phenomena...



