Studying the relationship between exception handling and post-release defects (neverworkintheory.org)
75 points by zdw on Sept 17, 2021 | hide | past | favorite | 73 comments


At my first job I wrote a new Java application to ingest/transform/store some data, and after about 6 months it was done. It chugged along for years afterwards without ever really requiring any maintenance or monitoring; it just kept working. One day the VP eng just showed up at my desk and started chatting, I didn't really know why. Turns out he was interested in knowing why it was so stable, but it was an obvious part of software to me: for every line I wrote where an exception was possible, I would look up what exceptions could be thrown and why, and then structure try/catches to do the right thing in each case. Sometimes that meant logging a warning, sometimes propagating to a higher level, sometimes shutting down the app, etc. He then sent a paper about the relationship between good error handling and software defects out to the org.

Like, I don't get why this is so difficult for so many devs. Your job is to think about both the happy case and the error cases, and usually there are many error cases and one happy case, so the bulk of the time and code is dedicated to the errors. Since then I've seen this is not at all the way most devs write things. They are interested in writing the happy case and treat error handling like a chore to be done as quickly as possible. The tools and language constructs don't matter much; it comes down to whether you dedicate time and care to thinking through errors or not.

Yeah, it sounds boastful, but fuck it, it's true. And the same holds regardless of this study: it's not about the structure and syntax of error handling, it's how much thought was given to understanding the failure modes and how to handle them.


>One day the VP eng just showed up at my desk and started chatting, I didn't really know why. Turns out he was interested in knowing why it was so stable

Kudos to him; he paid attention and noticed something that was stable. Usually something has to be broken to get attention at that level. I can't imagine leadership at most places thinking "hmm, this project we did several years ago has been running without bugs, let me go bring best practices from this developer."


Similarly, I've written systems that ran for years without needing much maintenance. I believe good error handling (and negative testing as well as positive testing) are the primary reasons for that. Unfortunately, it takes me longer to write code than my peers as a result. I'm not a professional developer anymore...


Ouch, I feel that. I have the same problem all the time. If you don't mind me asking, what do you do now?


I moved into information security. I still get to apply my dev knowledge. I don't write code in my job, but I do secure code reviews.

I write open source code in my own time, and to be honest, enjoy it far more that way.


I'm refactoring a (not so big) TypeScript codebase from exceptions to returning error results for this reason.

I don't know about Java, but TypeScript does not have a way to annotate that a function throws, not to mention what it throws - so carefully handling every error case would be difficult. With the new approach I know exactly what kind of error codes each function can return and how to handle them.

Thanks for your comment, now I have more reassurance that I'm not wasting time with the refactor.


I think the hard part of handling all error cases like that is not the actual handling of the error cases, but structuring your code well enough that at any given place you can enumerate all possible errors in a reasonably small list, and so that you actually have a reasonable way of handling each one.

In poorly structured code, all possible errors could often include just about anything and for many of the errors you will have no reasonable way of handling them.


You assume that the only errors a program can encounter are logic errors. Logic errors are really the easiest class of error to fix. Here are examples of other errors that can bring down stable systems:

1) An API returns a list of items. In a new version of the API, the data structure used to generate the response list is changed from a list to a set, invalidating any implicit assumptions about response ordering. This happened to a service I worked on, where callers implicitly assumed the results were ordered by date. That ordering held until the switch to a set.

2) Resource leaks exposed by failure conditions. A network outage might cause infrequently tested code paths to leak resources. In Go, I’ve seen this happen with network requests not tied to a context. This is an interesting case because the error itself causes the leak even though the success path works correctly.

3) Missing backoffs. A dependency may return errors when it becomes overloaded, which can cause a backup in something like an ingestion job. Without adequate backoffs, the dependency may never be able to recover.

There are plenty more examples where simply handling every error case is not sufficient for stability.

As a question to you: why is this system more stable than other systems you write? Can’t you apply your error handling philosophy to everything?


1. Implicitly assuming something not specifically guaranteed is a logic error. It's like assuming hash(x) == x just because it happens to be true for small integers in Python. It's not part of the spec, and just observing it to be true a few times doesn't change anything.

You could (a) document the implicit assumption that the list is sorted, or (b) sort the list before continuing further.

2. A resource not being freed in all cases is also a logic error.

3. In the context of calling basically any service/API repeatedly, not using backoffs and not setting a hard limit on the number of tries is also a mistake. I don't know if it's a "logic error" but it's certainly an error.

I don't know why you think these are not error cases. Your program fails, and it could be written in a way that does not fail or fails slightly more gracefully; that's an error case. There is some judgement involved: I might not implement backoff in a context where I know it's not yet needed, and I might just log a particularly obscure exception instead of trying to recover if I can't imagine why it would happen. I'm not saying your program should be perfectly bug free from the moment it's written; I've written plenty of buggy code too and we're all human. There are even a few errors that are truly insane, like literally hitting a compiler bug that silently corrupts your program. But calling these things not error cases (and what, random acts of misfortune instead?) is a weirdly defeatist attitude that impedes further progress. They're all error cases; the only question is how much time and knowledge you have to dedicate to handling them.


I called them errors in my post. But unintentionally relying on undefined behavior isn't really a logic error. It's a misunderstanding, and not something you can figure out by looking at the API and thinking about the errors the API can throw.


Those errors can also be considered and mitigated though, if one thinks about what could go wrong instead of only thinking about what exceptions can be thrown:

1. One must either encode the assumption into a precondition or transfer the incoming data into a sorted data structure. But the gist is to always validate assumptions.

2. RAII’s pretty good at handling resources. Then one can inject failures to test the error handling paths and combine with code coverage measurements.

3. That sounds like an issue in the wider system which may be handled in the subsystem under development assuming that it can throttle back its requests, drop them, etc. But it may just as well be handled in another part of the larger system. It belongs more to the architecture realm, but it’s absolutely possible to foresee such issues.


For 1, the bug stems from unintentionally using undefined behavior due to a misunderstanding. You can never get the answer by "thinking about what could go wrong" because your view of the system makes the failure case impossible.


I'm not exactly sure why that is either, but I do think most developers are drawn to the idea that a computer program is a closed system, and thus any error that happens is a fluke, unexpected, and should be treated as catastrophic, when that is simply not true. Programs take input from the surrounding reality, asynchronously, so failures aren't limited to cosmic rays flipping bits in the hardware. Personally, I like to handle the possible errors intelligently, but many developers, especially web developers, love to just have any error result in an error page. Even worse, they sprinkle `try { ... } catch (err) { /* do nothing */ }` all over the place, which results in problems that are hard to track down or detect early.


Interesting article and great blog post too.

I was lucky to start my career with a mentor who considered well written exception handling core to software development.

And he was always happy to stop his work (maintaining a core middleware in a telco with millions of transactions every minute) to review this intern's poorly written try/catch, show how he would write it, and why.

To this day I am thankful, and I believe his concern for exception handling, good and simple deployment techniques, and knowing networking down to the OS/kernel level helped me become a better developer (or at least know where I need to improve).


To be clear, did he consider exceptions something to be avoided when possible, or did he consider them a natural element of programming? I know throwing exceptions can slow systems down, and most programs do their best to minimize occurrences within reason.


The downvotes are unfair for an extremely good question.

(I used to ask questions along this line when interviewing candidates.)

Exceptions are used to communicate an unexpected error. Specifically, an error that the caller doesn't expect to handle during normal operations of the program. These errors can range from unusual situations like a network failure, to even more perverse situations like true bugs in the program.

The example I discussed with job candidates was implementing a database access function called GetUserById. (In C#, pre compiler enforced null checking.) I would ask if the function should return null or throw an exception when there was no user with that ID.

What followed (with the candidates who passed) was a discussion about the trade-offs of returning null versus throwing an exception. Null lets the caller know that there was no user with that ID without the overhead of the exception. But returning null increases the risk of a NullReferenceException, which is risky because it's harder to debug than a strongly typed exception with a useful error message. Thus, the "right" approach depended on whether callers of GetUserById were anticipated to always pass the ID of a valid user.

When there was time, we'd even get into the TryGet pattern that the .Net dictionaries use.

(By the way, now it's a good time to check out how Rust's enum type is used with error handling. It's really slick with no overhead.)
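For illustration, here is the same trade-off transposed to TypeScript (the interview example was C#; `getUserById`, `tryGetUserById`, and the in-memory `users` map are hypothetical stand-ins):

```typescript
interface User { id: number; name: string; }

// Stand-in for the database.
const users = new Map<number, User>([[1, { id: 1, name: "alice" }]]);

// Style 1: throw. Appropriate when callers expect the ID to always be valid,
// so a missing user is a bug worth a strongly typed, well-described failure.
function getUserById(id: number): User {
  const user = users.get(id);
  if (user === undefined) throw new Error(`no user with id ${id}`);
  return user;
}

// Style 2: the TryGet shape. Appropriate when "no such user" is a normal
// outcome; the type forces the caller to check before using the value.
function tryGetUserById(id: number): User | undefined {
  return users.get(id);
}
```

Unlike pre-null-checking C#, TypeScript's `User | undefined` return type gives the second style compiler enforcement, which removes much of the NullReferenceException risk discussed above.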


I would like to firmly push for a new definition of what an exception is for. It's for _aborting a sub-task in your program when it cannot complete its assigned goal_. Unlike the meaningless "it's for exceptional situations", this definition has the benefit of describing what they are for and when you should use them.

Are you really not expecting an error, even as you are writing code to detect and report on it? No! You think it is unlikely to happen, but you are spending time preparing for it.

But after some errors, whatever sub-task the program was working on just isn't going to happen, and in that case the program needs to get back to a state where it can continue with the next sub-task. Exceptions do precisely that, letting you gracefully back out of the sub-task in a clean and clear manner.

Exceptions are not for aborting your entire program, as some people mistakenly hold (there's abort() for that). The fact that they are comparatively slow doesn't matter. Once you are not going to achieve your goal, it doesn't matter too much whether you'll do so at a rate of a thousand per second or a million per second, except perhaps for total system throughput.


> I would like to firmly push for a new definition of what an exception is for.

I have some sympathy with your argument, particularly the idea that “for exceptional situations” is an empty tautology.

However, I find it a little strange to characterise exceptions as being “for” any specific purpose. This seems a common theme in the programming community, yet we don’t feel the need to characterise variables or for-loops or function calls as being “for” something in that way. They are just tools that our programming language provides, which have certain behaviour if we use them. That behaviour is (hopefully) defined objectively by the language specification, but how we then employ each tool is an open-ended and subjective question, a matter of judgement or perhaps convention.

In the case of exceptions, as provided in most mainstream programming languages, it is objectively true that they immediately exit lower level code and transfer control back up the call stack until they are handled at some higher level (or not). There are at least two reasons we might want to do that: something can’t do its job or something has now done its job. Either way, the outcome for that part of our program is now known and we are ready to proceed accordingly.

The proposal in the parent comment, aborting a sub-task if it can’t complete its assigned goal, is in the former camp. This might be the most widespread interpretation of what throwing/raising an exception represents. There is still plenty of debate about whether this should be used for “expected” failures like failing to find a file or connect to a network and/or for “unexpected” failures that imply some logic error in the program itself, but there is a degree of consensus that an exception represents some form of error condition.

But then you have languages like Python, which also uses built-in exceptions like StopIteration to indicate routine, successful completion conditions. This might be anathema to the school of thought that says exceptions are “only for indicating exceptional failures”, yet here it is, a different style that is used every day in one of the most popular programming languages ever created, and the sun still came up this morning.

Possibly the most unusual idea for using exceptions that I have encountered personally was to indicate a positive outcome from a complicated search algorithm. There were many mutually recursive functions that collectively scanned a graph-like data structure. On identifying a match, an exception would be raised to report the details. Using an exception in this way was like a multi-level early return statement or using a labelled break to exit multiple loops at once. It guaranteed a clean and immediate exit from the search, regardless of how it had recursed to reach that point, and the exception was then neatly handled at the same level of the code that started the recursive search, thus avoiding cluttering one or more paths through every recursive function in that search code with if(done) conditions. To some programmers, this might be a controversial use of the tool, but perhaps the question we should be asking is why, if the code was clearly correct according to the language rules and exceptions provided a neat, easily understood way to solve the design problem.
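As a rough illustration of that search pattern (all names here are made up, not from the codebase described):

```typescript
// Thrown purely as a multi-level early return, not as an error.
class FoundMatch extends Error {
  constructor(public readonly node: string) { super(`found ${node}`); }
}

type Graph = Record<string, string[]>;

function visit(g: Graph, node: string, pred: (n: string) => boolean, seen: Set<string>): void {
  if (seen.has(node)) return;
  seen.add(node);
  if (pred(node)) throw new FoundMatch(node); // unwinds every recursion level at once
  for (const next of g[node] ?? []) visit(g, next, pred, seen);
}

function search(g: Graph, start: string, pred: (n: string) => boolean): string | null {
  try {
    visit(g, start, pred, new Set());
    return null; // exhausted the graph without a match
  } catch (e) {
    if (e instanceof FoundMatch) return e.node; // handled at the level that started the search
    throw e; // anything else is a real error
  }
}

const g: Graph = { a: ["b", "c"], b: ["d"], c: [], d: [] };
console.log(search(g, "a", (n) => n === "d")); // prints: d
```

Note that the recursive `visit` function carries no `if (done)` plumbing at all; the unwinding is handled entirely by the exception machinery.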


There are many ways to write correct programs within the bounds of any programming language. In languages that support goto, you don't even need functions and loops.

However, like any human discipline, there are good patterns for how to write a maintainable program, and there are bad patterns. There may be particular cases where a typically bad pattern is nevertheless the best available. But I believe GP is right on a good definition of the most understandable and maintainable reason for exceptions.


> However, like any human discipline, there are good patterns for how to write a maintainable program, and there are bad patterns.

Sure. What I’m questioning is whether there is any rational, objective basis for arguing that using exceptions to exit early in positive cases is a bad pattern. The programming world is full of opinions, sometimes strongly held by experienced practitioners, that a certain style is bad. The programming world is also full of other experienced practitioners who use some of those controversial styles very successfully. Exceptions are a common source of controversy, but you could just as well look at type systems, significant white space, OOP, functional programming, or a hundred other areas where reasonable people can differ. I believe it’s important to distinguish arguments based on dogma or convention from arguments based on rational logic or empirical evidence. One type helps us to improve as individuals and as a community, while the other can only hold us back.


FWIW, I wouldn't argue it is necessarily a bad pattern, just a highly uncommon one; the sort of thing someone with a lot of experience can do, but that you wouldn't go teaching to beginners. I would also demand to see a comment explaining the unusual usage.

What I sort-of expected to be called out on is the definition of what a sub-task really is, because that is still quite vague. I'm inclined to match these to what a user of the program would consider a thing he does with the program at the largest scale level (things like "print a document" or "send a mail"), but there is certainly room for smaller granularity sub-tasks as well. I.e. if you are rendering a web page, but can't display an image for some reason, you'd just abort the image render, not the whole page render.

How would I classify the example given (loading a user record from a database)? Well, I don't know! It depends on the context: if we fail at loading that user, are we going to have to give up on whatever other things we were doing as well, or can we continue, possibly with some degraded functionality? If the first, it's an exception. If the second, null (or whatever passes for null in your language of choice, like std::optional in C++).

I suppose it wouldn't surprise you too much to learn that I do in fact have database access routines in both styles... There's database.load_one_record ("query"), which throws if it can't find that one thing you are looking for, but also database.load_one_record ("query", default_value), which returns the specified default value if no record matches the query. Because really, this one depends very heavily on context...


> I would ask if the function should return null or throw an exception when there was no user with that ID.

My answer would be “neither”. Both of these options are bad. Null is maybe less bad because it at least roughly approximates the algebraic type that accurately represents the image of the function. Ideally you would return some type like Optional<User>. I suspect this is what you’re referring to with the Rust reference.

A slightly more generalized version of this is to use the Either/Result monad, which can nicely handle multiple types of exceptional behavior at once.


Hrm, not the OP but, personally, it depends.

If you’re using a language without algebraic data types, you’d have to ask whether it’s perfectly normal for a record not to be found by ID at that point.

If it’s unusual, you should probably just throw the exception at that point because:

1. You check for null and throw anyway, which gives you a stack trace just a little off the mark. That isn't too bad, but it's more code for little gain.

2. There’s every chance you could forget to handle the null (It’s not the nil, it’s my discipline, I know, but it happens, we’ve all been there) and then you get a null pointer exception at some point later on in the flow, which is like the above but even worse.

If you're using algebraic data types, you wouldn't use an Optional. There are a number of ways a DB lookup can fail, so Optional isn't really ideal at all. Your suggestion of Either/Result is probably the ideal there.

The chance that you expect an explicit DB lookup by ID to return nothing is pretty slim. Where has the program or client code got that ID from in the first place? Either you want to know about the error in your own code or you want the client code to know about it, so you probably want some form of exception or Either.

In the rare case where it is intended behaviour, yeah, you could use null and push the responsibility for handling it to the calling code. Still, at that point, point 2 above kicks in: it might be fine now, but that assumption could easily change as the code evolves, which it can and often does.

You could argue you can implement something like Optional in, say, Java without algebraic data types, but such types are simply not ergonomic when the language does not help you use them.

But I don’t think you can hold up a single approach and say that’s the ideal. I doubt it was your intention but it’s rather dogmatic and doesn’t really play out nicely in real world code, especially legacy projects (not everything is written in Haskell|Rust).


> it’s rather dogmatic

I found it correct. I hold the same "anti null" and "anti exception" beliefs, and think Either/ResultObject/Maybe/Optional is ideal compared to them...

Your range of programmer thought seems to be limited by what is possible/idiomatic in popular languages.

> not everything is written in Haskell|Rust

The Truth is not measured in mass appeal.


The convention in C# is to make two versions of the method: GetFoo, which throws, and TryGetFoo, which returns a bool and places the result in an out parameter.

For example, with the C# dictionary, you can either do `foo = dictionary["foo"];` or `if (dictionary.TryGetValue("foo", out foo))`.

If Option<T> is your preferred pattern then you should use a language where that is supported.

Trying to recreate Option<T> in languages without compiler null checking has issues because either there's a risk that the Option<T> is null, or the value inside of it is null.


Does C# support sum types? GP was talking about C# specifically.

Agree with your analysis though: it's essentially a partial function, so the range (codomain) of the function includes "no defined output for the specified input". It's therefore not a bug to return "not found". An algebraic datatype like Either/Optional/Result allows that to be encoded in the type system, and so provides explicit support syntactically and semantically for handling the situation.

I don't think C# supports sum types (though may well be wrong there, not so close to the language). Assuming not, the question is how to achieve the desired outcomes. Those might reasonably be one or more of:

* minimise the risk of null de-reference

* maintain performance

* be idiomatic

* be understandable

* be explicit

If (and it's a big if) the only language options are nulls and exceptions, there's not a clear winner there. So exploring the tradeoffs does seem like a reasonable discussion.


Effectively every OO language does. Inherit and override behavior, or add a piece of indirection that allows the result to specify the type of view, etc. Think of `UserSearchResult SearchUser()` as returning either `ARealUserResult { int userId; string display = "showUserView" }` or `UserNotFoundResult { string display = "notFoundView" }` (you could easily add MultipleUsersMatchResult or UserRequiresAdminPrivilegesResult, ...).
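A sketch of that inheritance-based encoding in TypeScript, with illustrative names (the snippet above was C#-flavoured):

```typescript
abstract class UserSearchResult {
  abstract readonly display: string; // each outcome names its own view
}

class RealUserResult extends UserSearchResult {
  readonly display = "showUserView";
  constructor(public readonly userId: number) { super(); }
}

class UserNotFoundResult extends UserSearchResult {
  readonly display = "notFoundView";
}

// Hypothetical lookup: one declared return type, several concrete outcomes.
function searchUser(id: number): UserSearchResult {
  return id === 42 ? new RealUserResult(id) : new UserNotFoundResult();
}

console.log(searchUser(42).display); // prints: showUserView
console.log(searchUser(7).display);  // prints: notFoundView
```

The caller dispatches on the concrete subclass (e.g. `instanceof` checks, or virtual methods on the base class), which is the OO analogue of matching on a sum type.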


Agreed. This is the most obvious way forward. Rust, Haskell, Elm, and even Kotlin seem to be coming along nicely.

Exceptions and null for error cases are horrible and Java and Go are doomed. (semi-jokingly)


I would disagree that network failure is unusual. It is something you should expect to happen, and you should consider reporting such conditions as much a core feature of your API design as the happy path, not something bolted on to the side.

Exceptions are best reserved for conditions that should be impossible, but become possible for an exceptional reason (bugs, cosmic rays flipping the wrong bits, etc.)


It's really critical that you don't have to litter every single function call and conditional in your code with "if network error." There's plenty of "one in a million" corner cases beyond network errors that an industrial strength program needs to handle.

Exceptions let you put a single error handler higher up in the stack for the one in a million errors, like network errors.


If a network error is possible, there is no reason for it not to be a first-class feature of your API. You can bolt it on to the side using goto by another name, but you shouldn't.


I think he never considered exceptions something to be avoided, since he wrote Java for most of his professional career, which ended when he decided to quit programming and go sell ornamental flowers in his family's business in his hometown.

The middleware we worked on included libraries from other companies (e.g. online prepaid transactions were handled by code from a jar/lib from Ericsson). But if the platform threw an exception, we could still charge the customer using a slower process (i.e. there's a window of time where the customer could use data/talk in excess until we realized there was no more credit when processing the slower transaction).

For him it was important to write these exception handlers well. If something went wrong in production, we couldn't turn off the system, and if we were not charging users when we should, the company would be losing millions (it was in Brazil, ~200 million people at the time, and the telco had ~60 million users I think, with Mother's Day and Big Brother being the craziest days, with millions of messages per second).


Throwing exceptions is slow. Adding exception handling code itself has little to no overhead.

Only throw exceptions in exceptional cases, not for frequently expected outcomes.


Java exceptions build the stack trace at construction time, which makes for the fun fact that instantiating an exception is the majority of its cost (at least the last time I checked). Actually throwing it is around a fifth of the cost.

This is relevant in log-and-rethrow situations.


Adding to that, assume any C++ code can throw exceptions unless it's specifically stated it won't.

If you're doing manual memory allocation in a function, your exception handling might be leaky. (gsl::finally is your friend.)


Imagine you're writing a function that parses an integer from a string. Would you let it throw an exception on failure?


Depends on whether you call the function `parse` or `tryParse`. :)

Throw if you expect that the input should never be invalid.

Don't throw, and do friendly logging and reporting of your parsing errors, if this is uncontrollable third party input.
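A small sketch of that naming convention (hypothetical helpers; the convention echoes .NET's `Parse`/`TryParse` pairs):

```typescript
// tryParse: for uncontrolled third-party input. Never throws; the caller
// checks the result and can log or report the bad value in a friendly way.
function tryParseInt10(s: string): number | undefined {
  return /^-?\d+$/.test(s) ? Number(s) : undefined;
}

// parse: for input that should never be invalid. A failure here is a bug,
// so throwing with a descriptive message is the right behaviour.
function parseInt10(s: string): number {
  const n = tryParseInt10(s);
  if (n === undefined) throw new Error(`not an integer: ${JSON.stringify(s)}`);
  return n;
}

console.log(parseInt10("42"));      // prints: 42
console.log(tryParseInt10("oops")); // prints: undefined
```

The name itself documents the contract, so a reviewer can tell at the call site whether an unhandled failure path is intentional.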


> The longer the exception handling blocks in a file, the more likely the file is to contain bugs.

It is difficult to design a sophisticated error handling system, for various reasons. For example, it is rarely talked about -- everybody focuses on how to get stuff working, and errors aren't really a hot topic people find interesting.

I think, given the above, you have the best chance of success with really simple error handling systems. The simplest is to only handle things you can really handle at the current frame and let everything else filter to the top and interrupt entire process. It is not perfect, but it has the virtue of being simple and easy to make foolproof.

> The Ignoring Interrupted Exception and Log and Throw patterns correlated with post-release defects in one of the projects they studied, but not all.

InterruptedException does not happen in real usage in a lot of applications, especially backends, where the environment is very well controlled.

Log and throw can be a legitimate pattern. It makes sense when the exception travels outside some kind of boundary like module/library boundary. A REST API is an example of boundary and it is normal to log the error but still throw it (to the client). It also may possibly make sense to log additional information available in local scope and throw the error up the stack. It doesn't always make sense to convert the type of the exception to be able to add more information to it.
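For instance, a boundary function might log locally and rethrow; a sketch with hypothetical names (`fetchInvoice`, `loadFromBackend`):

```typescript
function loadFromBackend(id: string): string {
  throw new Error("backend unreachable"); // simulated downstream failure
}

function fetchInvoice(id: string): string {
  try {
    return loadFromBackend(id);
  } catch (err) {
    // Log immediately, with a timestamp and the local context (the id),
    // so the failure can be correlated with the backend's own logs even
    // if the exception never reaches a top-level handler.
    console.error(`${new Date().toISOString()} fetchInvoice(${id}) failed:`, err);
    throw err; // the same exception continues up the stack, type unchanged
  }
}
```

Rethrowing the original object (rather than wrapping it) keeps the exception type intact for callers that dispatch on it, at the cost of one extra log line per boundary crossed.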


> Log and throw can be a legitimate pattern. It makes sense when the exception travels outside some kind of boundary like module/library boundary. A REST API is an example of boundary and it is normal to log the error but still throw it (to the client). It also may possibly make sense to log additional information available in local scope and throw the error up the stack. It doesn't always make sense to convert the type of the exception to be able to add more information to it.

Of course it's going to correlate with a bug. The whole point of log and throw is to make bugs easier to find!


> It also may possibly make sense to log additional information available in local scope and throw the error up the stack.

Like, time. This is why I often do "log and throw" or "log unexpected value and return it", depending on whether the code is exception-oriented or uses sum types instead. Time is an important piece of information to encode in an error log. When your error is caused by an external resource (e.g. an RPC failure), you want to log it immediately so you can later correlate it with the external resource's logs or some other monitoring. If you don't, the error may take some time to reach its final logging place, or it may never reach it at all if the application crashes in the meantime.


>The simplest is to only handle things you can really handle at the current frame and let everything else filter to the top and interrupt entire process.

I find this approach creates exactly what you try to avoid: error handling complexity.

The problem is with exceptions, I believe. They mess with normal program flow. They create code that's hard to reason about and less explicit.

Sometimes Exceptions can clean up some code, but usually it just sweeps dirt under the carpet to blow up in your face later.

I'd rather use sum types and actual return statements (or implicit returns when all code is expressions) than exceptions.


The statistician in me is severely troubled by a conclusion like "statistical relationship" when it is drawn from studying just 3 (three) projects. In the worst case this means studying the programming habits of just 3 (three) developers (appended: actually, 1 dev is the worst case). You just can't draw significant conclusions on such a basis that generalize to the whole population.

Even if they were huge projects with hundreds of devs, you still couldn't draw generalizable conclusions, because the habits of all these devs are clearly not independent.


Personally I really dislike exceptions and try/catch in languages. I don't like having to worry about whether some function call is going to surprise me with an exception, and handling them with try/catch really breaks the flow of the program.

I'm sure there's probably a lot I can learn to make this better for myself because I don't work with exception heavy code very often, but I find working with simple error value returns like in Go or error types in Haskell/Rust to be so much more ergonomic and comfortable to work with.


The benefit of exceptions and try/catch is that it lets you separate your exception handling logic from the mainline logic of the function. Having error logic weaved in and out of mainline logic just obscures what is going on and increases cognitive load. I much prefer writing and reading code with exceptions than explicit error handling control flow.


> it lets you separate your exception handling logic from the mainline logic of the function

So do the error monads (on Rust or Haskell). In fact, they offer a lot more flexibility on how to separate them, and can put even more distance between the happy path and error handling (if you need it, often people use the extra flexibility to place them closer).
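A rough Python sketch of the separation the parent describes, using a hypothetical `and_then` helper (real monadic libraries offer richer versions of this): the happy path reads as a chain, and error handling sits wherever you choose to put it.

```python
from dataclasses import dataclass

@dataclass
class Ok:
    value: object

@dataclass
class Err:
    reason: str

def and_then(result, fn):
    """Apply fn to a success; pass failures through untouched."""
    return fn(result.value) if isinstance(result, Ok) else result

def parse(text):
    return Ok(int(text)) if text.isdigit() else Err("not a number")

def check_positive(n):
    return Ok(n) if n > 0 else Err("not positive")

# Happy path reads as a chain; error handling happens only at the end.
result = and_then(parse("17"), check_positive)
if isinstance(result, Err):
    print("failed:", result.reason)
else:
    print("ok:", result.value)
```

The same chain could instead check each step immediately; the point is that the distance between the happy path and the handling is up to the caller, not the language.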


But now, instead of worrying about errors, you're worrying about pleasing the type checker all the time.


You mean, handling all the corner cases and making sure your code is correct?

There are moments when the experience of programming in Haskell feels like "pleasing the type system". Those are few and far between, and most of the time they are still because of a bad error message that converted your bug into a complex type error.

If your experience on the language gets those all the time, I suggest you get some more experience, because you are clearly still unable to design your types very well.


> If your experience on the language gets those all the time, I suggest you get some more experience, because you are clearly still unable to design your types very well.

It's possible to advocate for Haskell without casting aspersions on someone's abilities.


Sorry, it gets tiresome to hear the "tried Haskell once, types are a huge problem" claim again and again.

The language does have a steep learning curve, and it's no personal flaw to not get it even for a long while. But to go on and complain that the language is badly designed because you couldn't get it in the first few tries is a deeply annoying display of hubris.


I can sympathize.


> I much prefer writing and reading code with exceptions than explicit error handling control flow.

Can you let us know what languages that use "explicit error handling control flow" you have used?

I've extensive experience with both and much prefer the "explicit error handling control flow" in Rust/Haskell/Elm/Kotlin/ReScript to the exceptions in Java/C++/C#/JS/Ruby/Python.

Interesting that the use of implicit nulls (another of my annoyances in langs) is also split along these lines!


> I've extensive experience with both and much prefer the "explicit error handling control flow" in Rust/Haskell/Elm/Kotlin/ReScript

How explicit are those languages really, though?

Take Haskell, for example. It’s common to indicate potential failures by having a function return Either/Maybe. Those are monadic, with their join behaviour propagating the Left/Nothing result that conventionally represents the failure case(s). It’s idiomatic to write a function that calls a series of potentially failing functions and not examine the result of each call immediately. Instead, you defer any handling of failures to the end of the chain or even pass the result back to the calling function via another Either/Maybe.

What you’re not doing in any of those alternatives is explicitly checking the return value after each function call and handling any failure immediately right in the middle of your default execution path. So is this really much more explicit than exceptions? The possible failure modes for each function are encoded in its type, but there are implementations of exceptions that also have that property, so that’s not really a point in favour of either side. The code calling the functions is still highlighting the default execution path and shifting the recovery from any other cases elsewhere.

As another example, Rust follows an analogous convention with functions returning Option/Result to indicate potential failure modes, but has try! and then added the ? operator to propagate errors with minimal extra code instead of writing all the boilerplate manually each time. This is arguably more explicit when calling those functions, since at least any function that can fail needs to be called with ? (or something more obvious) to handle the result, but fundamentally the convention is still trying to minimise clutter in the default execution path and delegate error handling responsibilities to code somewhere else, and it still doesn’t indicate anything more explicit about what the potential failure modes for each function are at the place where that function is called.
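One way to see what Rust's `?` is doing is to mimic it in Python, purely as a sketch; the `q` helper and `propagates` decorator here are made up for illustration and are not a real library API.

```python
from dataclasses import dataclass

@dataclass
class Ok:
    value: object

@dataclass
class Err:
    reason: str

class _EarlyReturn(Exception):
    """Internal signal used to emulate Rust's `?` propagation."""

def q(result):
    # Like `expr?`: unwrap a success, or bail out of the function with the error.
    if isinstance(result, Err):
        raise _EarlyReturn(result)
    return result.value

def propagates(fn):
    # Turns the internal early-return signal back into an ordinary Err value.
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except _EarlyReturn as stop:
            return stop.args[0]
    return wrapper

def parse(text):
    return Ok(int(text)) if text.isdigit() else Err("not a number")

@propagates
def add(a_text, b_text):
    # The default path stays uncluttered; failures propagate automatically.
    return Ok(q(parse(a_text)) + q(parse(b_text)))
```

As the parent says, each `q(...)` marks a call that can fail, but the handling itself is still delegated to whoever receives the final `Err`.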


> Take Haskell, for example

Yes!

> Instead, you defer any handling of failures to the end of the chain

That's a choice. You can also do it immediately.

Also, no warped control flow: just regular "call-return" flows.

> What you’re not doing in any of those alternatives is explicitly checking the return value after each function call and handling any failure immediately right in the middle of your default execution path.

Like I said, that's a choice. You can set up monadic handling, or you can handle each by itself. Which you use depends on the situation, but the DEFAULT is each by itself.

> As another example, Rust

Still, the control flow remains intact, while exceptions mess with it.


The separation between the mainline logic and the exception handling logic does not require an exception mechanism like in Java or C++.

A restricted form of GOTO, like in the language Mesa (from Xerox) is good enough.

Mesa also had exceptions for things where exceptions are appropriate, e.g. numeric overflow, out-of-bounds access or memory allocation errors, i.e. errors that are normally caused by program bugs and which can be solved only by someone who knows the internals of the program.

For errors whose cause can be determined and removed by the user, e.g. files that cannot be found or mistyped user input, the appropriate place of handling the error is the place where the function that failed had been invoked.

Neither inside the function that failed, nor several levels above the place where the error was detected, is it possible to provide really informative error messages that enable corrective action, because only at the place of invocation is the precise reason known why the failed function had been called.

The restricted GOTO from Mesa, whose purpose was error handling, could not jump backwards and it could not enter a block, which eliminated the possible abuses of the feature.

Moreover the labelled targets of the GOTO could exist only inside a delimited section at the end of a block.

The keyword GOTO is not needed, because it is redundant. At the place of the jump it is enough to write the target label, as no other action is possible with it.

So in a language like Mesa, the mainline logic would be something like (in languages with "then", no parentheses are needed, unlike in C and its followers):

if err_code := function_that_can_fail(...) then Error_Name

if err_code := function2_that_can_fail(...) then Error2_Name

and the error handlers will be grouped in a section similar to the error handlers used with the exception mechanism of Java or C++.

The difference is that the section with the error handlers must be in the same file as the function invocations that can return errors, and the error handlers will be invoked only from there, not from random places inside who knows what 3rd party library might have been used somewhere.

Because for such handlers there is no COME-FROM problem, you know exactly what has happened and you can easily determine what must be done.


The challenge with exceptions is there is zero indication at the call site that a function can throw. Does myFunc() throw? Only way to know is to dig down through the entire call stack. Meanwhile with (value, error)/Result etc. it's obvious right at the call site whether a function can potentially error or not.
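A Python sketch of the Go-style `(value, error)` shape (the function and config keys are hypothetical): the possibility of failure is visible in the signature and at every call site.

```python
def read_port(config: dict):
    # Returns (value, error), Go-style: the shape itself says "this can fail".
    if "port" not in config:
        return None, "missing key: port"
    return config["port"], None

# At the call site it is obvious that an error is possible:
port, err = read_port({"host": "example.com"})
if err is not None:
    print("could not read port:", err)
```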


It's the exact opposite. With the functional Either approach you just move the error around, and when you can't deal with it anymore you don't even have a stack trace to know what damn piece of code created the error in the first place.


>The challenge with exceptions is there is zero indication at the call site that a function can throw.

Swift has that. Calling a potentially throwing function requires that you put "try" in front of it.

The downside is that you don't know what it throws (similar to Rust functions returning dyn Error)


Java has checked exceptions, but everyone seems to hate them.


I really don't understand the hate for checked exceptions either. The only downside to me is that you're forced to deal with exceptions when trying to prototype some code. A compiler option to ignore checked exceptions would easily fix that.


If you want to use exceptions then checked exceptions or the equivalent seems like a more logical approach than allowing anything to throw anything at any time. Which exceptions a function can raise is a part of its interface, so it might as well be documented and/or automatically recognised as such.

However, this style probably requires both a carefully considered set of possible exception types and good tool support to work well for developers in practice. Clearly we’d like to have both of those things in any case, but I’m not sure the average Java programmer at the peak of the checked exception debate had either, and it’s almost always in the context of Java that I see checked exceptions being criticised.

If you are dealing with functions that can end up propagating many different types of exception under sufficiently unusual conditions and you don’t have easy ways to do things like inferring exception specs for higher-level functions calling them or separating or consolidating multiple exception types as needed when handling them, that seems like a recipe for frustration. It’s easy to imagine writing endless boilerplate lists of obscure exception types for every function and then having to maintain those across your whole codebase any time some widely-used low-level function changes its spec, which seems completely impractical to manage at scale without good language and tool support.


They do cause issues when you need to implement an interface that doesn't declare the exception but you have a checked exception that can't be handled at that point. The only real option then is to rethrow as an unchecked exception. For example, an Iterator that may hit an IOException.

I still like them, but they aren't a panacea, at least as Java implements them.


Yup. I would love to have checked exceptions in C++. They seem like the best possibility, at least for a language that can't support the monadic pattern thoroughly.


I love me some checked exceptions.


Interesting, thanks for sharing. I find there to be more cognitive load with the separate control flow. Different strokes I guess!


Why do people praise Go when it comes to error handling? It's the worst of both worlds, since its standard library returns errors, sure, but anything can panic as well, which is basically a poor man's exception system.

Rust, on the other hand, got this mostly right, like many other things. If you're going to use errors as values, then you need some constructs at the language level to deal with that.

Go's solution isn't any more sophisticated than C error codes.


That's not exactly true. There are very few cases where the standard library will panic, and the handful of cases where it does are programming errors (like passing a non-nil, zero-length buffer into io.CopyBuffer).


Rust also has panics as a separate feature from errors-as-values, same as Go…


Error handling in Go is simple and verbose at the same time. Working in it alongside Python and JavaScript makes me rethink a lot of patterns I’m accustomed to - it’s a nice exercise. Though of course the use cases often vary significantly between the 3 languages so it is tough to even compare.


I think the biggest place where it's really necessary is integration points. Within your system it's reasonable to try to define exceptions out of existence. Once you start accepting user input or depending upon some system outside of your control, though, you better have some kind of mechanism to handle whatever unknown asteroids come flying in from deep space ready to annihilate your entire planet and civilization.


That is still provided by error values or error types. The defining feature of exceptions is that they provide a secondary control flow path, and that control flow path automatically flows up the stack until it reaches an explicit catch.

With error values/types, control flow happens normally, and the programmer is expected to explicitly branch on the value using normal control flow constructs.


I see a lot of exceptions versus error types/values, as if there's no in-between of exceptions and error types/values. There exist languages that support both. OCaml has both optional values and exceptions. C++ with std::expected. C# with the work on nullability/nullable reference types, and already with nullable value types.


On the exception receiving side...

The key, IMO, is to assume that every line of code can throw until proven otherwise, and then make sure you clean up after yourself, using RAII in C++, `using` in C#, savepoints in SQL or whatever other mechanism is available in your language.

After that, in 99% of cases, you just let the exception propagate to the higher level. Eventually, it will result in an error dialog, or be logged, or whatever, but the program is still in a consistent state. It's a kind of "soft reset" that doesn't destroy the state and lets you keep working.
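In Python the analogous cleanup mechanism is a context manager with `with`; a sketch of "clean up, then let it propagate", using a hypothetical dict-snapshot transaction:

```python
from contextlib import contextmanager

@contextmanager
def transaction(state: dict):
    # Snapshot so we can roll back if anything below us throws.
    snapshot = dict(state)
    try:
        yield state
    except Exception:
        state.clear()
        state.update(snapshot)  # restore a consistent state...
        raise                   # ...then let the exception keep propagating

account = {"balance": 100}
try:
    with transaction(account):
        account["balance"] -= 250
        if account["balance"] < 0:
            raise ValueError("insufficient funds")
except ValueError:
    pass  # the "soft reset": state is intact, the program keeps working

print(account["balance"])  # still 100
```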

---

On the exception producing side...

Don't throw a new exception unless you expect it can be reasonably treated as described above. If your immediate caller needs a `try...catch` wrapped directly around the call, this is probably a sign you should have returned an error result instead of throwing an exception.


  > there exist anti-patterns that can provide significant explanatory power to the probability 
  > of post-release defects. 
  > Therefore, development teams should consider allocating
  > more resources to improving their exception handling practices
While it's an interesting correlation, the conclusion doesn't follow: plugging this particular hole might not be the best place to focus improvement efforts.

It's two variables that are not statistically independent of each other. A novice developer is more likely to write bugs, just as they are more likely to use exception-handling anti-patterns, or anti-patterns for anything else for that matter. Basically, correlation is not causation.



