Because you need to understand that your processing code is constrained by the fact that you can't get exactly-once delivery. You must write your processing code to handle it one way or another. There's some libraries that try to wrap the abstraction of processing exactly once around the code, but those libraries still impose constraints on the sort of code you can write. They can make it easier but they can't fully remove the need to think about how your processing code works. It isn't exactly the same.
This is why people like me insist it's important to understand that you can not have exactly-once delivery. There is no library that can make that just go away, they can only shuffle around exactly where the lumps under the carpet live, and if one programs with the mistaken idea that these libraries really do solve "exactly once" delivery, one will get into deep trouble. Possibly the "my architecture is fundamentally broken and can't be rescued" sort of trouble.
> Because you need to understand that your processing code is constrained by the fact that you can't get exactly-once delivery.
Why do I need to understand that? Why can I not put an abstraction layer that provides me with the illusion of exactly-once delivery?
> There is no library that can make that just go away
Well, this is the thing that I dispute. I believe that there is a library I can write to make it go away if I have at-least-once delivery. In fact, I claim that writing such a library is an elementary exercise. The TCP protocol is an existence proof. Where is the flaw in my reasoning?
OK, but you have to do a little extrapolating here because the claim is not that you can do exactly-once under all circumstances. That is obviously false because you can't do exactly-once in a situation where all comms are down indefinitely. My claim is that if I have at-least-once then I can build exactly-once out of that.
This isn't about indefinite communication loss. Obviously no progress is possible in that case. The two generals' problem has nothing to do with a permanent failure.
I think there is a lot of literature out there if you're really interested in understanding and I'm happy to provide more links if you'd like.
> The sender cannot know if a message was delivered since transport is unreliable; thus, one or more acknowledgement messages are required. Moreover, the sender cannot distinguish between message delivery errors, acknowledgement delivery errors, or delays (either in processing or because or network unreliability).
> The recipient is forced to send the acknowledgement only after the message is either processed (or persisted for processing) because an acknowledgement before processing would not work: if the recipient exhibits a fault before processing that would cause the loss of the message.
> In the case that that acknowledgement is lost, the sender can’t know (due to lack of insight in the recipient’s state) whether the recipient failed before scheduling the message for processing (in essence losing the message) or if the recipient is just running a bit slow, or if the acknowledgement message was lost. Now, if the sender decides to re-deliver, then the recipient may end up receiving the message twice if the acknowledgement was dropped (for example). On the other hand, if the sender decides to not re-deliver, then the recipient may end up not processing the message at all if the issue was that the message wasn’t scheduled for processing.
"Let’s assume that a protocol exists which guarantees that a recipient receives a message from the sender once and only once. Such a protocol could then solve the two generals problem! Representing the time of the attack as the message, the first general (the sender) would only need to adhere to the protocol for the second general (recipient) to have received the attack time exactly one time. However, since we know that this is not possible, we also know that exactly once is not possible."
The 2GP is not just the first general knowing that the second general received the message. The 2GP is the problem of achieving common knowledge between the two generals, i.e. it's not just that G1 needs to know that G2 got the message, it is that G2 needs to know that G1 knows that G2 got the message, and G1 needs to know that G2 knows that G1 knows that G2 got the message, and so on.
Exactly-once delivery is possible. The only thing that is not possible is for the sender to know when the message has been received so that no duplicates are sent. But exactly-once delivery is not only possible, it's trivial. All you need to do is discard duplicates at the receiver.
> All you need to do is discard duplicates at the receiver.
...yes, which would be exactly once _processing_ but not exactly once _delivery_.
Unless you're wanting to redefine "exactly once delivery" to mean "at least once delivery but I'm calling it exactly once because I have a strategy to cope with duplicate messages"
Delivery is communication between two actors. Processing is what one actor (the receiver) does with a message.
Communication is when two actors exchange a message. Communication is generally done over an unreliable medium because in practice there is no way to communicate without the potential of failure.
1. Exactly once communication between two actors over an unreliable medium is impossible. At the very least you have to account for the possibility of failure of the medium, so you might need to re-send messages.
2. At least once communication between two actors is possible -- just re-send a message until the receive acknowledges the message.
3. Because a message might be re-sent, the receiver must be able to cope with duplicate messages. This is what you're describing. This might be done by making message processing idempotent or tracking which messages you've seen before. In either case, you have achieve exactly once processing. That is, if a receiver is given the same message multiple times, only the first receive of the messages changes the state of the receiver.
---
> But exactly-once delivery is not only possible, it's trivial.
Considering that many in the field consider this problem to be impossible (or, at best, extremely difficult, e.g. https://www.confluent.io/blog/exactly-once-semantics-are-pos...), this should be a huge red flag to yourself that you're missing something. Everyone has blind spots and that's okay, but hopefully you understand that there's a pretty big mismatch here.
Alternatively, it's possible that this problem _really_ is trivial and you have some unique insight which means there's a great opportunity for you to write a paper or blog post.
> hopefully you understand that there's a pretty big mismatch here
Yep. But on the other hand, 1) I have a Ph.D. in CS, and 2) I have yet to see anyone in this thread actually produce a reference to a reliable source to back up the assertion that exactly-one delivery is impossible. Indeed, the one reference you provided has a headline "Exactly-Once Semantics Are Possible" so you are actually supporting my position here.
Finally, I will point out something that should be obvious but apparently isn't: "exchanging a message" between computers over a network is a metaphor. There is not anything that is actually exchanged, no material transferred from one computer to another. There is only information sent in the form of electrical signals which results in state changes in the receiving system, so there is no clean boundary between "communication" and "what the receiver does with a message". Receiving a message in the context of a computer network is necessarily "doing something" with that message. There is no other way to "receive" a message.
TCP implementations are an abstraction that work 99.99% of the time, but are still vulnerable to two generals when you look close. TCP is implemented in the kernel with a buffer, the kernel responds with ACKs before an application reads the data.
There is no guarantee that the application reads from that buffer (e.g. the process could crash), so the client on the other end believes that the application has received the message even though it hasn't.
The kernel is handling at-least-once delivery with the network boundary and turning it into at-most-once with the process boundary.
What makes you think the 2GP is relevant here? The 2GP has to do with coordination and consensus, not exactly-once delivery.
> TCP is implemented in the kernel with a buffer, the kernel responds with ACKs before an application reads the data.
True. Why do you think that matters?
> There is no guarantee that the application reads from that buffer
So? What does that have to do with exactly-once delivery? Even if the application does read the data, there's no guarantee that it does anything with it afterwards.
> The kernel is handling at-least-once delivery with the network boundary and turning it into at-most-once with the process boundary.
OK, if that's how you're defining your terms, I agree that you cannot have exactly-once delivery. But that makes it a vacuous observation. You can always get at-most-once delivery by disconnecting the network entirely. That provides a 100% guarantee that no message will be delivered more than once (because no message will ever be delivered at all). But that doesn't seem very useful.
> Why can I not put an abstraction layer that provides me with the illusion of exactly-once delivery?
You can do that. You can implement a video conferencing system ontop of TCP, and it will even work, technically. It will just have terrible performance characteristics that you'll never be able to fix. You might even call it fundamentally broken.
Read the two general's problem. It is proven that exactly once _delivery_ is a physical and mathematical impossibility.
Like you said, you can simulate it with exactly once _processing_. But to do that, you have to know to do that.
Others are saying "you can't divide by zero" and you are saying "yeah, but if I detect a zero and the do something different, then it is the same thing." No, knowing you have to do something is the very point of acknowledging you can't divide by zero.
Because you can't have exactly once delivery, you have to deal with it. One trick is duplicate checks or idempotent writes. This gives exactly once processing. This also takes additional overhead and why audio and video stream processing doesn't typically do the additional checks.
I have fixed many bugs written by people who believe the network is reliable. I even hired one when during the interview and we talked about this kind of issue that they realized why they were getting duplicated writes reading from sqs or sns at $existing_job. They were green but smart and she became a real asset to the team.
What makes you thin the 2GP is relevant here? 2GP has to do with coordination and consensus, not exactly-once-delivery.
> you can simulate it with exactly once _processing_
What exactly do you think is the difference between "simulating" exactly-once delivery and actually having exactly-once delivery? What do you think "delivery" means in the context of a computer network?
https://bravenewgeek.com/you-cannot-have-exactly-once-delive...