But if you are doing indexing, why do you necessarily care that it's specifically chronological?
Or besides that, if there are odds of CreateAt collision, and you are fallbacking to ID, you are still possibly not getting it chronologically?
And also if CreateAt does happen to equal to another record that is exactly the case where Postgres might most likely not have the auto incr chronological.
So still it seems like the edge case it tries to prevent it would still happen at least at similar magnitude of odds.
> But if you are doing indexing, why do you necessarily care that it's specifically chronological?
edit: on second review if live insertions were occurring then this code would have an edge case, however the indexing job has an endtime presumably chosen where they can be sure no more inserts will occur. Given that the choice to use a timestamp probably has to do with the fact that there are 4 different tables being indexed and you would otherwise have to track their IDs individually.
ID only: If we use ID > 4 as our start point for next time and ID 3 was not inserted yet then we will have missed it
createdAt only: If we use createdAt > 1002 then we will skip ID 5 next batch
OPs strategy: Even if we use createdAt > 1002 and it skips ID 5 it will be caught by createdAt = 1002 AND ID > 4. The Order by createdAt asc, id asc guarantees that if the limit splits 4 and 5 that we see 4 first and thus don't miss 5. I think this does still miss the case where ID 4 is inserted after ID 5 however.
> I think this does still miss the case where ID 4 is inserted after ID 5 however.
Yeah, and I would think that is very likely to happen given it would be the same timestamp.
So the whole thing still seems like a flawed, and unnecessarily complex solution to me, which should just use one simple unique sortable field to do all of it.
Like my intuitive guess is that maybe the solution could save maximally 20% - 40% of the same edge case, which doesn't seem like a good solution. It is not going to solve the problem. It's just adding complexity that can cause other problems.
So if postgres does the type of caching where it allocates 10 auto incr IDs to each process, which causes sometimes IDs being out of order, then normally it would be just enough to wait after these allocations have performed and index then, you are not going to miss any rows.
I would assume these processes have some form of timeout if there was a case where they couldn't assign one of those IDs, and then this ID would just maybe not exist or if there was a mechanism to reallocate that would work too, but none the less, I think some form of postgres sortable unique id would have to work by itself.
Or besides that, if there are odds of CreateAt collision, and you are fallbacking to ID, you are still possibly not getting it chronologically?
And also if CreateAt does happen to equal to another record that is exactly the case where Postgres might most likely not have the auto incr chronological.
So still it seems like the edge case it tries to prevent it would still happen at least at similar magnitude of odds.