In the future I imagine this will be the case, and official versions will be indicated by a published MD5 hash, etc. As for why it hasn't happened already . . . why does the web still run on JavaScript, a language that was obsolete the day it was designed two decades ago? Even within the tech industry, tooling moves at a glacial pace.
We'll get court opinions stored in version control when we get a generation of judges that grew up with word processors with integrated version control. As it stands, version control isn't even well-implemented in mainstream word processors.
> As it stands, version control isn't even well-implemented in mainstream word processors.
Maybe we should be teaching version control first when we introduce students to word processing. If they know they can always go back to an earlier version and compare changes, they might be more willing to explore and learn faster.
> official versions will be indicated by a published MD5 hash
That would be proof positive to me that they have no intention of leaving the documents unchanged. MD5 has been unusable as a tamper-proof signature for a long time now.
Although MD5 has known collision issues, using an MD5 collision to alter the text of the document in a specific manner is highly unlikely. This is not as trivial as simply generating some random bits of data that result in the same signature when hashed.
In the case where you want to change the document for your own purposes without leaving evidence that you've tampered with it, you would not be searching for just any other file that produces the same hash value. You would be looking for another file that hashes to the same value, has the same format (text is easy, I suppose, but it's conceivable they could switch to a binary format at some point), makes grammatical and syntactic sense, and also contains the content you want (or some approximation of it; the exact desired content would produce a different hash in most scenarios).
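To make the cost difference concrete, here is a toy Python sketch (not an attack on MD5 itself). It truncates SHA-256 to a hypothetical 16 bits (`BITS`) and compares a birthday search, where the attacker is free to pick both inputs, against a brute-force search for a different input that matches one specific, already-published digest. Generically, for an n-bit digest the first costs about 2^(n/2) hashes and the second about 2^n; the extra constraints described above (same format, meaningful text) would shrink the usable search space further. The helper names and the generated messages are illustrative assumptions.

```python
from __future__ import annotations

import hashlib
import itertools

# Hypothetical toy digest size. Truncating SHA-256 to 16 bits makes both
# searches finish quickly; real digests are 128 (MD5) or 256 bits.
BITS = 16


def toy_hash(data: bytes) -> int:
    """Return the top BITS bits of SHA-256(data), for demonstration only."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - BITS)


def find_collision() -> tuple[bytes, bytes, int]:
    """Birthday search: any two distinct inputs with the same digest.

    Expected cost is roughly 2 ** (BITS / 2) hashes.
    """
    seen: dict[int, bytes] = {}
    for tries in itertools.count(1):
        msg = f"candidate {tries}".encode()
        h = toy_hash(msg)
        if h in seen:
            return seen[h], msg, tries
        seen[h] = msg
    raise AssertionError("unreachable")


def find_second_preimage(target: bytes, limit: int) -> tuple[bytes | None, int]:
    """Brute-force search for a different input matching one fixed digest.

    Expected cost is roughly 2 ** BITS hashes.
    """
    goal = toy_hash(target)
    for tries in range(1, limit + 1):
        msg = f"altered opinion, variant {tries}".encode()
        if msg != target and toy_hash(msg) == goal:
            return msg, tries
    return None, limit


a, b, tries = find_collision()
print(f"collision after {tries} hashes: {a!r} / {b!r}")

target = b"The judgment of the lower court is affirmed."
forged, tries = find_second_preimage(target, limit=1 << 20)
print(f"second preimage after {tries} hashes: {forged!r}")
```

On average, expect the collision search to stop after a few hundred hashes, while the second-preimage search needs on the order of 2^16 attempts; the exact counts depend on the generated messages.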
That said, I agree that there are better hashing algorithms out there. Digests from several common hash functions could be published side by side, which would add assurance while still keeping validation quick and easy.
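A minimal sketch of publishing several digests side by side, using Python's standard `hashlib` module. The choice of algorithms, the `publish_digests`/`verify` helper names, and the sample text are assumptions for illustration; MD5 is kept in the list only for comparison, and a document is accepted only if every published digest matches.

```python
from __future__ import annotations

import hashlib

# Hypothetical set of algorithms to publish together.
ALGORITHMS = ("md5", "sha256", "sha3_256")


def publish_digests(data: bytes) -> dict[str, str]:
    """Return a hex digest of `data` for each algorithm in ALGORITHMS."""
    return {name: hashlib.new(name, data).hexdigest() for name in ALGORITHMS}


def verify(data: bytes, published: dict[str, str]) -> bool:
    """Accept `data` only if every published digest matches it."""
    return all(
        hashlib.new(name, data).hexdigest() == expected
        for name, expected in published.items()
    )


opinion = b"The judgment of the lower court is affirmed.\n"
digests = publish_digests(opinion)
print(verify(opinion, digests))  # True
print(verify(opinion.replace(b"affirmed", b"reversed"), digests))  # False
```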
(Edit: Also, since you are concerned: I am not the downvoter)
I know it is not trivial, but starting off with a known broken hash would be a big mistake in my opinion. There have been some pretty impressive demos of this.
A collision in under a minute would seem to me to make this particular hash function 'broken' for all practical purposes when it comes to proving that a document was not modified after it was signed.
That the minute still represents a substantial number of cycles is not very relevant: given the importance of the documents being modified, it counts as a 'non-expense'.
You're not wrong, you're just putting the conversation on a pedantic tangent. I mentally replaced "MD5" with "a cryptographically secure hashing function" because, even though I knew it was the wrong hash to use, I knew what the poster was trying to say.