
What the hell is a COE? I hate that nobody seems to bother defining their acronyms anymore.


It stands for (cause|correction) of error, though 'CoE' is the much better-known identifier (like IBM vs. International Business Machines).

They are formal, in-depth retrospectives on customer-impacting service degradations or outages. They include a thorough functional description of how the state of your service evolved into failure, an exhaustively recursive review of the operational decisions and assumptions that contributed to that failure, and a series of action items the team will take to ensure that the service never fails again for the same reason.

Edit: This list is incomplete, and the link included in the sibling provides a better, more thorough description.




