Isn't it common practice to host your status board on someone else's infrastructure?
In 2017 there was an S3 issue that supposedly affected their ability to post. I believe they said that they were updating how they posted to the status board so that there would no longer be a dependency on S3. Well, I guess whatever they're dependent on now broke.
It's common practice for small players, but Amazon, Microsoft Azure and Google Cloud host their status pages on their own servers because they value the marketing aspect more highly than a functioning status page for their customers.
I find it surprising how many people forget that underlying business motives drive pretty much every action a company takes.
No matter how much you value science and engineering, it ultimately doesn't matter to the business unless that aligns directly with their revenue stream. Sometimes it does, sometimes it doesn't.
Yes. But I wonder if self-hosting their status page is really the correct decision from a marketing perspective. The people who consume the status page for, say, Google Cloud probably know that self-hosting it is a bad decision from a technical point of view. So to the only people who care, their choice appears stupid.
So I don't really understand what they gain by doing it. I think maybe I am wrong about it being a marketing concern and that the choice is more related to internal politics and incompetent management.
The point is to manage potential external liabilities. A business doesn't want any liability it has to automatically cost it money if it can avoid that. It's more than happy to have anything profitable generate revenue automatically, but if something could potentially lose it thousands or millions, it wants a human-in-the-loop from management to sign off. Missed SLAs and service outages are a good way to cost them money.
Few companies really respect their engineering teams/divisions in any sensible form, in my experience, though I'm biased (even in heavy R&D environments). You're simply a means to an end.
I understand your point (and identify with it), but I find that management will push any mechanism/option that contains potentially damaging information over the option of releasing damaging information that a responsible engineer may want to disclose.
You're in a culture where admitting fault or liability is like pulling teeth and ripping fingernails off. It shouldn't be IMHO (we should own up to our mistakes and be reasonably forgiven), but that's unfortunately not the culture we have.
Reminds me of: "When a measure becomes a target, it ceases to be a good measure"
When you're advertising uptime/availability, you're motivated not to report downtime/unavailability. Then the value of such reports is lost; developers start banging their heads trying to figure out if it's a service outage or a bug in their software (yes, informed by personal experience).
I operate StatusGator, a service that aggregates status pages, so I'm ALL TOO familiar with the AWS status page.
The main change they made in 2017 was the ability to post a message at the top of the page that is independent of the status of the individual items below. IIRC, it was the items they couldn't update. So that is kind of a hack, but it works.
It would be ideal if it were hosted entirely on separate infrastructure, and even on a separate domain, but I won't hold my breath. Theirs is still more reliable than, for example, the IBM Cloud status page, which was hard down during their epic outage back in June.