Isn't it common practice to host your status board on someone else's infrastruct...

tuwtuwtuwtuw · on Nov 25, 2020

It's common practice for small players, but Amazon, Microsoft Azure and Google Cloud host their status pages on their own servers because they value the marketing aspect more than a functioning status page for their customers.

Frost1x · on Nov 25, 2020

I find it surprising how many people forget how much underlying business motives drive pretty much every action a company takes.

No matter how much you value science and engineering, it ultimately doesn't matter to the business unless that aligns directly with their revenue stream. Sometimes it does, sometimes it doesn't.

tuwtuwtuwtuw · on Nov 25, 2020

Yes. But I wonder if self-hosting their status page is really the correct decision from a marketing perspective. The people who consume the status page on, say, Google Cloud probably know that Google self-hosting it is a bad decision from a technical point of view. So to the only people who care, their choice appears stupid.

So I don't really understand what they gain by doing it. I think maybe I am wrong about it being a marketing concern and that the choice is more related to internal politics and incompetent management.

Frost1x · on Nov 26, 2020

The point is to manage potential external liabilities. A business doesn't want any sort of liability they have automatically costing them if they can avoid it. They're more than happy to have anything that profits them automatically generate revenue, but if something could potentially lose them thousands or millions, they want to make sure there's a human-in-the-loop from management to sign off. Missed SLAs and service outages are a good way to cost them money.

Few companies really respect their engineering teams/divisions in any sensible form from my experience, though I'm biased (even in heavy R&D environments). You're simply a means to an end.

I understand your point though (and identify with it), but I find any mechanism/option that provides a way of containing potentially damaging information is going to be pushed by management over the option to release damaging information that a responsible engineer may want to disclose.

You're in a culture where admitting fault or liability is like pulling teeth and ripping fingernails off. It shouldn't be IMHO (we should own up to our mistakes and be reasonably forgiven), but that's unfortunately not the culture we have.

GilbertErik · on Nov 25, 2020

If they host it somewhere else, it signals they lack confidence in their own product.

If they self-host it, it signals that they're overconfident in their ability to maintain an accurate status page.

Given these two options, which do you think a budget manager will have an easier time signing off on and defending upward?

tuwtuwtuwtuw · on Nov 25, 2020

Yes, that was why I was referring to internal politics and incompetent management.

srveale · on Nov 25, 2020

Reminds me of: "When a measure becomes a target, it ceases to be a good measure"

When you're advertising uptime/availability, you're motivated not to report downtime/unavailability. Then the value of such reports is lost; developers start banging their heads trying to figure out if it's a service outage or a bug in their software (yes, informed by personal experience).

brown9-2 · on Nov 25, 2020

The marketing aspect of what? No one is choosing a vendor based on where they host their status page.

colinbartlett · on Nov 25, 2020

I operate StatusGator, which is a service that aggregates status pages, so I'm ALL TOO familiar with the AWS status page.

The main change they made in 2017 was the ability to post a message at the top of the page that is independent of the status of the individual items below. IIRC, it was the items they couldn't update. So that is kind of a hack, but it works.

It would be ideal if it were hosted entirely on completely separate infrastructure, and even a separate domain, but I won't hold my breath. Theirs is still more reliable than, for example, the IBM Cloud status page, which was hard down during their epic outage back in June.

WrtCdEvrydy · on Nov 25, 2020

S3 East didn't affect the ability but they couldn't swap out the green checkmark for the red checkmark... which is just hilarious.

zucked · on Nov 25, 2020

That day was a nightmare for a lot of people - it wasn't just S3 that went down, it was like all of US-EAST.

Luckily my company decided against multi-az for the cost savings so I spent all day firefighting.

actionowl · on Nov 25, 2020

Multi-AZ doesn't help when a whole region is down, unless you're referring to AZs spread across multiple regions (e.g., us-east-1a and us-west-1a).
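
To make the distinction concrete, here's a rough Python sketch. It assumes standard AWS AZ names, where the AZ is the region name plus one trailing letter (us-east-1a is in us-east-1). The helper names `region_of` and `survives_region_outage` are made up for illustration, and the trick won't work for Local Zone style names.

```python
def region_of(az: str) -> str:
    """Return the region for a standard AZ name.

    Assumption: the AZ name is the region name plus a single trailing
    letter, e.g. "us-east-1a" -> "us-east-1".
    """
    return az[:-1]


def survives_region_outage(azs: list[str], failed_region: str) -> bool:
    """True if at least one AZ in the deployment sits outside the failed region."""
    return any(region_of(az) != failed_region for az in azs)


# Multi-AZ inside a single region: every AZ goes down with the region.
multi_az = ["us-east-1a", "us-east-1b"]

# AZs spread across two regions: one of them is unaffected.
multi_region = ["us-east-1a", "us-west-1a"]

print(survives_region_outage(multi_az, "us-east-1"))
print(survives_region_outage(multi_region, "us-east-1"))
```

This only checks placement; it says nothing about whether failover, data replication, or DNS would actually work during an outage.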

rhizome · on Nov 25, 2020

I have to think they're talking about the latter.

eternalban · on Nov 25, 2020

So what’s the cost breakdown? Did they make the right decision?

dexterdog · on Nov 25, 2020

For one day of his time and probably a small part of a day of diminished service, most likely.

jmartens · on Nov 27, 2020

In a world where we can do virtually anything we want with technology, why do we rely on vendors updating their own individual status pages?