In other words, the consolidation onto Cloudflare and AWS makes the web less stable. I agree.
Usually I am allergic to pithy, vaguely dogmatic summaries like this, but you're right. We have traded "some sites are down some of the time" for "most sites are down some of the time". Sure, the "some" is eliding an order of magnitude or two, but the framing remains directionally correct.
Does relying on larger players result in better overall uptime for smaller players? AWS is providing me better uptime than if I assembled something myself because I am less resourced and less talented than that massive team.
If so, is it a good or bad trade to have more overall uptime but when things go down it all goes down together?
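To put rough numbers on that trade-off, here is a toy Python sketch. The site count and uptime figures are made up purely for illustration, and it assumes self-hosted sites fail independently of each other, which real-world outages don't strictly do.

```python
# Toy model: hypothetical numbers for illustration only, not measured figures.
# Assumes self-hosted sites fail independently; real outages are messier.


def independent_stats(n_sites: int, site_uptime: float) -> tuple[float, float]:
    """Each site runs its own infrastructure and fails on its own."""
    p_down = 1.0 - site_uptime
    expected_fraction_down = p_down      # average share of sites offline at any moment
    p_all_down = p_down ** n_sites       # chance that every site is offline at once
    return expected_fraction_down, p_all_down


def shared_stats(provider_uptime: float) -> tuple[float, float]:
    """Every site depends on one provider, so its outage takes all of them down."""
    p_down = 1.0 - provider_uptime
    # The average share of sites offline equals the chance they are all offline together.
    return p_down, p_down


if __name__ == "__main__":
    ind = independent_stats(10, site_uptime=0.999)
    sh = shared_stats(provider_uptime=0.9995)
    print(f"independent: avg fraction down={ind[0]:.4f}, P(all down)={ind[1]:.1e}")
    print(f"shared:      avg fraction down={sh[0]:.4f}, P(all down)={sh[1]:.1e}")
```

With these made-up numbers, the shared provider halves the average downtime, but the chance of everything being down at the same moment goes from effectively zero to the provider's own outage rate.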
From a societal view, it is worse when everything is down at once; it leads to a less resilient society. It is not great if I can't buy essentials from one store because its payment system is down (this happened to a supermarket chain in Sweden after a cyberattack some years ago and took weeks to fully fix, and more recently there was the whole CrowdStrike debacle globally).
It is far worse if all of the competitors are down at once. To some extent you can and should keep a little stock at home (water, food, medicine, ways to stay warm, etc.), but not everything is practical to stockpile (gasoline, for example, and a shortage of it could have knock-on effects on the delivery of other goods).
When only one thing goes down, it's easier to compensate with something else, even for people who are doing critical work but can't fix IT problems themselves. It means the non-technical workforce can find ways to keep working, even if the organization doesn't have on-site IT.
Also, if you need to switch over to backup systems for everything at once, then either the backup has to be the same for everything and very easy to implement remotely, or you're gonna need a LOT of well-trained IT people, paid to be on standby constantly, if you want to fix the problems quickly, since they can't be everywhere at once. The first option seems unlikely to me for specialty systems, like hospital systems, or for the old tech that so many organizations still rely on (remember the CrowdStrike BSODs that had to be fixed individually and in person, and so took forever?).
If the problems are more spread out over time, then you don't need to have quite so many IT people constantly on standby. Saves a lot of $$$, I'd think.
And if problems are smaller and more spread out over time, then an organization can learn to deal with them routinely, instead of starting to feel and behave as though the problem will never actually happen. And if they DO fuck up their preparedness/response, the consequences are likely less severe.
> AWS is providing me better uptime than if I assembled something myself because I am less resourced and less talented than that massive team.
Is it? I can’t say that my personal server has been (unplanned) down at any time in the past 10 years, and these global outages have just flown right past it.
Has your ISP never gone down? Or did it go down some night and you just never noticed?
AWS and Cloudflare can recover from outages faster because they can bring in dozens (hundreds?) of people to help, often the ones who wrote the software and designed the architecture. Outages at smaller companies I've worked for have often lasted multiple days, including an Exchange server outage that lasted two weeks.