Wednesday, December 22, 2021

The cost of cloud

Putting your IT infrastructure into the cloud seems to be the "in" thing. It's been around for a while, of course. And, like most things related to IT, there are tradeoffs to be made.

My rough estimate is that the unit cost of provisioning a service on AWS is about 3 times that of a competent IT organization providing a similar service in house. Other people have come to the same number, and it hasn't really changed much over the last decade. (If you don't think 3x is right, consider what AWS' gross margin is.)
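As a quick back-of-the-envelope sketch (the margin figure here is an illustrative assumption, not a published AWS number), a seller's gross margin translates fairly directly into a price-over-cost multiple:

```python
# What a seller's gross margin implies about the markup over their own cost.
# The margin value is an illustrative assumption, not an AWS-published figure.
gross_margin = 0.65            # assumed fraction of revenue kept after direct costs

# price = cost / (1 - margin), so the implied price-over-cost multiple is:
markup = 1 / (1 - gross_margin)
print(f"Implied price/cost multiple: {markup:.1f}x")   # prints ~2.9x
```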

Some services offered by AWS deviate from that simple 3x formula. The two obvious ones are network costs, which, as Cloudflare have argued, are many times higher than you would expect, and S3, which you're going to struggle to beat. (Although if you're going to use S3 as a distribution site then the network costs will get you; think about Wasabi for that.)
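To get a feel for how the network charges dominate a distribution workload, here's a rough sketch. All the prices are illustrative assumptions based on roughly 2021-era list prices, so check current pricing before drawing any conclusions:

```python
# Rough comparison of serving data from S3 versus Wasabi.
# All prices are illustrative assumptions (roughly 2021-era list prices);
# real bills depend on tiering, requests, regions and discounts.
data_stored_tb = 10        # data held in the bucket
data_served_tb = 50        # data downloaded by users per month

s3_storage_per_gb = 0.023  # $/GB-month, standard storage (assumed)
s3_egress_per_gb = 0.09    # $/GB out to the internet (assumed)
wasabi_per_tb = 5.99       # $/TB-month, no egress fees (assumed)
# Note: Wasabi's free egress comes with a fair-use policy tied to how much you store.

s3_monthly = (data_stored_tb * 1000 * s3_storage_per_gb
              + data_served_tb * 1000 * s3_egress_per_gb)
wasabi_monthly = data_stored_tb * wasabi_per_tb

print(f"S3:     ${s3_monthly:,.0f}/month (mostly egress)")
print(f"Wasabi: ${wasabi_monthly:,.0f}/month")
```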

And yet, many organizations move to the cloud to "save money". I'm going to ignore the capex versus opex part of that, and simply note that many IT organizations have in-house operations that are neither very efficient nor cost-effective. In particular, traditional legacy IT infrastructures are ridiculously overpriced. (If you're using commercial virtualization platforms and/or SAN storage, you're overpaying by as much as a factor of 10, and getting an inferior product into the bargain.) So while many organizations could save a huge amount of money by moving to the cloud, they could save even more by running their internal operations better.

Often the cost saving associated with a migration - not just cloud, this applies to other transitions too - comes about not because the new solution is cheaper, but because a migration gives a business leverage to introduce better practices. Practices that, if used for your on-premise deployments, would save far more than the cloud ever could. Sometimes, you need to do an end run round an entrenched legacy IT empire.

Another consideration is that the cloud has often been touted as something where you pay for what you use, which isn't always quite correct. For many services, you pay for what you configure. And some services are nowhere near as elastic as you might wish.
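The classic example is a plain instance: it's billed for every hour it's configured and running, busy or idle. A tiny sketch, with an assumed hourly rate:

```python
# "Pay for what you configure": a running instance is billed whether or
# not it is doing useful work.  The rate is an illustrative assumption.
hourly_rate = 0.384        # assumed on-demand $/hour for a mid-size instance
hours_in_month = 730
utilisation = 0.10         # the instance is actually busy 10% of the time

bill = hourly_rate * hours_in_month
effective_rate = bill / (hours_in_month * utilisation)

print(f"Monthly bill:                 ${bill:,.2f}")
print(f"Effective cost per busy hour: ${effective_rate:.2f}/hour")
```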

Capacity planning doesn't go away either; if anything, it's more important to get the sizing right. And while you can easily buy more capacity, you have to ensure you have the financial capacity to pay the bills.

Note that I'm not saying you should always run your systems on-premise, nor that it will always be cheaper.

Below a certain scale, doing it yourself isn't financially beneficial. There's a minimum configuration of infrastructure you need in order to get something that works, and many small organizations have needs below that. But generally, the smaller providers are likely to be a better option in that case than full-on cloud offerings.

Having the operational capability to support your infrastructure is also crucial. If you're going to support your own hardware, you really need a team, which is going to set a minimum scale at which operations are worthwhile.

This becomes even more true if you need to deploy globally. It's difficult to do that in-house with a small team, and you have to be pretty large to be able to staff multiple teams in different geographies. A huge advantage of using the cloud for this is that you can deploy into pretty much any location without driving your costs insane. Who wants to hire a full team in every country you operate in? And operationally, it's the same wherever you go, which makes things a lot easier.

In recent times, the Coronavirus pandemic has also had an impact. End user access to colocation facilities has been restricted - we've been able to do repairs recently, but we've had to justify any datacenter visits as essential.

There are certain workloads that are well matched to the cloud, of course. Anything highly variable, with spikes above 3x the background, will be cheaper in the cloud, where you can deploy capacity just for the spike, than it would be in house, where you either overprovision for peak load or accept that there's a spike you can't handle.
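Here's a crude way to see where the crossover sits, using the 3x unit-cost premium from above. The workload shape and unit costs are illustrative assumptions:

```python
# Crude crossover model: on-premise capacity has to be sized for peak,
# while cloud capacity can track the load.  Unit costs are illustrative
# assumptions, with the cloud charged at the 3x premium discussed above.
onprem_unit_cost = 1.0     # cost per unit of capacity per month (assumed)
cloud_unit_cost = 3.0      # assumed 3x premium
background = 100           # steady-state capacity needed
spike_duration = 0.05      # spike lasts ~5% of the month (assumed)

for spike_ratio in (2, 3, 4, 6):
    peak = background * spike_ratio
    onprem = peak * onprem_unit_cost                  # sized for peak, all month
    cloud = cloud_unit_cost * (background +
                               (peak - background) * spike_duration)
    winner = "cloud" if cloud < onprem else "on-premise"
    print(f"spike {spike_ratio}x: on-premise {onprem:6.0f}, cloud {cloud:6.0f} -> {winner}")
```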

The cloud is also great for experimentation. You can try any number of memory and CPU configurations to see what works well. Much easier than trying to guess and buying equipment that isn't optimal. (This sort of sizing exercise is far less relevant if you have decent virtualization like zones.)

You can even spin up a range of entirely different systems. I do this when testing: just run each of a whole range of Linux distros for an hour or so.
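A minimal sketch of that sort of throwaway experiment with boto3 (the AMI IDs here are placeholders, not real images, and a real run would also need key pairs, security groups and error handling):

```python
# Sketch: launch a handful of different configurations for a short-lived
# test, then terminate them.  The AMI IDs are placeholders, not real values.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")

candidates = [
    ("ami-PLACEHOLDER-debian", "t3.medium"),
    ("ami-PLACEHOLDER-ubuntu", "t3.large"),
    ("ami-PLACEHOLDER-alma",   "m5.large"),
]

instance_ids = []
for ami, itype in candidates:
    resp = ec2.run_instances(ImageId=ami, InstanceType=itype,
                             MinCount=1, MaxCount=1)
    instance_ids += [i["InstanceId"] for i in resp["Instances"]]

# ... run the test workload for an hour or so, collect results ...

ec2.terminate_instances(InstanceIds=instance_ids)
```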

What the above cases say is that even if the unit cost of cloud resources is high, the cloud gives you more of an opportunity to optimize the number of units. And, when it comes to scaling, this means the ability to scale down is far more important than the ability to scale up.

I use AWS for a lot of things, but I strongly regard the cloud as just another tool, to be used as occasion demands, rather than because the high priests say you should.

3 comments:

Anonymous said...

If you want S3, take a look at Ceph storage. With croit.io it's possible to deploy an S3 One Zone-Infrequent Access-like service at AWS Glacier pricing, including all costs. You can save so much by hosting stuff yourself!

Dario Galvis said...

MangaDex is saving over 95% of their hosting costs by using Ceph instead of S3.

Anonymous said...

"often been touted as something where you pay for what you use, which isn't always quite correct" - 100% agreed. We kept asking our engineers why they need such big instances, and they kept saying well, we will just pay for what we use 🤦🏽‍♂️ there's an open source project (https://github.com/infracost/infracost) that at least now shows how much this shit costs - but you have to be using terraform