Cloud bills are a bit like puppies.
Watch them closely, and they’re fairly well behaved. But take your eye off them for one second and they’ll chew through your budget faster than a young Labrador chews through a pair of your favorite sneakers.
Unexpected cloud charges are one of the many hidden costs of software development, especially as the way we use the cloud evolves. And cloud bill shock — the moment of spine-tingling horror that occurs when you’re confronted with an unexpectedly huge charge from your cloud provider — has always been a problem for dev teams.
And the damage often goes unnoticed: 64% of those surveyed by Anodot said that they don’t find cost spikes until days, weeks, or even months later.
The bottom line is that most teams need better ways to keep their dev processes running in the cloud while keeping their cloud bill under control and not paying for capacity they don’t really need.
Cloud spot instances can be an excellent way to improve your cloud optimization planning and ensure you only use the cloud capacity that you need at any given moment — if you use them correctly.
Let’s take a look at the benefits — then we’ll show you the trick we use to help our customers get around the biggest drawback of cloud spot instances.
What are cloud spot instances?
Spot instances are essentially packages of capacity that your cloud provider isn’t currently using.
All cloud providers keep some spare capacity available to make sure they can meet any surges in customer demand. But most of the time that spare capacity sits unused — which is why cloud providers like AWS, Microsoft Azure, and Google Cloud let you purchase that spare capacity on a short-term basis.
And here’s the kicker: You can get these “spot instances” for up to 90% less than on-demand instances.
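To see what that discount means in practice, here is a back-of-the-envelope comparison in Python. The hourly prices are made-up placeholders, not real quotes (actual spot prices vary by provider, region, instance type, and over time), and 730 hours is used as an approximation of one month.

```python
# Hypothetical hourly prices for illustration only; real prices vary by
# provider, region, instance type, and (for spot) over time.
ON_DEMAND_PER_HOUR = 0.40
SPOT_PER_HOUR = 0.12


def monthly_cost(price_per_hour: float, instances: int, hours: float = 730) -> float:
    """Cost of running `instances` machines for `hours` (730 is roughly one month)."""
    return price_per_hour * instances * hours


def savings_pct(on_demand: float, spot: float) -> float:
    """Percentage saved by paying the spot price instead of the on-demand price."""
    return (1 - spot / on_demand) * 100


if __name__ == "__main__":
    od = monthly_cost(ON_DEMAND_PER_HOUR, instances=10)
    sp = monthly_cost(SPOT_PER_HOUR, instances=10)
    print(f"on-demand: ${od:,.2f}/month, spot: ${sp:,.2f}/month")
    print(f"saving: {savings_pct(ON_DEMAND_PER_HOUR, SPOT_PER_HOUR):.0f}%")
```

With these placeholder prices, ten instances come to about $2,920 a month on demand versus about $876 on spot, a 70% saving.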
Sounds like a dream, right? Well, there’s a bit of a catch.
Those spot instances are sold at a massive discount for a reason: they’re not yours forever. If demand surges, your cloud provider will need that capacity back, and it gives you very little notice to clear your workloads: two minutes on AWS, and as little as 30 seconds on Azure and Google Cloud.
If you don’t have a way to switch workloads to another instance quickly, you can experience data inconsistency, data loss, and interruption of active user sessions.
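How you catch that warning depends on the provider. As an AWS-specific sketch: EC2 exposes a pending spot interruption at the instance metadata path `spot/instance-action`, which returns 404 until a notice is issued. The Python below polls that path using IMDSv2 session tokens and then calls `drain`, a hypothetical hook you would write yourself to stop accepting new work and save state. Azure and Google Cloud publish their own notices through their metadata services, with different paths and formats.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

IMDS = "http://169.254.169.254/latest"  # EC2 instance metadata service


def fetch_instance_action(timeout: float = 2.0) -> Optional[str]:
    """Return the raw spot instance-action document, or None if none is scheduled.

    Uses IMDSv2: first request a session token, then read spot/instance-action.
    Only works from inside an EC2 instance.
    """
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    action_req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(action_req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # 404 means no interruption is currently scheduled
            return None
        raise


def parse_notice(raw: Optional[str]) -> Optional[dict]:
    """Turn the raw document into a dict such as {"action": "terminate", "time": "..."}."""
    return json.loads(raw) if raw else None


def watch(
    drain: Callable[[dict], None],  # hypothetical cleanup hook you supply
    fetch: Callable[[], Optional[str]] = fetch_instance_action,
    interval: float = 5.0,
    max_polls: Optional[int] = None,
) -> Optional[dict]:
    """Poll until a notice appears, call drain(notice) once, and return the notice."""
    polls = 0
    while max_polls is None or polls < max_polls:
        notice = parse_notice(fetch())
        if notice is not None:
            drain(notice)
            return notice
        polls += 1
        time.sleep(interval)
    return None  # gave up after max_polls without seeing a notice
```

The `fetch` parameter is there so the polling logic can be exercised without an EC2 instance; on a real instance you would keep the default and run `watch` in a background thread or sidecar process.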
If any of those applications support customer-facing services, that could be catastrophic. There’s a reason that Facebook committed so publicly to zero downtime. Nowadays, customers are used to their apps, games, and live services being constantly available. They’ll quickly abandon any service that drops out too often — and, if you offer a paid service, you could lose valuable sales in the time that your app is offline.
When to use spot instances
Because cloud spot instance pricing is discounted so steeply, using them can have a huge positive impact on your cloud spend. But you shouldn’t rely on them for everything.
Generally, you should only use spot instances for stateless applications that run short-lived tasks: workloads that can survive an interruption, so that when the cloud provider reclaims the capacity, your applications don’t go up in smoke.
That means things like:
- Web services
- Containerized applications and microservices
- Application testing
- CI and CD operations
- Data analysis
In short, anything that you can easily rerun or redeploy (a CI job on a self-hosted GitHub Actions runner, for example) without causing a service outage or losing your team’s work.
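One way to make batch work safe to interrupt is to checkpoint progress, so a replacement instance picks up where the last one stopped. This is a minimal Python sketch under simplifying assumptions: items have unique string IDs, `work` is safe to repeat for whichever item was in flight when the interruption hit, and the checkpoint lives at a local path purely for illustration. On a real spot instance the checkpoint has to sit on storage that outlives the instance, such as a shared volume or an object store.

```python
import json
import os
from typing import Callable, Iterable, List


def load_done(path: str) -> set:
    """Read the set of already-processed item IDs from a checkpoint file."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f))


def save_done(path: str, done: set) -> None:
    """Write the checkpoint via a temp file so a kill mid-write can't corrupt it."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, path)  # atomic rename on POSIX filesystems


def run_batch(items: Iterable[str], work: Callable[[str], None], path: str) -> List[str]:
    """Process items, skipping any finished before an interruption.

    Returns the IDs processed during this run.
    """
    done = load_done(path)
    processed = []
    for item in items:
        if item in done:
            continue  # already finished on a previous (interrupted) instance
        work(item)
        done.add(item)
        save_done(path, done)  # checkpoint after every completed item
        processed.append(item)
    return processed
```

Checkpointing after every item trades some write overhead for losing at most one item’s worth of work per interruption; for very small items you might checkpoint every N items instead.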
Spot fleets: The smart way to bring down costs without sacrificing your service
Because spot instances can disappear at short notice, relying on them can still be pretty stressful.
It can feel like you’ve just transferred your stress and worry from one place to another; instead of watching your cloud bill like a hawk, you have to hover over your cloud instances and be ready to clear your workloads at any time.
That’s why the best cloud optimization services use a workaround to save cloud costs without putting you at risk of downtime: spot fleets.
A spot fleet is a collection of spot instances (and, optionally, some on-demand instances) that are launched and managed together to meet a capacity target you set. Here’s how it works:
- You specify the capacity that you need, just as you would when you request a spot instance.
- Instead of launching one large spot instance that gives you all the capacity you need, the spot fleet launches many smaller instances, as many as it takes to reach your target.
- If the worst happens and your cloud provider takes back some of those spot instances, the fleet automatically launches new instances to replace them.
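On AWS, the steps above correspond to a Spot Fleet request. The Python sketch below builds a `SpotFleetRequestConfig` for boto3’s `request_spot_fleet` call: `TargetCapacity` is the capacity you specify, listing several instance types and subnets gives the fleet many capacity pools to draw from, and `"Type": "maintain"` tells it to replace reclaimed instances. The role ARN, launch template ID, subnet IDs, and instance weights are hypothetical placeholders, and AWS’s current documentation may steer new deployments toward EC2 Fleet or Auto Scaling groups, which offer similar controls.

```python
from typing import Dict, List


def build_spot_fleet_config(
    target_capacity: int,
    fleet_role_arn: str,
    launch_template_id: str,
    instance_weights: Dict[str, float],
    subnet_ids: List[str],
    on_demand_capacity: int = 0,
) -> dict:
    """Build a SpotFleetRequestConfig for EC2's RequestSpotFleet API.

    Every (instance type, subnet) pair becomes a separate capacity pool, so a
    reclaim in one pool only removes part of the fleet.
    """
    overrides = [
        {"InstanceType": itype, "SubnetId": subnet, "WeightedCapacity": weight}
        for itype, weight in instance_weights.items()
        for subnet in subnet_ids
    ]
    return {
        "IamFleetRole": fleet_role_arn,
        "TargetCapacity": target_capacity,  # total capacity units you need
        "OnDemandTargetCapacity": on_demand_capacity,  # optional on-demand baseline
        "AllocationStrategy": "capacityOptimized",  # prefer pools with more spare capacity
        "Type": "maintain",  # replace instances that get reclaimed
        "LaunchTemplateConfigs": [
            {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateId": launch_template_id,
                    "Version": "$Latest",
                },
                "Overrides": overrides,
            }
        ],
    }


if __name__ == "__main__":
    import boto3  # needs AWS credentials and real versions of the resources below

    config = build_spot_fleet_config(
        target_capacity=16,
        fleet_role_arn="arn:aws:iam::123456789012:role/example-spot-fleet-role",  # hypothetical
        launch_template_id="lt-0123456789abcdef0",  # hypothetical
        instance_weights={"m5.large": 2.0, "m5a.large": 2.0, "c5.xlarge": 4.0},
        subnet_ids=["subnet-aaaa1111", "subnet-bbbb2222"],  # hypothetical, in different AZs
    )
    response = boto3.client("ec2").request_spot_fleet(SpotFleetRequestConfig=config)
    print(response["SpotFleetRequestId"])
```

Spreading capacity across instance types and Availability Zones is what lets a reclaim in one pool hit only part of the fleet; `capacityOptimized` additionally asks AWS to launch from the pools with the most available capacity, which tend to be interrupted less often.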
This spreads the risk across lots of smaller instances instead of placing all of your bets on one. A single reclaim costs you only a slice of your capacity, so you’re rarely left out in the cold with none at all.
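A toy model makes the risk-spreading argument concrete. The Python below is an illustration, not a model of any provider’s real behavior: `capacity_after_reclaims` shows how much capacity survives when some equal-sized instances are taken back, and `simulate` assumes each instance is reclaimed independently with a fixed probability per step. That independence assumption is optimistic, since reclaims within one capacity pool tend to happen together.

```python
import random
from typing import List


def capacity_after_reclaims(total: int, instances: int, reclaimed: int) -> float:
    """Capacity left when `reclaimed` of `instances` equal-sized instances are taken back."""
    per_instance = total / instances
    return total - per_instance * min(reclaimed, instances)


def simulate(instances: int, p_reclaim: float, steps: int, seed: int = 0) -> List[int]:
    """Toy model of a fleet that keeps `instances` running.

    Each step, every instance is independently reclaimed with probability
    p_reclaim (an assumption; real reclaims are often correlated). Returns how
    many instances were still running right after each step's reclaims.
    """
    rng = random.Random(seed)
    history = []
    for _ in range(steps):
        survivors = sum(1 for _ in range(instances) if rng.random() >= p_reclaim)
        history.append(survivors)
        # the fleet then launches replacements, so each step starts at full size
    return history


if __name__ == "__main__":
    for n in (1, 16):
        runs = simulate(n, p_reclaim=0.05, steps=1000)
        outages = sum(1 for s in runs if s == 0)
        print(f"{n:>2} instance(s): steps with zero capacity = {outages}")
```

With one instance, every reclaim drops you to zero. With 16 instances, a single reclaim removes a sixteenth of your capacity, and a total outage requires all 16 to be reclaimed in the same step. Correlated reclaims weaken that guarantee, which is why diversifying across instance types and zones matters.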
As a result, you can use spot instances with far less risk of downtime, and without sending your stress spiraling.
And, most importantly, keep those cloud bills on a tight leash.