Turning big compute power into big results

Joseph Sibony
Reading time: 6 minutes
March 7, 2023

If you’re developing a game on the cloud – whether your team is just moving there, bursting to the cloud as needed, or fully cloud-native – you’ve already been convinced of the wonders of the cloud.

In fact, most game developers are already on the cloud: a 2022 survey by Perforce found that 78% of respondents are either fully on the cloud or working on a hybrid model. And yet we continue to hear about issues related to costs. The real question is: can studios have their cake (better cloud performance) and eat it too (lower costs)?

The answer, especially if you’re using Incredibuild, is yes. Working on the cloud today just makes sense. That doesn’t mean teams have to go all in or all out. You don’t have to get rid of the shiny server farms you’ve already built; you can get more out of them, too. But the cloud is your safety net, and tools that make that safety net faster and stronger are key to getting the real benefits.

Let’s take a closer look at how Incredibuild can optimize your cloud resources, and then at the case of Epic Games, which has been working with Incredibuild for some time, to see how we helped accelerate their game development.

With great power comes great responsibility  

Big apps mean more compute power. More power requires an investment – either physical or on the cloud – that a lot of companies feel they simply must absorb. There’s really no way around it, right?  

Take a team that operates mostly on the cloud. At peak times, several teams might need to run massive builds (think rendering an entire world map or working on shading for a AAA game). So what does a team do when it needs to build and iterate? Fire up more instances. That’s the beauty of the cloud: you add power as you need it. But this strategy just kicks the can down the road. The bill always comes due.

Sure, you can spin up all the instances in the world, get the processing power you need, and move on. But then you need someone chasing all these dev teams and powering down instances to keep costs from skyrocketing. Repeat this process a few (or a thousand) times, and you’re left with a massive bill or a massive headache.

Okay, we hear you say, that’s all true, but what can you do? There are deadlines to meet and games to launch, and time and customers wait for no dev.

You can start by using cloud orchestration tools that reduce some of the strain on IT teams by automating large portions of the spinning up – and more importantly, spinning down – of resources. But you can go further.  
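To make the spin-down half of that automation concrete, here is a minimal sketch of the kind of check an orchestration tool might run on a schedule: flag any build instance that has been idle longer than a threshold so it can be powered down. The `Instance` record, the `find_idle_instances` function, and the 15-minute default are all hypothetical illustrations, not part of any specific orchestration product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Instance:
    # Hypothetical record; a real tool would pull this from the cloud provider.
    instance_id: str
    last_busy: datetime  # last time this instance ran a build task


def find_idle_instances(instances, now, idle_limit=timedelta(minutes=15)):
    """Return IDs of instances idle longer than idle_limit (spin-down candidates)."""
    return [i.instance_id for i in instances if now - i.last_busy > idle_limit]


if __name__ == "__main__":
    now = datetime(2023, 3, 7, 12, 0)
    fleet = [
        Instance("build-01", now - timedelta(minutes=2)),
        Instance("build-02", now - timedelta(hours=3)),
        Instance("build-03", now - timedelta(minutes=20)),
    ]
    print(find_idle_instances(fleet, now))  # ['build-02', 'build-03']
```

Running a check like this automatically is what replaces the person chasing teams to power instances down; the threshold is a trade-off between paying for idle time and paying the startup cost of relaunching.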

Instead of relying on trigger points to spin up resources (which causes its own issues), you could instead use dynamic resource allocation to shift exactly what you need to your cloud environment at peak usage times, complete your builds, and then power down.  

See spot run…sometimes 

One common solution to cloud bill shock is to use spot instances. We’ve talked before about how spot instances help keep your costs low, but the topic is worth revisiting briefly. In theory, the concept of spot instances is great. You can use surplus instances that are just “lying around” at a deep discount, with the caveat that they may be taken away unexpectedly when they’re needed elsewhere. That’s a pretty big downside in most cases where you need to run live services – that is, most apps, games, cloud platforms, etc.

Murphy’s law is in full effect here: assume that if an instance can be taken away, it will be – and usually when you need it most. Even so, there are ways to make the cloud work on a massive scale without the massive bill.

The solution is to find a way around the issue of missing spot instances. In a sense, all you have to do is replace the one you lost with another one. So what if you could do that at scale, with not just one spot instance but a whole fleet? Incredibuild, for instance, can orchestrate entire spot fleets, automatically replacing instances as they are taken away. This way, you don’t lose work time, and you don’t overpay.
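The replacement idea can be sketched as a fleet that holds a target size and launches a new instance whenever one is reclaimed. This is a toy model under stated assumptions: the `SpotFleet` class and its methods are hypothetical, "launching" just mints a new ID rather than calling a real cloud API, and it says nothing about how Incredibuild implements its orchestration.

```python
import itertools


class SpotFleet:
    """Toy model of a fleet that keeps a target number of spot instances running.

    Launching is simulated by minting a new ID; a real orchestrator would
    request capacity from the cloud provider instead.
    """

    def __init__(self, target_size):
        self.target_size = target_size
        self._ids = itertools.count(1)
        self.running = set()
        self.replenish()

    def _launch(self):
        # Hypothetical stand-in for requesting a new spot instance.
        instance_id = f"spot-{next(self._ids)}"
        self.running.add(instance_id)
        return instance_id

    def replenish(self):
        """Launch instances until the fleet is back at its target size."""
        launched = []
        while len(self.running) < self.target_size:
            launched.append(self._launch())
        return launched

    def on_interruption(self, instance_id):
        """Handle a reclaimed instance: drop it and launch a replacement."""
        self.running.discard(instance_id)
        return self.replenish()


if __name__ == "__main__":
    fleet = SpotFleet(target_size=4)
    print(sorted(fleet.running))            # ['spot-1', 'spot-2', 'spot-3', 'spot-4']
    print(fleet.on_interruption("spot-2"))  # ['spot-5']
    print(len(fleet.running))               # 4
```

The key design choice is that the fleet tracks a target size rather than individual machines, so losing any single instance is just a temporary shortfall to refill.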

Rethinking cloud optimization 

It makes sense, then, that you’d want a solution that gives you easier access to the benefits of cloud, with fewer downsides. How can you, for instance, enhance your Unreal Engine work on the cloud? Let’s break down a few ways you can optimize your cloud environment with Incredibuild.  

Leveling up the autoscaling  

Now we get to the question of value. Managing cloud instances by hand takes a lot of effort. Even so, it’s hard to imagine dismantling infrastructure you’ve already created, or adding more. But sometimes you can add value simply by rethinking the resources you already have on the cloud. Take Epic Games – one of the biggest game publishers in the world and the maker of Fortnite – whose cloud usage is significant, to say the least.

Epic Games uses Incredibuild to track core utilization dynamically and to automatically allocate or de-allocate the best mix of spot and on-demand instances based on workload demand and core availability (you can learn more about how Epic Games optimizes its cloud infrastructure in this great talk by Alex Carbury, Epic’s Manager, Unreal Engine Infrastructure).
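As a rough illustration of what choosing a spot/on-demand mix from workload demand can mean, here is a minimal sketch of one simple policy: convert pending core demand into an instance count, fill as much as possible with cheaper spot capacity, and cover the remainder with on-demand instances, all under a cap. The `plan_capacity` function, its parameters, and the policy itself are hypothetical assumptions, not the logic Epic or Incredibuild actually uses.

```python
import math


def plan_capacity(cores_needed, cores_per_instance, spot_available, max_instances):
    """Split a core demand into (spot, on_demand) instance counts.

    Hypothetical policy: prefer spot instances up to the number currently
    available, fill any remainder with on-demand instances, and never
    exceed max_instances in total.
    """
    instances_needed = min(math.ceil(cores_needed / cores_per_instance), max_instances)
    spot = min(instances_needed, spot_available)
    on_demand = instances_needed - spot
    return spot, on_demand


if __name__ == "__main__":
    # 300 cores of pending work on 8-core machines -> 38 instances
    print(plan_capacity(300, 8, spot_available=30, max_instances=50))  # (30, 8)
    # No pending work -> scale everything down
    print(plan_capacity(0, 8, spot_available=30, max_instances=50))    # (0, 0)
```

Comparing each new plan with what is currently running tells an orchestrator whether to allocate or de-allocate instances; re-running it as demand changes is what makes the allocation dynamic.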

When you’re dealing with cloud bills, zombie machines (instances left running after their work is done) can be a major headache, and reducing the need for personnel to manage them can ease pain points across your organization.

The sum of all parts  

One of the biggest challenges is allocating and distributing compute power exactly where it’s needed. For example, instead of a single, more expensive 128-core machine, you could use 16 cheaper eight-core machines that deliver the same results.

In the same way that having 40 PCs at your office is more cost-effective than a massive server room that requires constant maintenance, the ability to use less-expensive cloud instances means that you can better allocate resources and have more flexibility when it comes to who needs them.  

See spot run…always  

Spot is still a great way to lower costs, but making it work seamlessly remains a challenge. For a live-service game like Fortnite, having instances removed means potential downtime or unexpected glitches. With Incredibuild’s spot fleet orchestration, Epic was able to automatically spin up new spot instances as soon as one was removed and transfer cached data so work continued seamlessly.

Getting incremental with it  

Even on the cloud, with (theoretically) endless resources, starting builds from scratch for every change, new version, or hotfix means lots of time spent waiting. Instead of a fresh build every time, a tool like Build Cache lets you build incrementally, freeing up more time for testing and iteration and cutting down time spent going to the donut shop and office-chair jousting.

Build Cache reuses build outputs from across your dev team, meaning that even if you’re switching branches, if someone already ran that build, you’ll be able to cut down your work significantly. This is also true no matter where you’re working from, whether at home, on the road, or at your office.  
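The general idea behind reusing build outputs across a team is content-addressed caching: hash everything that determines a build step's result, and if that hash has been seen before, reuse the stored output instead of rebuilding. The sketch below illustrates that general technique under stated assumptions; the `BuildCache` class, its `build` method, and the `fake_compile` function are hypothetical and do not describe how Incredibuild’s Build Cache is implemented.

```python
import hashlib


class BuildCache:
    """Toy shared cache mapping a hash of a build step's inputs to its output.

    Illustrates generic content-addressed build caching only.
    """

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(command, source_files):
        """Hash the command line plus the name and contents of every input file."""
        h = hashlib.sha256(command.encode() + b"\0")
        for name in sorted(source_files):
            # Separators keep ("ab", "c") and ("a", "bc") from hashing the same.
            h.update(name.encode() + b"\0" + source_files[name].encode() + b"\0")
        return h.hexdigest()

    def build(self, command, source_files, compile_fn):
        """Return the cached output if these exact inputs were built before."""
        k = self.key(command, source_files)
        if k in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[k] = compile_fn(source_files)
        return self._store[k]


def fake_compile(files):
    # Hypothetical stand-in for an expensive compile step.
    return f"object code for {len(files)} file(s)"


if __name__ == "__main__":
    cache = BuildCache()
    src = {"main.cpp": "int main() { return 0; }"}
    cache.build("clang++ -O2", src, fake_compile)  # first developer: miss, compiles
    cache.build("clang++ -O2", src, fake_compile)  # teammate, same inputs: hit
    print(cache.hits, cache.misses)                # 1 1
```

Because the key depends only on the inputs, not on who ran the build or which branch they came from, any teammate whose inputs match gets the stored result; changing a single source file produces a new key and a fresh build of just that step.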

Turning the cloud into an asset  

In the end, there are ways to work on the cloud, and ways to make it work for you. It’s no longer a yes-or-no proposition: in today’s hybrid working world, with distributed teams, a constant need for updates, and everything-as-a-service, it’s hard NOT to be on the cloud at least partially. The real question is how you can optimize cloud ecosystems to give your teams the power they need at the right time, and for the right cost.

Want to see how you can get faster without getting more expensive? Learn more now! 
