A report by cloud startup CAST AI claims that companies running cloud-native applications typically spend three times as much on resources as they need due to over-provisioning.
The startup, which hopes to convince users to consider its platform as a way to control the problem, said overspending sometimes occurs due to an overly cautious approach on the part of the customer, but is often due to the difficulty of knowing exactly how many resources are needed.
The report draws on data generated by infrastructure utilization reports run for more than 400 organizations using the three major cloud providers, via CAST AI's platform, which compares the resources an application uses with those it actually needs.
It found that, on average, organizations spend three times more than necessary. The majority of this can be attributed to provisioned but unused processors and memory, often because users selected VM instances that do not optimally match workload requirements.
According to CAST AI co-founder and product manager Laurent Gil, part of the problem is that users often hedge their bets against unexpected spikes in demand when provisioning resources.
“The number one job of DevOps is that the application needs to be running 24/7 all the time, so what do they do? They look at the worst-case scenario over a week, and they tend to provision resources according to that worst case,” he said.
Gil cited the example of a customer using M5 instances on AWS for an application, where the CAST AI engine was able to recommend switching to a cheaper instance type, C6a, to cut costs.
“The biggest difference between these two is that C6a is AMD, which is slightly cheaper on Amazon. And C means it’s compute-optimized, so it has less memory per core, and the app doesn’t need all that memory,” Gil explained.
The key, however, is that this change resulted in lower costs without affecting application performance.
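That kind of rightsizing decision can be illustrated with a short sketch: pick the cheapest instance whose shape still covers the workload's observed CPU and memory use. This is a hypothetical illustration, not CAST AI's engine; the memory-per-vCPU shapes match AWS's published specs for these families (M5: 4 GiB per vCPU, C6a: 2 GiB per vCPU), while the hourly prices are approximate us-east-1 on-demand rates and will vary by region.

```python
# Hypothetical rightsizing check: cheapest instance that still fits the
# workload. Specs mirror AWS family shapes; prices are illustrative.
INSTANCES = {
    "m5.2xlarge":  {"vcpus": 8, "mem_gib": 32, "hourly_usd": 0.384},
    "c6a.2xlarge": {"vcpus": 8, "mem_gib": 16, "hourly_usd": 0.306},
}

def cheapest_fit(cpus_needed: int, mem_needed_gib: float) -> str:
    """Return the cheapest instance satisfying observed CPU and memory use."""
    candidates = [
        name for name, spec in INSTANCES.items()
        if spec["vcpus"] >= cpus_needed and spec["mem_gib"] >= mem_needed_gib
    ]
    return min(candidates, key=lambda n: INSTANCES[n]["hourly_usd"])
```

For an app that uses 8 vCPUs but only 12 GiB of memory, `cheapest_fit(8, 12)` picks the compute-optimized `c6a.2xlarge`; a memory-hungry app at 24 GiB falls back to `m5.2xlarge`.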
According to CAST AI, provisioning remains a significant challenge for organizations of all sizes, as the major cloud providers offer a bewildering choice of instance types, making it difficult and time-consuming to pick the right instance for a given workload, to say nothing of ensuring that cloud-hosted infrastructure stays continuously right-sized. The upshot is that DevOps professionals would have to monitor cloud resources constantly to achieve this.
The company hopes that customers will buy its platform to solve this problem, of course. The service is free to try, so users can see how far they are over-provisioning resources, but organizations that sign up as paying customers can let the platform take control of their cloud resource provisioning and automatically match workloads to the machine instances best suited to the job.
CAST AI runs on all three major cloud platforms – AWS, Google Cloud, and Microsoft Azure – and focuses on cloud-native workloads running in containers and using Kubernetes. The company claims that customers can achieve savings of 50-75% on cloud resources to run their applications.
The startup cited a customer, a French e-commerce company called La Fourche, which it said was using fifteen t3.2xlarge instances and two t3.xlarge instances on AWS at a cost of $4,349.95 at the time of analysis. Its platform instead moved the workloads to five c5a.2xlarge instances, cutting the bill to $1,310.40, a saving of almost 70 percent.
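The La Fourche arithmetic checks out against the figures quoted above:

```python
# Verify the quoted savings: bill before and after the instance change,
# using the dollar figures from the article.
before = 4349.95   # 15x t3.2xlarge + 2x t3.xlarge
after = 1310.40    # 5x c5a.2xlarge
savings_pct = (1 - after / before) * 100
print(f"{savings_pct:.1f}%")   # prints "69.9%", i.e. almost 70 percent
```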
According to Gil, further savings can be realized by using Spot Instances, which use spare resources available at a cost well below the standard On-Demand price for the same machine instance.
“Cloud providers need to over-provision resources because they need to be elastic. So some machines aren’t in use all the time, and what they don’t sell at any given time is typically discounted by 60-70 percent,” said Gil.
The downside is that if AWS needs to reclaim the machine, it gives you a two-minute warning before doing so, which is likely to deter many from using these deeply discounted instances in case they disappear at short notice.
“For us, this is an advantage, because we can automatically detect that a Spot Instance is about to fail and immediately replace it with another one. So there is never a failure in our case, because we replace the machine as soon as we see it failing, and if there are no more Spot Instances, we fall back to on-demand,” Gil told us.
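The fallback logic Gil describes can be sketched in a few lines. This is a minimal illustration under assumed interfaces, not CAST AI's actual code: `provision_spot` and `provision_on_demand` stand in for hypothetical provider calls, with spot capacity preferred and on-demand used only as a last resort.

```python
# Sketch of spot-interruption handling: on an interruption notice,
# request a replacement spot node first; fall back to on-demand only
# if no spot capacity is available. Provider calls are hypothetical.
def replace_interrupted_node(provision_spot, provision_on_demand):
    """Return a replacement node, preferring cheaper spot capacity."""
    node = provision_spot()          # try spot first
    if node is None:                 # no spot capacity left
        node = provision_on_demand() # guaranteed, but pricier
    return node
```

In a real system this would be triggered by the provider's interruption notice (the two-minute warning mentioned above), giving the scheduler time to drain and replace the node before it is reclaimed.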
The CAST AI engine can identify which of a customer's containers are “spot-friendly”, according to Gil, so that if a container is running a DNS server, for example, it won’t be allocated to a Spot Instance. Customers can enable or disable the option to use Spot Instances.
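One plausible shape for such a “spot-friendly” check, purely as an illustration (the criteria and field names here are invented, not CAST AI's): stateless, replicated workloads can absorb a two-minute eviction, while singleton or critical infrastructure like a DNS server cannot.

```python
# Hypothetical spot-friendliness check. A workload qualifies only if it
# is stateless, has redundant replicas, and is not critical infrastructure.
def is_spot_friendly(workload: dict) -> bool:
    return (
        workload.get("stateless", False)
        and workload.get("replicas", 1) > 1
        and not workload.get("critical_infra", False)
    )
```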
The co-founder said the over-provisioning statistics CAST AI has collected do not differ significantly between the three major cloud providers, indicating that the situation is not down to any particular cloud making it hard for customers to manage costs.
“I think it’s because of human behavior. For example, if you’re a DevOps, your number one priority is to make sure the app works. If it doesn’t have enough resources, you’re in trouble, so I think there’s a tendency to over-provision,” he said. “They’re not evil, are they? They don’t charge you more for their service; it’s just very difficult for humans to consume something when they have so many choices.”
Independent analyst Clive Longbottom told us that an automated monitoring and management system like CAST AI's makes a lot of sense. However, he cautioned that it means customers are relying on the platform to get everything right in real time.
“Overall, my take would be to go for it: do the financial math to make sure the costs saved are greater than the cost of subscribing to CAST AI, but make sure you have a plan B in place should CAST suddenly disappear one day.” ®