Ballpark calculation shows extent of outage pain
A 63-minute Amazon outage on Prime Day this week cost the company nearly $100 million (£76 million) in lost sales, according to an estimate by retail discount aggregator Lovethesales.com, which was tracking Amazon prices and discounts during the event.
In a quick calculation, Lovethesales.com took estimated Prime Day sales of $3.4 billion, as projected by Coresight Research, and divided them by Prime Day’s 36 hours (2,160 minutes) to arrive at sales of $1,574,074 per minute. Multiplied by the 63 minutes of downtime, that gives a figure of $99,166,667 in estimated lost sales.
(The Amazon site was down for an estimated 63 minutes on Prime Day, according to Downdetector.co.uk.)
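For readers who want to check the arithmetic, the estimate can be reproduced in a few lines of Python. This is only a sketch of the ballpark method described above: the function and constant names are illustrative, and it carries the same simplifying assumptions as the original estimate, namely that sales are spread evenly across the 36 hours and that every sale missed during the outage was lost rather than delayed.

```python
# Inputs taken from the article; the names are illustrative.
PRIME_DAY_SALES_USD = 3_400_000_000  # Coresight Research projection
PRIME_DAY_HOURS = 36                 # length of the Prime Day event
OUTAGE_MINUTES = 63                  # downtime reported by Downdetector.co.uk


def sales_per_minute(total_sales: float, hours: float) -> float:
    """Average sales per minute, assuming sales are evenly spread."""
    return total_sales / (hours * 60)


def estimated_lost_sales(total_sales: float, hours: float, outage_minutes: float) -> float:
    """Lost sales, assuming nothing sold during the outage is recovered later."""
    return sales_per_minute(total_sales, hours) * outage_minutes


per_minute = sales_per_minute(PRIME_DAY_SALES_USD, PRIME_DAY_HOURS)
lost = estimated_lost_sales(PRIME_DAY_SALES_USD, PRIME_DAY_HOURS, OUTAGE_MINUTES)

print(f"Sales per minute: ${per_minute:,.0f}")
print(f"Estimated lost sales: ${lost:,.0f}")
```

In practice, traffic on a sales day is unlikely to be uniform and some shoppers will have returned once the site recovered, so the result is best read as an order-of-magnitude figure.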
Amazon is not the only company feeling the pain, even if $100 million is a mere twinge for a business that made $1.6 billion in profit in Q1 of 2018 alone.
Unplanned application downtime costs the Fortune 1000 up to $2.5 billion every year, according to an IDC report.
Amazon Outage “A Warning to Retailers”
Jon Lucas, Director at Hyve Managed Hosting, said in an emailed statement: “Amazon is the giant of the online retail industry with AWS as the hosting infrastructure. AWS is the pioneer of autoscaling, so if this can happen to them, it should be a warning to other retailers.”
He added: “A key component of avoiding the same downfall as Amazon might be working with a hosting provider that offers a platform with the capacity for unexpected loads, without relying on autoscaling, which can sometimes take some time to kick in.”
Antony Edwards, CTO of performance monitoring and testing company Eggplant, added: “Initially, Amazon’s issue felt very like someone had submitted some code that was causing instability between different systems. My guess is that it was between the core application and the content distribution network. I would have expected Amazon could back out the change and get stability back within 1-2 hours, but it looks like that didn’t happen. Either they didn’t know what caused the issue (so their configuration management isn’t that great), or the offending issue caused a downstream tangle that they struggled to get back under control.”
He added: “I often go to technical meetings and hear ‘We want to be like Amazon, they release billions of times a day without any issues’. However, even Amazon is fallible when running at this speed. Release Analytics would probably have prevented this problem, alignment between production and testing is key.”