“Request traffic volume… exceeded thresholds”
Microsoft Azure continues to struggle to scale up its infrastructure to meet the demands of booming workloads, a fresh outage suggests.
Users across the Asia Pacific region late Sunday were frustrated in efforts to access a range of services for two hours, 40 minutes, with Microsoft forced to manually scale out backend infrastructure to ease the bottleneck.
“A subset of customers using Azure Active Directory may have experienced authentication issues when accessing resources”, MSFT admitted.
“We determined that issues with request traffic volume and regional contention exceeded thresholds and caused AAD [Azure Active Directory] Token requests to timeout or fail. We have manually scaled out backend infrastructure and redistributed traffic to mitigate this issue.”
Users took to Reddit late Sunday to vent their frustration (the incident ran from 23:00 UTC on 14 June 2020 to 01:40 UTC, i.e. it started at midnight UK time), as Microsoft’s status page continued to show all was well and some users were unable to reach login.microsoftonline.com.
Microsoft did not name the data centre that was the culprit; users in Australia and New Zealand appear to have been affected.
Azure APAC Outage
The APAC Azure outage comes after Microsoft told users in mid-March, at the peak of the rush to work from home (WFH), that it was throttling a range of services amid intense pressure on its infrastructure from surging usage.
The cloud hyperscaler reduced content migration, Data Loss Prevention (DLP), and backup solution bandwidth during weekday hours, shrank download limits on OneNote and reduced video resolution on SharePoint.
Early in April it also admitted that Microsoft Azure users on free trials, student accounts, and offers based on monthly credits had been blocked from spinning up cloud services owing to a capacity crunch.
Microsoft has had to meet a huge boom in demand for remote working tools just as server supply chains froze.
As the company admitted in an earnings call in late April, it had faced “supply chain issues coming into the quarter” as cloud providers scrambled to secure enough servers to ramp up data centre architectures.
CFO Amy Hood added: “While we spent $3.9 billion in Q3, that was certainly short, in particular, on the server side in terms of getting what we need into the data centers. Things got a lot better in March, and they’re continuing to get better. And so I feel good that we’ll have a healthy CapEx number in Q4 [and] continue to get ahead of the surge in demand.”
The issues have not gone unnoticed by rivals: last month AWS took aim at Microsoft’s “spotty operational performance during the COVID-19 crisis”, amid a row over the $10 billion Pentagon cloud contract.