ThousandEyes suggests app-layer issue to blame
UPDATED 16:39 GMT March 14, 2019. Facebook has blamed a “server configuration change” for the outage, without providing further details.
A Facebook outage that lasted nearly 13 hours left users of the social media company’s family of services locked out globally overnight, with Facebook itself, Instagram and WhatsApp users all affected to different degrees.
The outage also affected Facebook's advertising services, and the company said it was considering refunds for advertisers. It confirmed that the outage was not the result of a DDoS attack but gave no further detail.
We’re aware that some people are currently having trouble accessing the Facebook family of apps. We’re working to resolve the issue as soon as possible.
— Facebook (@facebook) March 13, 2019
Network monitoring company ThousandEyes said: “Our tests shows this is very likely an app-layer issue, as we see 500 Internal Server Errors being reported from various locations around the globe.”
It added that it was not seeing any Border Gateway Protocol (BGP) changes, suggesting the problem was internal to Facebook rather than in the routing between networks.
[update] ThousandEyes looks at issues from the user’s vantage point. When investigating Facebook’s issues today, we’re not seeing any BGP changes that are affecting connectivity, packet loss or latency. pic.twitter.com/hwRQPg7k1X
— ThousandEyes (@thousandeyes) March 13, 2019
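To illustrate the reasoning behind that diagnosis, here is a minimal Python sketch of how probe results might be sorted into network-level and application-level failures. It is not ThousandEyes' method: the `ProbeResult` shape, the `classify` and `summarise` helpers, the 5 percent loss threshold and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    """One synthetic test from a single vantage point (hypothetical shape)."""
    location: str
    connected: bool              # did the connection to the server succeed?
    packet_loss_pct: float       # packet loss observed on the path
    http_status: Optional[int]   # None when no HTTP response came back


def classify(probe: ProbeResult, loss_threshold_pct: float = 5.0) -> str:
    """Roughly attribute a probe to the network, the application, or neither."""
    # No connection or no response at all points at the network path.
    if not probe.connected or probe.http_status is None:
        return "network"
    # Heavy packet loss also points at the network (threshold is an assumption).
    if probe.packet_loss_pct >= loss_threshold_pct:
        return "network"
    # The server was reachable and answered with a 5xx: the application failed.
    if 500 <= probe.http_status <= 599:
        return "application"
    # Anything else is treated as healthy for the purposes of this sketch.
    return "healthy"


def summarise(probes):
    """Count how many probes fall into each category."""
    counts = {"network": 0, "application": 0, "healthy": 0}
    for probe in probes:
        counts[classify(probe)] += 1
    return counts


# Made-up sample data: reachable servers returning 500 errors.
probes = [
    ProbeResult("London", True, 0.0, 500),
    ProbeResult("Singapore", True, 0.2, 500),
    ProbeResult("New York", True, 0.0, 200),
]
print(summarise(probes))
```

When servers are reachable with negligible packet loss yet return 500 errors from many locations, the tally leans towards "application", which is the pattern ThousandEyes described.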
An update on Facebook’s developer page said: “We are currently experiencing issues that may cause some API requests to take longer or fail unexpectedly. We are investigating the issue and working on a resolution.”
The status page showed a quadrupling of API error rates to a still less-than-catastrophic 4.7 percent as of approximately 21:00 GMT, and a doubling of average API response times to approximately 240 milliseconds.
The technical hiccup follows last night’s Gmail and Google Drive outage. Few companies, even those with a sprawling array of data centres, seem to be immune to such issues: Microsoft Office 365 also recently suffered a near-two-day outage for some users.
We're also aware that people are experiencing issues with access to our ads interfaces, we'll share an update as soon as possible. https://t.co/Wk4STxGuEq
— Rob Leathern (@robleathern) March 13, 2019
Facebook operates 15 data centres globally, including in Ireland, Singapore and Sweden. The company had 1.52 billion daily active users in December 2018 and estimates that around 2.7 billion people now use its family of services.
One expert who did not want to be named told Computer Business Review they suspected it was an issue with an automation system that manages Facebook’s infrastructure. Facebook has built and open sourced a wide range of software including four key data infrastructure management tools, RocksDB, GraphQL, Presto and Haxl. The claim is, at this point, entirely speculative.