“It may have been that… the significant load that large number of BGP updates imposed on their routers made it difficult for them to login to their own interfaces”
September 2: Read our updated story with the RFO and more details here.
A major CenturyLink/Level 3 outage on Sunday that took down websites across the US and EU has been blamed by the carrier on an “offending flowspec announcement”. Flowspec is a BGP extension typically used to block DDoS attacks.
CenturyLink, one of the world’s largest network providers, bought Level 3, a Tier 1 core transport, IP, and content delivery provider, for $34 billion in 2017. The outage took down swathes of websites and services, including Xbox Live, early on Sunday. Services were restored by 11:12 AM ET (16:12 BST), but the four-hour outage had a huge impact.
The sheer size of the network company meant the incident caused waves across the global internet. As content delivery network and security firm Cloudflare noted: “We saw a 3.5% drop in global traffic during the outage, nearly all of which was due to a nearly complete outage of CenturyLink’s ISP service across the United States.”
CenturyLink/Level(3)’s network was not honoring route withdrawals and continued to advertise routes to networks like Cloudflare’s even after they’d been withdrawn, it added in its own write-up on the incident, with a large knock-on effect.
“In the case of customers whose only connectivity to the Internet is via CenturyLink/Level(3), or if CenturyLink/Level(3) continued to announce bad routes after they’d been withdrawn, there was no way for us to reach their applications and they continued to see 522 errors until CenturyLink/Level(3) resolved their issue.”
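To illustrate what “not honoring route withdrawals” means in practice, here is a minimal Python sketch of a toy route table. It is not how any real router or CenturyLink’s network is implemented: the `ToyRib` class, its `honor_withdrawals` flag, and the example prefix are all hypothetical, chosen only to show why a route that is never removed keeps attracting traffic to a path that no longer works.

```python
# Toy model only: real BGP speakers track far more state
# (path attributes, best-path selection, timers, per-peer tables).
class ToyRib:
    def __init__(self, honor_withdrawals=True):
        self.honor_withdrawals = honor_withdrawals
        self.routes = {}  # prefix -> next hop

    def announce(self, prefix, next_hop):
        # A peer tells us it can reach this prefix.
        self.routes[prefix] = next_hop

    def withdraw(self, prefix):
        # A peer tells us the prefix is no longer reachable through it.
        # A misbehaving network ignores this and keeps the stale route.
        if self.honor_withdrawals:
            self.routes.pop(prefix, None)

    def advertised(self):
        # Prefixes this network would still advertise onwards.
        return sorted(self.routes)


healthy = ToyRib(honor_withdrawals=True)
stuck = ToyRib(honor_withdrawals=False)

for rib in (healthy, stuck):
    rib.announce("198.51.100.0/24", "peer-a")
    rib.withdraw("198.51.100.0/24")

print(healthy.advertised())  # []
print(stuck.advertised())    # ['198.51.100.0/24']
```

In the second case the stale prefix is still advertised, so traffic keeps flowing towards a network that can no longer deliver it, which is consistent with the 522 errors Cloudflare describes.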
Bad Flowspec Rule Blamed
A status update from CenturyLink blamed the outage on an “offending flowspec announcement [that] prevented Border Gateway Protocol (BGP) from establishing across multiple elements throughout the CenturyLink network.”
Flowspec, or the BGP flow specification, is a feature designed to “propagate filtering and policing functionality among a large number of BGP peer routers”, as Cisco describes it. The network specialist emphasises that it is typically used “to mitigate the effects of a distributed denial-of-service (DDoS) attack over your network”.
(BGP, in turn, is a protocol that manages how packets are routed across the Internet through the exchange of routing and reachability information.)
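The idea of propagating a filter rule to many routers can be sketched roughly as follows. This is a simplified illustration in Python, not Cisco’s implementation or the actual Flowspec wire format: the `FlowRule` class, the `propagate` and `forward` helpers, the router names, and the addresses are hypothetical. It assumes a rule that matches on destination prefix, protocol, and destination port, with a single “discard” action.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class FlowRule:
    """Hypothetical, simplified stand-in for a Flowspec-style rule."""
    dst_prefix: str
    protocol: Optional[str] = None   # None means "any protocol"
    dst_port: Optional[int] = None   # None means "any port"
    action: str = "discard"

    def matches(self, dst_ip, protocol, dst_port):
        if ip_address(dst_ip) not in ip_network(self.dst_prefix):
            return False
        if self.protocol is not None and protocol != self.protocol:
            return False
        if self.dst_port is not None and dst_port != self.dst_port:
            return False
        return True


def propagate(rule, routers):
    # Stand-in for BGP distributing the rule to every peer router.
    for rules in routers.values():
        rules.append(rule)


def forward(rules, dst_ip, protocol, dst_port):
    # First matching rule decides; otherwise the packet is forwarded.
    for rule in rules:
        if rule.matches(dst_ip, protocol, dst_port):
            return rule.action
    return "forward"


routers = {"edge-1": [], "edge-2": []}
propagate(FlowRule("203.0.113.0/24", "udp", 53), routers)

print(forward(routers["edge-2"], "203.0.113.7", "udp", 53))   # discard
print(forward(routers["edge-2"], "203.0.113.7", "tcp", 443))  # forward
```

Because one announcement reaches every router at once, a correctly scoped rule can blunt an attack network-wide, and by the same mechanism a faulty one is applied everywhere just as quickly.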
centurylink, a baby bell: hey let’s buy an international carrier
also centurylink: help does anyone know bgp
— neal rice (@flakealso) August 30, 2020
Cloudflare notes in its own write-up that “one plausible scenario is that they [CenturyLink] issued a Flowspec command to try to block an attack or other abuse directed at their network… it may [then] have been that the Flowspec rule and the significant load that large number of BGP updates imposed on their routers made it difficult for them to login to their own interfaces. Several of the other tier-1 providers took action, it appears at CenturyLink/Level(3)’s request, to de-peer their networks”.