“The Operations Engineering Team was using this process during routine operations…”
When CenturyLink, one of the world’s biggest internet backbone providers, hit a mystery issue on August 31 at 10:04 GMT, the fault took down 3.5% of the world’s internet traffic and triggered outages at some of the world’s most popular online services, including the PlayStation Network and Xbox Live. The event wasn’t fully cleared until 15:10 GMT.
Now, thanks to a swift RFO (reason for outage) shared with customers and seen by Computer Business Review, the precise reason for the outage has become clear. It was not, as we had initially speculated, the result of a botched attempt to mitigate a Distributed Denial of Service (DDoS) attack. Rather, it was down to a failed attempt by the company to “block a single IP address” on behalf of a customer, it has admitted.
(CenturyLink had earlier acknowledged in a status update that the substantial outage was due to an “offending flowspec announcement”. Flowspec is a BGP extension used to distribute traffic-filtering rules to routers, for example to mitigate sudden spikes of traffic such as those caused by a DDoS attack.)
Flowspec: “Problematic Protocol”
Here’s how CenturyLink explained it to customers.
“As a large influx of traffic is identified from a set IP address, the Operations Engineering Team utilizes Flowspec announcements as one of many tools available to block the corrupt source from sending traffic to the CenturyLink network.
“The Operations Engineering Team was using this process during routine operations to block a single IP address on a customer’s behalf as part of our normal product offering. When the user attempted to block the address, a fault between the user interface and the network equipment caused the command to be received with wildcards instead of specific numbers. This caused the network to recognize the block as several IP addresses, instead of a single IP as intended. The user interface for command entry is designed to prohibit wildcard entries, blank entries, and only accept IP address entries.”
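The kind of input check the RFO describes (no wildcards, no blanks, IP addresses only) can be sketched in a few lines of Python. This is purely illustrative: the RFO does not say how CenturyLink's interface is built, and the function name `to_host_prefix` is a hypothetical one chosen for this example.

```python
import ipaddress


def to_host_prefix(user_input: str) -> str:
    """Return a single-host prefix (/32 or /128), or raise ValueError.

    Hypothetical helper: illustrates strict "one IP address only" parsing.
    """
    text = user_input.strip()
    # ip_address() raises ValueError for blank input, for wildcards such as
    # "203.0.113.*", and for ranges such as "203.0.113.0/24".
    addr = ipaddress.ip_address(text)
    # max_prefixlen is 32 for IPv4 and 128 for IPv6, so the result
    # always covers exactly one address.
    return f"{addr}/{addr.max_prefixlen}"
```

In this sketch, re-running the same check on the receiving side, immediately before a rule is built, would also catch a value that was mangled after it left the user interface, which is the kind of fault the RFO describes.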
The company continued: “A secondary filter that is designed to prevent multiple IP addresses from being blocked in this fashion failed to recognize the command as several IP addresses. The filter specifically looks for destination prefixes, but the presence of the wildcards caused the filter to interpret the command as a single IP address instead of many, thus allowing it to pass… the problematic protocol [then] propagated through many of the edge devices on the CenturyLink Network.”
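A secondary filter of the sort described here can be written to "fail closed", so that anything it cannot parse as an exact prefix is rejected rather than assumed to be a single address. The following Python sketch shows the idea under stated assumptions: the names `MAX_BLOCKED_ADDRESSES` and `is_safe_to_announce` are hypothetical, and nothing here reflects CenturyLink's actual filter.

```python
import ipaddress

# Hypothetical policy limit: a block request may cover exactly one address.
MAX_BLOCKED_ADDRESSES = 1


def is_safe_to_announce(destination: str) -> bool:
    """Return True only if `destination` is a valid prefix covering
    no more than MAX_BLOCKED_ADDRESSES addresses."""
    try:
        # strict=True also rejects prefixes with host bits set,
        # for example "203.0.113.7/24".
        network = ipaddress.ip_network(destination, strict=True)
    except ValueError:
        # Wildcards, blanks and other malformed input are rejected
        # rather than being treated as a single address.
        return False
    return network.num_addresses <= MAX_BLOCKED_ADDRESSES
```

With this approach a wildcard such as `203.0.113.*` never reaches the size check at all, and a range such as `203.0.113.0/24` is refused because it covers 256 addresses.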
The RFO paints a vivid picture of the brown stuff hitting the fan when the issue was noticed: “The IP Network Operations Center (NOC) was engaged, and due to the amount of alarms present, additional resources were immediately engaged including Tier III technical Support, Operations Engineering, as well as Service Assurance Leadership. Extensive evaluations were conducted to identify the source of the trouble.”
The Flowspec announcement platform has been taken out of service across the entire CenturyLink Network, CenturyLink said. It will “remain offline until extensive testing is conducted” while the secondary filter is being “modified to prohibit wildcard entries.” Yes, the internet is held together with duct tape.