“It is clearly unacceptable to experience a power outage of this gravity”
Updated 08:10 BST, August 19, 2020: The final circuits appear to have been restored at approximately 22:20 on August 18, some 18 hours after the outage began.
Data centre giant Equinix says it has restored racks for all customers after a lengthy power outage at its IBX LD8 data centre in London's Docklands. The outage, which began at 4:30am BST, knocked services offline for hundreds of customers, including several ISPs, and left many deeply frustrated at the slow restoration of power.
“Equinix engineers have diagnosed the root cause of the issue as a faulty UPS (uninterrupted [sic] power supply) system and we are working with our customers to minimise the impact. We regret any inconvenience this has caused,” the company said in its first public statement at 12:04 BST on August 18. (Giganet's head, Matthew Skipsey, had earlier described the lack of communication from Equinix as “abysmal”.)
Various customers identified the cause as a failing output static switch in the Galaxy UPS system (sold by Schneider Electric). This switch connects the critical load either to conditioned power from the UPS or to raw mains power from the bypass supply. The length of the outage suggests that LD8's A and B AC feeds were both supplied from the same UPS, which would mean a single UPS fault could take down both feeds at once. ISPs that relied on their data centre provider to ensure resilience, and on a single data centre, are learning a hard lesson.
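To make the A+B point concrete, here is a toy Python model of why two "diverse" feeds fed from one UPS offer no protection against a UPS failure. It is not Equinix's actual topology, which is not described in detail here; it assumes that a failed UPS (including its output static switch) drops every feed it supplies and that neither the bypass nor another UPS picks up the load. The labels `UPS-1` and `UPS-2` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feed:
    name: str  # feed label, e.g. "A" or "B"
    ups: str   # hypothetical label of the UPS supplying this feed


def rack_powered(feeds, failed_ups):
    """A dual-fed rack stays up while at least one feed's UPS is healthy.

    Simplifying assumption: a failed UPS drops every feed it supplies,
    and no static bypass transfer rescues the load.
    """
    return any(feed.ups not in failed_ups for feed in feeds)


def single_points_of_failure(feeds):
    """Return the UPSs whose failure alone would take the rack down."""
    candidates = {feed.ups for feed in feeds}
    return sorted(ups for ups in candidates if not rack_powered(feeds, {ups}))


# Hypothetical topologies, for illustration only.
shared = [Feed("A", "UPS-1"), Feed("B", "UPS-1")]       # A and B from one UPS
independent = [Feed("A", "UPS-1"), Feed("B", "UPS-2")]  # separate UPS per feed

print(single_points_of_failure(shared))       # ['UPS-1']
print(single_points_of_failure(independent))  # []
```

The model deliberately ignores bypass transfer and N+1 UPS modules; its only point is that dual feeds add resilience against a UPS fault only when each feed traces back to a different UPS.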
The London Internet Exchange (LINX), meanwhile, said approximately 150 of its members were directly affected by the incident. (All of LINX's devices were restored by 13:42. The organisation has more than 900 ASNs connecting from over 80 countries.)
I’ve been assured by an engineer that the new A+B feeds are fed via independent A+B PDUs and then route to separate UPS systems. He mentioned 4 UPSs, so not sure if that’s N+1 on each PDU? Or something else but he seemed to suggest that previously A+B went to a single UPS! 😧
— Matthew Skipsey (@matthewskipsey) August 18, 2020
One customer affected was ISP Giganet. It told customers: “We’re still waiting for our network rack to regain power following Equinix and their contractors migrating power supplies onto the new infrastructure following the earlier fault.
“There is sadly still no estimated fix time which is most frustrating. They have assured us that they will provide this information when they can. Equinix are being continually chased for updates. As you can appreciate this is a P1 issue affecting many 100s of other carriers/ISPs – so it’s been given the maximum priority.”
BT is understood to be among the others affected. The outage also knocked the data centre's access control systems offline, Giganet's Matthew Skipsey said, “so everything [is] running manually over two way radio then phoned through somewhere else. Crazy times. This is a hell of an MBORC.” (MBORC is telecoms shorthand for “matters beyond our reasonable control”.)
Major DC outage ongoing down London way. Equinix LD8. Looks a bit like power systems that were due to be replaced have failed early and are NOW being replaced. A decidedly non-live migrate. Best info seems to be coming from one of their customers here: https://t.co/rZrg2KkXda
— John Leach (@johnleach) August 18, 2020
Equinix describes the co-location facility as offering access to “dense concentrations of financial services, Internet service providers, cloud and IT services, enterprises and content and digital media companies.”
Today Equinix IBX LD8, in the Docklands, London, UK, experienced a power outage. This has impacted customers who are based there. The outage may have also affected customers’ network services.
— Equinix UK (@EquinixUK) August 18, 2020
Equinix said it is allowing customers “more flexible access to LD8” as it scrambles to fix the issue, “working within our COVID-19 restrictions.”
Giganet added: “We have lost both A+B feeds to 1 of our 2 Equinix LD8 racks at approximately 4.23am. This follows a UPS failure, which then triggered the fire alarm in the data centre according to reports from Equinix. The rack that we have lost power to houses our core Juniper MX router and Cisco LNS. The Juniper MX router is our core device which is needed for everything in LD8 to function, including terminating a number of leased line connections as well as providing connectivity to our vDC platform. All our equipment power suppliers are dual fed with ‘diverse’ A+B power feeds provided by the data centre – however after this incident we suspect that there is a lack of resiliency and will be sure to raise this after the incident is resolved as this is clearly unacceptable to experience a power outage of this gravity.”
LINX lost about one third of their traffic when (part of?) LD8 went down at 3:15 AM UTC pic.twitter.com/rskLLk5Eny
— Giorgio Bonfiglio (@g_bonfiglio) August 18, 2020
Updated: Power appears to have been restored around 19:45 BST.