“Several elements of the ground infrastructure were re-initiated”
“Have you tried turning it off, and turning it on again?” A week after Europe’s €10 billion Galileo satellite navigation system suffered a still-unexplained outage, the European Global Navigation Satellite Systems Agency (GSA) – in a grudging update and under intense questioning from global media outlets – admitted that it had done that much.
Galileo, a constellation of satellites that provides global positioning services to over 400 million users, is the first European infrastructure explicitly owned by the EU. It failed in the early hours of Thursday morning, July 11. It is still down. (Users have been automatically switched over to alternatives: the US’s GPS or Russia’s GLONASS.)
“Several elements of the ground infrastructure were re-initiated… it is too early to confirm an exact service recovery date,” the agency said late Tuesday. Unfortunately, in this instance, the tried and trusted reboot does not appear to have worked.
The agency is responsible for Galileo’s operation, alongside the European Space Agency (ESA) and the European Commission. Unsubstantiated reports link the outage to software issues at the Precise Timing Facility at Fucino, Italy. The latest update provides few technical details and attributes the outage, again, merely to “Galileo ground infrastructure”.
Galileo Still Down: Elements “Re-Initiated”
Nearly a full week after the outage, the agency has published its most detailed update thus far, in an apparent bid to reassure users that efforts to fix the issue are underway. It said on Wednesday, July 17: “As soon as the incident was declared, an Anomaly Review Board was convened and urgent recovery procedures were activated in the affected Galileo infrastructures.
“Operational teams are working on recovery actions 24/7 to restore the Galileo navigation and timing services as soon as possible.”
The communiqué notes: “The progress is being closely monitored; it is too early to confirm an exact service recovery date”. The agency adds: “It was precisely to deal with issues of this nature that the EU opted for a progressive roll-out of the Galileo system. The evolution and planned upgrade of the ground infrastructure will reinforce redundancy of the system towards reaching the full operations phase.”
Redundancy appears to have been baked in, but it also appears to have failed. There are two Galileo control centres: one in Oberpfaffenhofen, Germany, and one in Fucino, Italy.
The “fully interoperable” centres are complemented by a worldwide network of sensor stations providing orbitography and synchronisation measurements; stations that uplink the navigation data; two telemetry, tracking and command stations controlling the constellation; and a network connecting all the ground facilities.
Unnamed Sources Blame Thales
The European Commission is formally responsible for Galileo. ESA acts as design and procurement agent on its behalf, and the European GNSS Agency (GSA) ensures Galileo’s uptake and security. It also works with private sector partners.
One is French multinational Thales, and an unnamed source told Inside GNSS that the company was responsible, if not for the issue itself, then for fixing it.
“There are major architectural problems within the GMS [Galileo Mission Segment] under Thales Alenia Space…responsibility,” they said.
Computer Business Review has contacted Thales for comment. Thales is the majority partner in a joint venture with Leonardo that has responsibility for GMS maintenance.
(In late 2018 Thales also won a new €130 million contract that includes “adapting the ground segment to handle, via the Point of Contact Platforms, the interface between the Galileo security center (GSMC) in charge of managing PRS access to the new system; government entities that control access on their own receivers.”)
Redundancy, Reliable Connection Long a Concern
Reports going back as far as 19 years highlight the importance of redundancy for the Precise Timing Facility (PTF): “Concerning the PTF design baseline, the redundancy mechanisms and a reliable connection between the PTF and the GTSP [external Galileo Time Service Provider] are considered to be more critical than the Galileo System Time generation and steering algorithms, where valuable experience exists worldwide”, a paper by industry experts published in 2005 notes.
A GNSS paper gives a flavour of the complexity of the issue: “The application at the heart of the GTSPF [Galileo Time Service Prototype Facility] has, at its core, software containing innovative, complex and unique algorithms.
“The most sophisticated technical development is focused around two elements which contain the complex mathematics used to provide the time service.
“The first element is a composite clock to establish a highly accurate and stable virtual timescale using the input of an unlimited number of atomic clocks.
“The second element contains prediction and steering algorithms which generate the steering corrections needed to maintain a highly aligned and stable Galileo system time. Both these elements incorporate state-of-the-art Kalman filter-based algorithms, whose development has pushed beyond that which was previously possible.”
It is somewhere here, sources suggest, that things have gone awry in the code base.
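To make the two elements described in the quoted paper a little more concrete, the sketch below shows the general idea in Python. It is an illustration only and is not Galileo's actual code: the inverse-variance weighting, the one-state random-walk Kalman filter, the proportional steering rule, and every name and number in it (`composite_clock`, `ScalarKalman`, `steering_correction`, the noise values) are assumptions chosen for simplicity.

```python
def composite_clock(offsets, variances):
    """Combine any number of clock offsets (seconds) into one virtual timescale.

    Assumption: each clock is weighted by the inverse of its noise variance,
    so more stable clocks count for more.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, offsets)) / total


class ScalarKalman:
    """One-state Kalman filter tracking a slowly drifting time offset.

    Assumption: the offset follows a random walk with variance `process_var`
    per step, and each measurement has noise variance `meas_var`.
    """

    def __init__(self, process_var, meas_var, x0=0.0, p0=1.0):
        self.q = process_var
        self.r = meas_var
        self.x = x0  # current offset estimate
        self.p = p0  # variance of that estimate

    def update(self, measurement):
        # Predict: the estimate is carried forward, its uncertainty grows.
        self.p += self.q
        # Correct: blend prediction and measurement using the Kalman gain.
        gain = self.p / (self.p + self.r)
        self.x += gain * (measurement - self.x)
        self.p *= 1.0 - gain
        return self.x


def steering_correction(estimated_offset, gain=0.5):
    """Hypothetical proportional steering: push the timescale back toward zero."""
    return -gain * estimated_offset


if __name__ == "__main__":
    kf = ScalarKalman(process_var=1e-4, meas_var=1e-2)
    # Three hypothetical clocks, offsets in nanoseconds, with differing stability.
    for offsets in ([4.8, 5.3, 5.1], [5.0, 4.9, 5.2], [5.1, 5.0, 4.7]):
        combined = composite_clock(offsets, variances=[1.0, 2.0, 4.0])
        estimate = kf.update(combined)
        print(round(estimate, 3), round(steering_correction(estimate), 3))
```

Even in this toy form, the output depends on every clock input and on tuning parameters, which hints at why the 2005 paper stressed redundancy and reliable connections around the timing facility.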
Explaining the Chaos: Lessons from the Commercial Sector?
Irrespective of the complexity of the problem, when it comes to updating users, some would argue that the agencies could learn a thing or two from the commercial sector. After a recent Cloudflare outage that took down chunks of the internet, the CEO personally called swathes of customers and press within 30 minutes to explain efforts to resolve it. A detailed post-mortem was published days after that.
The hyperscale cloud providers also – to varying degrees – publish detailed analyses of issues as they unfold, the efforts made to resolve them, and post-mortems thereafter. Any institutional embarrassment about the outage, or about the warts-and-all revelations that result – “it’s using what software?” – is typically offset by the appreciation with which customers greet such transparency.
Whether the EU will take heed of that approach, and when a fix is likely, remain open questions. There will, however, be some work to do to reassure commercial customers ahead of full system launch in 2020.