“There are three key considerations that we advise our customers to make to remove single points of failure…”
The summer of 2018 brought us some of the biggest airport glitches in recent history. An American airline blamed its ‘computer systems’ for cancelling more than 2,000 flights, a European city was forced to shut its airspace and a UK airport saw all the flight information screens fail on one of the busiest days of this summer, writes Jamie Adkin, VP EMEA, Adder Technology.
Although staff did their best to cope, these incidents were a huge blow to the airports’ brands. Reports of how many passengers missed flights last summer because of technical glitches vary from hundreds to thousands. If the higher figure is accurate, the financial cost is significant, on top of the damage to reputation.
To the consumer, last summer may now feel worlds away, with the World Cup, the heatwave and these glitches just a distant memory. However, for airports, it’s now crunch time. Last year, Friday 27 July was pegged as the busiest day of the summer, with 8,841 flights scheduled in the UK in just 24 hours.
As we know, holidays – and thus flights – increase when there are bank holidays, and we have five coming up between now and the end of August. Understandably, summer is a high-pressure time for the travel industry, and after last year, many are feeling the strain. In airports, the source – and saviour – of technical glitches often lies in the control room, which raises the question of how organisations can prepare for these failures.
Prepare for Component Failure
Lately, the emphasis has been on protecting control room infrastructure against hackers, cybercrime and terrorism, with strong measures put in place to combat these major threats. However, we should never underestimate the need for physical resilience and reliability even in the face of often unavoidable malfunctions.
When it comes to looking after your customers, it’s time to go back to the basics. No matter how tough or durable a system is, most people accept that it will fail at some point in its lifecycle, whether through an internal weakness or because of a power outage. Single points of failure are parts of a system that, should they fail, can cause the entire system to stop working. While these are undesirable, in many cases eliminating a single point of failure is impossible.
Increasingly, organisations are recognising that they must have built-in contingency measures to address this eventuality. Total shutdown is unacceptable because of the chaos and danger it could cause, particularly in high-pressure sectors like consumer transport. Therefore, it’s best for organisations to either isolate the failure or invest in more reliable equipment.
With this in mind, here are three key considerations that we advise our customers to make to remove single points of failure:
Management or Automation Systems
A server failure could leave a large proportion of your systems inoperable. The obvious solution is to install multiple manager systems, but it is also important that you fully understand the failover process and any limits on capability that may apply when operating on a back-up server. The best solutions enable the secondary manager to take over instantly and automatically, without human intervention. Where this is possible, a server failure in the control room need not escalate into an even greater disaster.
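To make the failover idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not any vendor's implementation: the `Manager` class, its `healthy` flag and the `active_manager` function are hypothetical names, and a real system would replace the flag with an actual health check.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Manager:
    """Hypothetical management server; `healthy` stands in for a real health check."""
    name: str
    healthy: bool = True


def active_manager(managers: List[Manager]) -> Optional[Manager]:
    """Return the first healthy manager in priority order, or None if all are down."""
    for manager in managers:
        if manager.healthy:
            return manager
    return None


# Priority order: the primary is used whenever it is healthy.
primary = Manager("primary")
secondary = Manager("secondary")
pool = [primary, secondary]

before = active_manager(pool)   # the primary is in charge
primary.healthy = False         # simulate a server failure
after = active_manager(pool)    # the secondary takes over, with no operator action
```

The point of the sketch is that the selection happens in software every time it is asked, so nobody has to notice the failure and switch over by hand. It says nothing about what the secondary can and cannot do once it is in charge, which still has to be understood in advance.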
Prepare for Power Supply Failure
To protect against a loss of power, ensure you have a power distribution unit with multiple power supplies and multiple sources of power. It’s critical that if one source of power is lost, a secondary source is available to all devices. For example, a control room could switch over to a generator. The ideal solutions are load balanced between source A and source B when the system is running normally, so that if one of those power sources disappears, the second takes over the full running of the system immediately.
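The arithmetic behind a load-balanced A/B supply can be sketched in a few lines of Python. The function names and the wattage figures below are assumptions made for illustration only. The check captures one practical consequence of the approach: each source must be able to carry the full load by itself, even though it normally carries only half.

```python
from typing import Dict, List


def share_load(total_watts: float, sources_online: List[str]) -> Dict[str, float]:
    """Split the load evenly across whichever sources are currently online."""
    if not sources_online:
        raise RuntimeError("no power source available")
    per_source = total_watts / len(sources_online)
    return {name: per_source for name in sources_online}


def survives_any_single_loss(total_watts: float, capacities: Dict[str, float]) -> bool:
    """True if the remaining sources can carry the full load after losing any one source."""
    if len(capacities) < 2:
        return False  # a lone source is itself a single point of failure
    return all(
        sum(cap for other, cap in capacities.items() if other != lost) >= total_watts
        for lost in capacities
    )


# Illustrative figures only: an 800 W load on two 1,000 W sources.
normal = share_load(800, ["A", "B"])    # both sources share the load
after_loss = share_load(800, ["B"])     # source A lost: B carries everything
resilient = survives_any_single_loss(800, {"A": 1000, "B": 1000})
overloaded = survives_any_single_loss(1200, {"A": 1000, "B": 1000})
```

In this example the 800 W load is safe because either source can supply it alone, whereas a 1,200 W load would only be covered while both sources are present.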
What Happens if the Network Fails?
We rely on networks for almost every aspect of our lives, and while they are among the most reliable and resilient technologies available, we must accept that something will go wrong at some point. If a network switch fails, the result could be catastrophic if you have not designed a robust, resilient network architecture.
In the control room, a network failure could lead to a lack of data, incorrect data visualisation or a reduction in communication, all of which could put people at risk. Of course, network engineers can design resilient networks that deal with these failures with limited downtime, and network vendors provide outstanding levels of support to their customers. There is, however, a further strategy to reduce downtime to zero.
Technologies designed for mission-critical environments enable connections to two different networks so that, even in the event of a network switch failure, there is an instant failover to the secondary network. Just like the power resilience above, the very best solutions are automatic and load balance data across both networks, so that the operator experiences zero downtime.
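As a rough illustration of load balancing across two networks with automatic failover, consider the Python sketch below. The `DualNetworkSender` class and the link names `net-a` and `net-b` are hypothetical, and link status is set by hand here, whereas a real device would detect a failed switch itself.

```python
from typing import Dict, List


class DualNetworkSender:
    """Hypothetical sender that alternates traffic across all healthy links."""

    def __init__(self, links: List[str]) -> None:
        # True means the link is up; a real system would probe this itself.
        self.status: Dict[str, bool] = {link: True for link in links}
        self._counter = 0

    def pick_link(self) -> str:
        """Choose the link for the next packet, skipping any link that is down."""
        healthy = [link for link, up in self.status.items() if up]
        if not healthy:
            raise ConnectionError("both networks are down")
        link = healthy[self._counter % len(healthy)]
        self._counter += 1
        return link


sender = DualNetworkSender(["net-a", "net-b"])
balanced = [sender.pick_link() for _ in range(4)]    # traffic alternates between links

sender.status["net-a"] = False                       # simulate a switch failure
failed_over = [sender.pick_link() for _ in range(3)] # all traffic now uses net-b
```

Because both links are already carrying traffic before the failure, there is no idle standby path to wake up: the surviving network simply takes the next packet.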
Any organisation dealing with the movement of people where crowd control is paramount, from a transport hub to a football stadium, will know that a system collapse could have dire consequences. To prepare for peak times, it’s vital to check that your systems have the resilience and redundancy to cope with a stoppage. While this may feel like going back to the basics, the simple fact is that well-thought-out contingency plans will save your organisation money and help it hold on to its reputation. The organisations that prepare to fail will succeed this summer.