When a critical, disruptive system goes down, the natural assumption these days is that the source of the problem is cybercrime. Except that often, probably most of the time, that assumption is wrong.
While it’s tempting to imagine shadowy individuals, cybercrime gangs, and state-sponsored groups behind every technological failure, the reality is usually far more mundane. And that gap between perceived risk and real risk is becoming a problem in its own right.
Take the recent outage of British Airways’ IT systems, which left many flights up in the air (or rather, grounded) over the bank holiday weekend. From an early stage, the company was at pains to reassure the public that the outage wasn’t caused by a cyber-attack. Why? Because that is increasingly the default assumption. After all, we are all only too aware of how many people out there attack systems professionally, often with significant success.
The trouble is that this perception distorts reality precisely when we need to be clear about where the real threats lie. Of course cyber-vandalism happens, but the people we really need to worry about are the ones who don’t break things. They just quietly steal everything.
Real attackers, as opposed to the Hollywood stereotype, are far less interested in smashing the jewellery store window than in moving in and owning the whole place. In many cases, attackers have been found not only to slip quietly into IT systems to steal information, but to actively improve security once they’re inside, closing known vulnerabilities and generally tidying the place up. Why? Because cybercrime, like any other business, is competitive. Once attackers are in, the last thing they want is a rival group showing up and muddying the waters.
The reality is that modern IT systems are complex. Businesses work hard to make them as robust as possible, but that complexity brings a degree of fragility and unpredictability that needs careful management and can, from time to time, throw a spanner in the works.
So systems can, and will, fail. Murphy’s Law applies to technology every bit as much as it does to any other human endeavour. The best approach is often to build in as much resilience as possible: assume that any given system will fail, make sure the rest of the infrastructure can pick up the load when it does, and work to get the failed component back up as quickly as possible. There’s even a school of thought that says a system that hasn’t failed for a while is a warning sign of impending disaster, simply because no one has had enough practice at working around the problem.
Netflix, for example, has elevated this thinking to an art form with its famous Chaos Monkey. This is a piece of software that roams the virtual halls of the Netflix infrastructure and randomly turns things off. As a result, Netflix’s engineers spend all day, every day, getting better at dealing with system outages because, well, systems go down. A lot. It’s like IT vaccination: if a company can cope with small doses of the disease, it will be ready to fight off the big one.
So while no one should be happy when a critical, disruptive system goes down, we shouldn’t assume that the source of the problem is cybercrime. It’s far more likely that something fragile has finally broken and, with luck, won’t break again for a long time. Resisting that assumption lets us stay focused on the very real data theft and cybercrime we see every day, and deal with it in a way that is helpful and constructive. Meanwhile, we can let the Chaos Monkeys out there do their job and make everything just that little bit more resilient for us all.