A poorly tested software update, followed by poorly performing failover systems, was behind last week's BlackBerry outage, Research In Motion has confirmed.
RIM has determined that the incident was triggered by the introduction of a new, non-critical system routine that was designed to provide better optimization of the system’s cache, RIM said in a statement.
The system routine was expected to be non-impacting with respect to the real-time operation of the BlackBerry infrastructure, but the pre-testing of the system routine proved to be insufficient, the company said.
This caused a compounding series of interaction errors between the database and the cache. When the failover systems kicked in, they did not fully perform to RIM's expectations and caused the outage to last even longer, the company said.
The outage lasted half a day and left millions of RIM's North American email subscribers without service. The lack of an immediate explanation for the brownout added to the frustration of many.
The company was keen to point out that it had not been hacked, nor was the downtime a product of poor capacity planning, which were two of the theories doing the rounds.
RIM has been able to definitively rule out security and capacity issues as a root cause, the company said in its statement.