Analysis: Nuix software was used in the Süddeutsche Zeitung, ICIJ investigation to make sense of the 2.6 TB of raw data.
The fallout from the data leaked in the so-called Panama Papers continues to gather momentum. In Iceland, 10,000 people have taken to the streets in protest at PM Sigmundur Davíð Gunnlaugsson’s supposed links to tax havens, while in the UK PM David Cameron faces calls to answer questions about his family’s tax arrangements following ties to the Panamanian law firm Mossack Fonseca.
The global media storm centres on 2.6 terabytes of leaked data, approximately 11.5 million documents, which reportedly show how law firm Mossack Fonseca helped clients set up anonymous offshore companies.
The leak surpasses the infamous Snowden data leaks in size, leaving the German newspaper Süddeutsche Zeitung and the International Consortium of Investigative Journalists (ICIJ) with a huge task in analysing the 11.5 million documents.
That data, and the consequent revelations of corruption, would never have gone public if it were not for the great enabler that is technology – specifically Nuix software.
Describing the mammoth task at hand, Carl Barron, Senior Solutions Consultant at Nuix, said: "When analysing the Panama Papers, national German newspaper Süddeutsche Zeitung had to sort through 2.6 TB of raw data – that is 11.5 million unstructured files.
"Faced with the biggest leak journalists have ever had to work with, the challenge for the team was how they were going to draw the links between people, companies & locations across all 11.5 million items. To attempt this investigation using only manual workflows would have been improbable, if not impossible."
In short, the 400+ journalists involved in the investigation needed something to make sense of all the data, which is where Nuix, a provider of software that makes it possible to search, investigate and manage unstructured data, came in.
Described by ICIJ Director Gerard Ryle as ‘an indispensable part of our work’, Nuix technology was used to process, index, and analyse the data, with investigators using optical character recognition to make millions of scanned documents text-searchable. Investigators also used Nuix’s named entity extraction and other analytical tools to identify and cross-reference the names of Mossack Fonseca clients throughout millions of documents.
Commenting on how his company’s software was used in the investigation, Carl Barron said: "Süddeutsche Zeitung and the ICIJ used Nuix to index all the leaked files into a single platform to ensure a complete investigation — and achieve one window into the data.
"This entailed being able to search through normally un-searchable files such as PDFs, scanned documents, and photos, using Optical Character Recognition (OCR) technology to recognise and extract text contained within images.
"With all the sources in one place, and the ability to use Nuix to run searches and visual analytics across the complete dataset, Süddeutsche Zeitung and the ICIJ were able to uncover the scale of the Mossack Fonseca case and have all the evidence accessible to continue the current investigation or easily run additional investigations to support further stories."
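To give a rough idea of what indexing files "into a single platform" and cross-referencing names can look like, here is a minimal Python sketch. It is not Nuix's software or API, and it skips the OCR step entirely; it assumes the text has already been extracted. The function names, the sample documents and the names in them are all hypothetical, invented purely for illustration.

```python
import re
from collections import defaultdict
from itertools import combinations


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an inverted index: token -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every token of the query.

    This is a simple AND match on tokens, not an exact phrase search.
    """
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


def cross_reference(index, names):
    """For each pair of names, list the documents that mention both."""
    hits = {name: search(index, name) for name in names}
    links = {}
    for a, b in combinations(names, 2):
        shared = hits[a] & hits[b]
        if shared:
            links[(a, b)] = sorted(shared)
    return links


# Made-up sample data (hypothetical names, not taken from the leak).
documents = {
    "doc1": "Alice Example is a director of Acme Holdings Ltd.",
    "doc2": "Invoice issued to Acme Holdings, registered in Panama.",
    "doc3": "Email from Alice Example regarding a Panama account.",
}

index = build_index(documents)
links = cross_reference(index, ["Alice Example", "Acme Holdings", "Panama"])
for pair, doc_ids in links.items():
    print(pair, doc_ids)
```

Even in this toy version, once every document sits in one index, asking which files link a person to a company becomes a set intersection rather than a manual read-through. A real investigation at the scale of 11.5 million documents would also need OCR, proper entity extraction and far more robust matching than this sketch attempts.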
Technology has very much been the enabler of the whole Panama Papers story. From obtaining the data, to leaking the documents anonymously, and then to analysing said data for the scandalous insights and revelations – technology has been one of the major constants throughout the entire saga.
This offers an insight into how future scandals, revelations and secrets may be divulged and shared with the wider population. Skyhigh Networks CEO Rajiv Gupta believes technology will continue to play a starring role in the political scandals of the future.
"Political scandal, first through Edward Snowden and now through the Panama Papers hack, has followed bank robbery and espionage into the digital age. Only with online tools could a whistleblower hope to make off with 2.6 terabytes accounting for 11.5 million documents, and could journalists rely on powerful collaboration software to analyse the information. This generation’s Watergate will be conducted through shared folders and chatrooms."