Big Data means many things to many people, but can commonly be defined by the three Vs: volume, variety and velocity. Today, it is not just greater volumes of data that are being created but also a rapidly expanding variety of data sources, and data is being created at ever faster velocity. However, the synchronised acceleration of the three Vs is not being mirrored by businesses, whose ability to harness Big Data is evolving in waves.
While each element is equally important, if we review live implementations, many organisations are still stuck in the first wave: finding ways to report on and process the huge volumes of data being created. They are generally not nearly as ready to deal with the subsequent waves: normalising the wide variety of data sources and keeping up with the rapid velocity of data creation to deliver timely insights. So how can businesses seize the initiative and ride on the crest of the Big Data wave?
The First Wave: The data tsunami
The more time goes on, the more challenging data management becomes. According to IDC, 90% of all existing digital data was created in the last two years, and these volumes of digital data will continue to double every two years. This explosion in data sources is forcing companies to think more strategically about how they manage and report on data, as current data warehouse solutions were not built to handle the huge volumes being created. Many Big Data projects are therefore driven by the need to replace or augment existing solutions. Consequently, this ‘first wave’ of Big Data maturity is often focused on simply trying to find ways to tame and report on the huge volumes of data and manage the influx.
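To make the doubling claim concrete, here is a minimal arithmetic sketch. It assumes the two-year doubling period quoted above holds steadily, which is a simplification; the function name and the chosen horizons are illustrative only.

```python
def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Return the multiple by which data volume grows after `years`,
    assuming it doubles once every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


# Illustrative horizons (assumed, not from the article's sources).
for years in (2, 4, 10):
    print(f"After {years} years: {growth_factor(years):g}x today's volume")
```

On that assumption, a store sized for today's data would need to hold 32 times as much within a decade, which is why simply extending an existing data warehouse rarely keeps up.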
Hadoop remains at the core of these projects, enabling businesses to store and manage vast quantities of data on commodity hardware so that they can store more at a lower cost. This focus on volume underpins most of the Hadoop-centric investments being made today, driven by vendors such as Hortonworks, Cloudera and MapR. These technologies offer a clear business case for reducing costs and helping to plan for the management and storage of the volumes of data, making it easier to get projects off the ground.
However, historically, many have struggled to get past this wave. Instead of thinking strategically about how they can derive insights from the data, they focus on storing and collecting it. As a result, many then struggle to gain the business intelligence they need to truly benefit.
The Second Wave: Data diversity
While many companies will have been ingesting different data feeds early on for reporting purposes, many have not taken it to the next level to understand how variety and volume together can give greater context to business problems; this is where the second wave of data maturity comes in. Today, we are seeing the variety of data expand daily: the Internet of Things, smartphones, social media and video; many of these data streams did not exist five years ago, yet they all impact on the network and provide vital information for the business.
According to IDC, 90% of all current digital data is unstructured, meaning it often comes in incompatible formats, making it difficult to correlate and integrate into traditional analytics solutions. As we see increased growth in the use of sensor data from connected devices and the Internet of Things, this variety and the resulting complexity are set to increase alongside the volume. Not only this, but all of these different data sources create a lot of noise; not all data is created equal, and some has more value than the rest. Those who reach the second wave of maturity recognise this and start to question the data at the source, using streaming analytics to determine what data to store and analyse and what to ignore; essentially sifting for gold so that they only bring the relevant data to the table.
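The idea of questioning data at the source can be sketched as a simple streaming filter. This is a minimal illustration under stated assumptions, not a description of any particular product: the event fields (`source`, `value`), the relevance rule and the threshold of 50 are all hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Event = Dict[str, Any]


def is_relevant(event: Event) -> bool:
    """Hypothetical relevance rule: keep only sensor readings at or above 50."""
    return event.get("source") == "sensor" and event.get("value", 0) >= 50


def sift(events: Iterable[Event], keep: Callable[[Event], bool]) -> Iterator[Event]:
    """Yield only the events worth storing, one at a time, as they arrive."""
    for event in events:
        if keep(event):
            yield event


# Illustrative stream (made-up data).
stream = [
    {"source": "sensor", "value": 72},
    {"source": "social", "value": 90},
    {"source": "sensor", "value": 12},
    {"source": "sensor", "value": 50},
]

for event in sift(stream, is_relevant):
    print("store:", event)
```

Because `sift` is a generator, each event is judged as it arrives and discarded events are never held or written, which is the point of filtering before storage rather than after.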
The Third Wave: Surf’s up!
The final wave of data maturity throws the third V into the mix: velocity. While many companies are collecting and reporting on data, and others have moved on to streaming analytics at the edge in order to sort through the variety of data being created, there are few who are able to do this in real time. When considering the question of velocity, businesses should be thinking in terms of how quickly they can turn data into insights, and go one step further to embed analytics into business processes. By taking analytics to the next stage, those riding the velocity wave have actually started to create triggers in order to automate actions, helping to speed up business processes: we are at the stage now where this next wave is breaking.
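A trigger of the kind described above can be sketched as a rule evaluated on each new reading that calls an action when it is met. This is a minimal sketch, assuming a rolling-average rule; the function name, the window size, the threshold and the sample readings are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Iterable


def run_trigger(
    readings: Iterable[float],
    threshold: float,
    window: int,
    action: Callable[[float], None],
) -> int:
    """Call `action` whenever the average of the last `window` readings
    exceeds `threshold`. Returns the number of times the action fired."""
    recent: Deque[float] = deque(maxlen=window)
    fired = 0
    for value in readings:
        recent.append(value)
        if len(recent) == window:
            average = sum(recent) / window
            if average > threshold:
                action(average)
                fired += 1
                recent.clear()  # start a fresh window after acting
    return fired


# Illustrative use: the "action" here only records an alert; in practice
# it might open a ticket or reconfigure a service (assumed examples).
alerts = []
count = run_trigger([10, 20, 90, 95, 100, 10], threshold=80, window=3,
                    action=alerts.append)
print(count, alerts)
```

The design choice worth noting is that the decision and the action sit in the same loop as the data, so no one has to read a report before something happens.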
For example, a key justification driving adoption of Apache Spark in place of Hadoop's batch-centric processing is that it allows businesses to address volume and velocity together, working on large data sets in real time. As we see developments in Software Defined Networking (SDN) and Network Function Virtualisation (NFV), the question of real-time automation underpinned by operational intelligence provided by Big Data tools is going to be critical.
On the crest of a wave
As the market matures, we will start to see businesses riding from one wave to the next: that’s when things will really start to get exciting. Arguably, the next wave will be when we start to see more machine-driven control loops and predictive analytics, helping to push us closer to a data-driven, value-centric future. The big swell is coming; businesses looking to avoid a wipe-out should get their boards at the ready!
By Ben Parker, Principal Technologist at Guavus