Most discussions of Big Data have centred on how to get at the stuff: how to capture it (in Hadoop clusters, for instance) and how to process and access it (using MapReduce, say). That’s all well and good, but compiling Big Data is the start, not the end, of the process.
In fact, the emphasis on building Big Data datasets overlooks the most important question of all: "What are you going to do with the data?"
The truth is there isn’t a ‘one size fits all’ answer. Why? Because Big Data’s usage depends on your organisation’s specific requirements and goals.
Success lies in recognising the multiple types of Big Data source, identifying the most appropriate technologies in each case, and then unlocking the riches within. Once your data is organised, you are in a position to analyse, visualise and operationalise those insights according to your own business aims, and so obtain the value that really counts: better decisions that improve your bottom line.
To make explicit what is involved on this Big Data journey, here are the main data source types and the corresponding analysis and visualisation techniques that might be applied to find the Big Data "gold" you’re after.
User profiles from social networking sites, search engines or interest-specific social sites may be mined for individual profiles and target group demographics. Technology-wise, this involves integrating with each site’s API.
Another potential, and increasingly influential, data source is published and user-generated content: contributions from reporters, analysts and subject experts to articles, user forums, blogs, Twitter and the like; user feedback on Facebook and on catalogue and review sites; and customer reviews on retail sites such as Amazon. Mining this content requires Natural Language Processing and/or text-based search to assess the evaluative nature of comments (whether they are favourable or unfavourable) and derive usable insights.
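To give a flavour of what "assessing the evaluative nature of comments" means in practice, here is a minimal lexicon-based sentiment sketch in Python. The word lists are hypothetical and deliberately tiny; a production system would use a trained NLP model or an established sentiment lexicon rather than this simple word counting.

```python
import re

# Hypothetical, hand-picked word lists for illustration only.
POSITIVE = {"great", "excellent", "love", "useful", "fast", "reliable"}
NEGATIVE = {"poor", "terrible", "hate", "slow", "broken", "useless"}


def sentiment_score(comment: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", comment.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def classify(comment: str) -> str:
    """Label a comment as positive, negative or neutral from its score."""
    score = sentiment_score(comment)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    reviews = [
        "Great product, really reliable and fast delivery",
        "Terrible support and the app is slow",
        "Arrived on Tuesday",
    ]
    for review in reviews:
        print(classify(review), "|", review)
```

Simple word counting misses negation ("not great") and sarcasm, which is exactly why fuller NLP techniques are needed for serious review mining.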
The next big source area is activity-generated data, such as computer and mobile device logs and, increasingly, data generated by processors in vehicles and video games (and soon in household appliances, as the Internet of Things becomes a reality). Here, log-parsing and indexing tools such as Splunk may well help make sense of these semi-structured text files and documents.
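As an illustration of the kind of field extraction such tools automate, the Python sketch below parses log lines in a hypothetical "date time key=value" format and counts records by severity. The log format and field names are assumptions made for this example; this is not how Splunk itself is configured or queried.

```python
import re
from collections import Counter

# Hypothetical log format assumed for illustration:
#   2012-05-14 10:32:07 device=phone-17 level=ERROR msg="battery low"
LINE = re.compile(r"^(?P<date>\S+) (?P<time>\S+) (?P<fields>.*)$")
FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')  # key=value or key="quoted value"


def parse_line(line):
    """Turn one semi-structured log line into a dict, or None if it doesn't fit."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    record = {"date": m.group("date"), "time": m.group("time")}
    for key, value in FIELD.findall(m.group("fields")):
        record[key] = value.strip('"')
    return record


def count_levels(lines):
    """Count parsed records by their 'level' field, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None and "level" in record:
            counts[record["level"]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '2012-05-14 10:32:07 device=phone-17 level=ERROR msg="battery low"',
        "2012-05-14 10:32:09 device=car-3 level=INFO msg=ok",
        "corrupted line",
        '2012-05-14 10:33:00 device=console-8 level=ERROR msg="disk full"',
    ]
    print(count_levels(sample))
```

Tolerating lines that don't match, rather than failing on them, matters with device-generated data, where truncated or corrupted records are common.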
Cloud data from SaaS applications such as salesforce.com may require distributed data integration technology, in-memory caching and API integration. There is also a wealth of publicly available data, from sources such as Microsoft DataMarket and Wikipedia, that you may wish to incorporate into your Big Data bucket.
These resources require the same types of text-based search, distributed data integration and parsing technologies mentioned above.
Finally, there are all those filing cabinets full of original documents that exist only in print. Once digitised, this semi-structured legacy content must be parsed and transformed to prepare it for analysis, a task that specialist document management tools such as Actuate’s Xenos can help with.
We have been talking about sources and the analysis and visualisation techniques that can assist you in your Big Data task. Let’s consider other technologies that should form part of this conversation.
Next-generation Hadoop and MapReduce-style tools, which handle and parse data from logs, Web posts and the like in parallel, promise to generate new kinds of derived data. Plus, don’t forget that older data warehouse appliances such as Teradata and Netezza, and other enterprise systems such as Plumtree, have been busy for years collecting internal, transactional data. These should all become integration targets for your Big Data architecture.
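For readers new to the MapReduce style, the Python sketch below shows its two phases on a word count over text lines: each chunk is mapped to partial counts independently, and the partial results are then reduced into one total. A thread pool stands in for the cluster here purely to show the structure; Hadoop would distribute the chunks across machines, and the chunk size and worker count are arbitrary assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_phase(chunk):
    """Map: turn one chunk of lines into partial word counts."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_phase(left, right):
    """Reduce: merge two partial counts into one."""
    return left + right


def word_count(lines, chunk_size=1000, workers=4):
    """Split lines into chunks, map them concurrently, then reduce the results."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # Threads only illustrate independent map tasks; a cluster would run them on separate nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_phase, chunks))
    return reduce(reduce_phase, partials, Counter())


if __name__ == "__main__":
    posts = ["Big data", "big insight", "data data"]
    print(word_count(posts, chunk_size=2))
```

Because each map task depends only on its own chunk, the work scales out simply by adding more workers; only the comparatively small partial results have to be brought together in the reduce step.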
Meanwhile, Cassandra and other distributed databases and query-processing applications, along with packet evaluation tools and email parsers, fill further gaps in Big Data environments and will help deliver the goods.
Last but not least, many other useful tools can help your Big Data mission, such as BIRT (Business Intelligence and Reporting Tools), the Eclipse open source project that serves as the foundation for the ActuateOne product suite.
In conclusion, as an industry we have yet to appreciate that it’s not only how well we capture Big Data, but what we do with it that matters. As ever, "Why do we want to do this?" is the only really interesting discussion business and IT should have; we need to empower both sides to have that conversation about Big Data.
It’s an effort worth undertaking. Imagine life when Big Data starts making its mark. Think of weather forecasts that are actually predictive; useful restaurant or accommodation recommendations on your phone when you reach your holiday destination; a fridge that could talk you through a recipe based on its contents and your meal preferences; or what we might start to learn about the fundamentals of life from all that genome information. With Big Data, the possibilities are genuinely exciting.
It’s time to start moving beyond the enthusiasm and froth to the real business benefits – a process that can only happen with a pragmatic, properly thought-out implementation strategy that takes your business through the organising, visualising and operationalising stages of effective Big Data management.
Nobby Akiha is Senior Vice President of Marketing at Business Intelligence (BI) specialist Actuate; firstname.lastname@example.org