IBM’s big data expert Lauren Walker talks to CBR about what and who is really needed to make Hadoop work.
Big data is a challenge that the technology industry has widely tackled with analytics software.
Many businesses looking to gain more insight from their data have successfully deployed programming frameworks, such as Hadoop, alongside their databases to draw significant meaning from the information they aggregate.
But naturally, with large amounts of unstructured data comes the issue of data that is overlooked by standard analytics practices.
This has more recently been recognised as dark data: information assets that are collected and stored by businesses, but not put to effective use, such as log files and customer call centre records.
According to Lauren Walker, big data and analytics leader at IBM, this dark data is where businesses should be shining a light.
"I’ve been covering this space for about three and a half years now, and in that time, nearly every single project that has netted any value has brought that dark data in with the known data," says Walker.
"Coupling information, such as call centre or email information, with what I already know about my customers gives me a richness of sentiment and behavioural insight. This helps me to make better decisions or to better segment them in terms of whether they would churn or be loyal."
Walker praises Hadoop for its ability to offer a cheaper and more flexible way to manage data, but says it is now about more than just big data: you have to apply analytics to all your data, including vast amounts of data in motion and the dark data.
"I think what we’ve seen is Hadoop is not a panacea and people are starting to realise that. I have customers who have put [Hadoop] in place and expected like magic everything to make sense," says Walker.
"I think in some ways that is a fallacy because Hadoop gives you a lower cost environment to combine the dark data and the known data, but it’s really people who look at that and start making sense of what all this new data means together using business insight and analytics."
People are the answer
Walker says that the way to draw insight from dark data is to find these people: the data scientists who can work their magic on Hadoop technology and bring out insight that no one else has seen before.
This is not necessarily achieved with one key data scientist, but with a team. IBM has been working with Clear Returns, a company that provides predictive intelligence technology for retailers. Its CEO Vicky Brock employs psychologists, sociologists, mathematicians and statisticians in her data science team because they all bring an empirical but broad perspective on data. She also hires specialists in fashion because Clear Returns’ work is focused on fashion retail and how to reduce the rate of product returns to benefit profit margins.
"So Clear Returns has an ensemble of people who have a mix of the understanding of the industry, the technology and how people make decisions who comprise its data science team. Then when they are applied to Hadoop, NoSQL or even traditional data warehousing, that’s when you really get the magic," says Walker.
This also challenges the industry definition of what a data scientist is. There are subcategories within the role: data engineers, who are more technical, and data artists, who are more like anthropologists.
"I think from a technology perspective, people are looking at data scientists as technologists who can write really cool new code, or people who know machine learning and statistics. Then you have the other people who say there are data artists, and they’re the ones who can be creative about how to connect different data elements and tell the coders what to code."
In turn, this code can then be combined with platforms like Hadoop to bring the dark data into the light, provide insight and inform business decisions.
"Hadoop is an extension of what we’ve been doing for years in relational databases: it’s essentially a bigger pot that allows us to use new technologies and functions that were never allowed before. But without the people, some of the projects are never going to get there."