The COO of Cloudera talks to Ben Sullivan about Big Data and the evolution of the data warehousing landscape.
You joined Cloudera in 2011. What attracted you to the company?
I had been running an archival storage company, which I had sold nine months earlier. What struck me in the process of building that company was two very important trends: one obvious, one not so obvious, and the combination absolutely explosive!
First, we all know that storage is continuing to grow at a fast clip. We often talk about structured and unstructured storage, but increasingly that distinction is fading as we’re realizing that data is data and all of it has value.
Second, over the past 30 years of databases and data analytics, we have actually analyzed only about 10% of the data that we store. If we store 100% of the data and analyze only 10%, something is wrong with this picture.
In parallel, we have watched the new social networking companies build their businesses on the notion that all data has relevance and should be considered when making decisions about customers, products and markets. This should also apply to consumer products companies, retail companies, investment banks, mobile phone operators and the like. Doesn't their customers' activity tell them what those customers find important or interesting? And at increasingly short time intervals? Perhaps the approach to data analysis that social networking companies use to determine what is important would also work for corporations that have been conducting data analysis for decades. We now know it does, and in fact the data analysis that has been ongoing since the advent of the database can now be augmented by this new, scalable and agile technology called Hadoop.
Cloudera's vision of allowing any company, big or small, to capture and interrogate its data scalably, flexibly and cost-effectively is exactly what the market has been waiting for. The founders of Cloudera foresaw this need in the enterprise from their experience in data analytics and its practical application in social networking. When I heard the story, I had already spent years watching the pressure on corporations to become more data-driven and closer to their markets and customers, so to me this was an "oh duh!" moment. I think we all get that now.
What are the main challenges you've faced as COO at Cloudera and how have you been overcoming them?
The market has been growing faster than any market I have seen in my 25 years in technology. No one wants to miss out. Our challenges are twofold: one we can influence and one we control. First, there is a lot of noise in the market about what Big Data is and how companies can benefit from it. Cloudera has a three-year head start on the market, and we have built up a base of hundreds of customers. That experience and insight let us help prospective customers understand the real benefits of the technology and help existing customers get the maximum return on their investment. In effect, we spend a lot of time educating the market at large, helping it find the signal in all the noise.
The second is simply our execution in building products and delighting our customers. We are increasing our investment in building out our customer-facing organization and support. Additionally, we are continuing to invest in being the most innovative company in the Hadoop ecosystem, and we continue to lead the market in our commitment and contribution to the open source Hadoop platform. We are also increasing our investment in building products aimed at real business problems. Our investment in solution-based products such as management tools, SQL access to Hadoop, Search, and Backup and Disaster Recovery allows our customers to extend the use of their Hadoop platform and apply it to real business initiatives.
Cloudera recently released Cloudera Search. What can this do for companies and how does it fit in the Cloudera portfolio? How has it been received so far?
Search and Big Data go hand in hand. If you have a lot of data, you likely want to index it and then have a mechanism to find the valuable nuggets in it. Think about searching for a class of data to create a certain segmentation of what you have, and then running analytics against that segmented result. For example, a retailer may want to know how many times a customer visited its online store and searched for a certain product, and then whether that customer in fact bought the product, either in the store or online. Search provides a way of refining your analytic approach and thus increasing the fidelity of the answer.
According to the Gartner Hype Cycle, big data is currently falling into the trough of disillusionment. What do you say to that?
Bunk! I would say this is related to the signal-to-noise problem. There is a lot of talk about big data, and very few companies have tangible experience deploying it in enterprise environments. Because so many companies are claiming "big data offerings", the market may appear overhyped. It is not. Hundreds of organizations have deployed Cloudera very successfully in large-scale, mission-critical environments, and we're seeing these platforms expand rapidly. In fact, the most exciting thing is the workloads we're seeing deployed that were simply not possible before. Recommendation engines, sentiment analysis and broad machine data insights are changing the way companies see their customers and markets, and these are just a few of the workloads that were previously too complex, too expensive or too slow to deploy. It is true that we are at the beginning of this new and exciting market, and I suppose the rhetoric from companies that don't have many, if any, customers to point to can make the market seem overhyped. But I would say this market has proven itself with tens of thousands of nodes in production and is probably tracking faster than earlier major trends like client-server, IP networking and virtualisation.
How is the data warehousing landscape evolving and what do you think is in store for the future?
Data warehousing is a very important market, serves a very important function and will continue to do so. The best way to think about big data, like any new market, is in terms of which problems are being solved that could not have been solved before. Certainly, as with any new technological development, there are some overlaps; they are slight in the beginning but likely will grow over time. What we ask our prospects and customers to think about is the business problem they're trying to solve, and then to apply the very best solution that meets the needs of the business. I understand that sounds very general, so the next layer down is to look at the workloads within the business initiative that need to be addressed. We think about the data, the workloads, the insight that can be derived from the data, and how we can help all organizations do that more flexibly, more cost-effectively and more quickly. In fact, our marketing message is to ask bigger questions, not to ask the same questions with a different tool.
According to Sohaib Abbasi, CEO of Informatica: "Each one of those vendors is making some remarkable claims, that they can do it all. You can have all the analytics you want. I just do not think that in the world of analytics one size fits all." To what extent do you agree?
The big data market is segmented much like the current data management market. Platform, integration and tools, and business intelligence are still the key elements of data management, and they will continue to be. I don't see any company today that has become best of breed across all of these categories, so I would tend to agree with Sohaib. Informatica is our partner, and I know Sohaib is a very smart person.
What can you tell us about Cloudera’s progress and financial performance?
We're growing fast! Keep watching, please!
What plans for development over the next few years does Cloudera have?
We expect to continue to invest in acquiring new customers and in being the leader and innovator in platform technology for big data, and, last but not least, to have our customers tell us that they are thrilled with how we care for them.