SQL is now possible on the big data platform.
The Hadoop ecosystem lacks the maturity to reap the full benefits from big data analytics as businesses spend too much time integrating different tools and platforms.
That is according to private software company Actian, which has joined a list of companies that have introduced SQL support for Hadoop’s HDFS file system.
While Actian’s CTO Mike Hoskins praised Hadoop’s ability to offer a cheaper and more flexible way of processing and storing large quantities of data, he said: "It is loudly immature and so if it’s going to be your main platform, you’re going to want to be able to run SQL on it, which you can’t. You’re going to want to be able to run advanced analytics on it, which you can’t."
Hoskins added: "People are using primitive techniques like MapReduce, so there’s no visual tooling. Everything we’re familiar with such as visual drag and drop tools is typically not available in the Hadoop ecosystem.
"There’s no good SQL database in the Hadoop ecosystem. These are pretty major gaps which we are trying to remediate."
Actian, previously known as Ingres, added SQL support to its visual dataflow framework, which is designed to run natively in Hadoop via the YARN resource-management layer.
Alys Woodward, research director for Big Data at IDC, said: "SQL is a widely used query language that has been around for a few decades, whereas the Hadoop query tools, such as Hive and Pig, are far newer. Putting SQL access on Hadoop is about making it available to a much wider set of developers without the need to train them on new skills.
"It’s also about adoption; if developers want both standard relational and big data capabilities, it would make sense to use a platform like Actian that supports both types of usage."
Hoskins said: "This is our first release and so we’re probably going to continue to push the boundaries of scale because some of the data volumes that people are talking about are really of a class that none of us have ever seen before in database technology."
The platform, due out later this month, puts the firm in competition with HP, which introduced SQL-on-Hadoop capabilities on its columnar Vertica database late last year, and Cloudera’s Impala, also released last year, among other companies.
Actian’s SQL-on-Hadoop uses the X100 vector processing engine for parallelised querying on HDFS, which it claims can deliver results up to 30 times faster than rivals.
"There’s a formal benchmarking test called TPC-DS, which measures all the databases. In first place is Actian with this vector technology database, the speed and improvement over Oracle and Microsoft SQL Server were stunning. It was like a 100% doubling, but the truly amazing thing was that it did it on much more modest hardware because of the price performance breakthrough," claimed Hoskins.
Mario Meir-Huber, research analyst for big data and cloud at IDC, told CBR: "The question is, how comprehensively does the SQL in the Actian platform support what Hadoop experts do with Hive and Pig; does it allow SQL developers to work like Hadoop experts, or do they just stay as SQL developers working on larger volumes of data?"
Another key part of Actian’s Analytics Platform, according to Hoskins, is that it allows SQL users and business analysts to conduct advanced analytics directly on data in HDFS.
"Unfortunately the biggest audience of the people who can ask questions of data analysts are SQL users, except that it doesn’t work in Hadoop and that’s a major gap in the market and we’re stepping in to fill it," he said.
Analyst firm Gartner estimates the current Hadoop ecosystem market is worth roughly $77m and expects it to grow to $813m by 2016.