News: The move is part of the company’s plans to use open-source technologies, a larger trend under CEO Satya Nadella.
Microsoft is making a serious commitment to Apache Spark, the open-source data processing engine, to power its big data and analytics offerings.
The company is bringing several offerings out of preview mode and into general release.
Spark for Azure HDInsight is now generally available, together with a fully managed Spark service from Hortonworks.
R Server for HDInsight, set for general availability this summer, makes it easier to move code and projects to the cloud without having to purchase hardware or hire specialised operations teams for big data infrastructure.
R Server for Hadoop is set for general availability in June. It will support both Microsoft’s implementation of the R statistical programming language and the native Spark execution framework.
In a blog post, Microsoft said the integration of R Server with Spark gives users the ability to run R functions across several Spark nodes.
According to the company, this allows them to train their models on data 1,000 times larger and 100 times faster than is possible with open-source R, and nearly twice as fast as with Spark’s own MLlib.
The company also released a new, free Microsoft R Client tool for data scientists, which lets them use any of the open-source R functions to analyse data on their local workstation.
It also allows them to analyse remote big data and scale out the analytics by pushing the computation to a production instance of Microsoft R Server, such as SQL Server R Services, R Server for Hadoop or HDInsight with Spark.
Power BI, a set of data visualisation tools, now supports Spark Streaming, enabling users to publish real-time events from Spark Streaming directly into one of the fastest-growing visualisation tools on the market today.