Opinion: Dale Kim, MapR Technologies, discusses how Hadoop protects data and why enterprises must plan ahead to build a secure data environment.
More and more organisations are relying on big data to understand their businesses better, with as many as 70% of IT decision-makers considering it critical to their company’s future success. However, trends such as BYOD, cloud computing, and the Internet of Things generate significantly greater volumes of data, making it increasingly difficult for many organisations to reap big data’s benefits.
Many organisations are now turning to Apache Hadoop, a framework for cost-effective, large-scale data analysis and processing. But as with any new platform or tool, questions have emerged about its security and whether it is fit for production use.
Far from its experimental beginnings, Hadoop has evolved and is now deployed across many different industries with varying use cases. From clickstream analysis and data warehouse optimisation to anomaly detection and recommendation engines, Hadoop serves many diverse applications.
But with fame comes scrutiny, and Hadoop’s security capabilities have commonly been questioned. This is chiefly a mischaracterisation, as Hadoop is already deployed in security-conscious environments. Many Hadoop deployments hold some of the world’s most valuable and sensitive data, from financial services and healthcare to government.
As such, instead of debating whether Hadoop is suited to secure enterprise environments, IT teams should work to identify the best approach to their specific environment.
Harnessing Hadoop’s native security capabilities correctly
Hadoop has its own native security capabilities that follow a similar security model to other enterprise software systems. Authentication, typically via Kerberos integration, is the starting point for identifying authorised users of Hadoop data. Implementing access controls then allows IT teams to grant and deny permissions on particular data sets to individual users, groups, and/or roles.
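To make the user/group/role model concrete, here is a minimal, illustrative sketch of that kind of access-control check. The `ACLS` table, the path `/data/claims`, and the principal names are invented for illustration; this is not Hadoop’s actual API, merely the shape of the decision it makes after Kerberos has established who the caller is.

```python
# Toy access-control list: each data set maps to the users, groups, and
# roles permitted to read it. Anything not listed is denied by default.
ACLS = {
    "/data/claims": {
        "users": {"alice"},
        "groups": {"analysts"},
        "roles": {"auditor"},
    },
}

def is_authorized(path, user, groups=(), roles=()):
    """Return True if the (already authenticated) principal may access path."""
    acl = ACLS.get(path)
    if acl is None:
        return False  # unknown data set: deny by default
    return (
        user in acl["users"]
        or any(g in acl["groups"] for g in groups)
        or any(r in acl["roles"] for r in roles)
    )
```

The deny-by-default stance mirrors the usual enterprise pattern: authentication answers “who are you?”, and a check like this answers “what may you see?”.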
Hadoop also supports encryption, both for data in motion and data at rest. The former protects data from network eavesdropping and is clearly an essential part of any secure system. Interestingly, the latter is frequently misunderstood and wrongly used as a means of access control. Rather, the primary purpose of data-at-rest encryption is to keep data protected should the physical storage devices be stolen.
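The stolen-disk scenario can be sketched with a toy symmetric cipher. This is purely illustrative: real data-at-rest encryption (e.g. HDFS transparent encryption) uses AES with managed keys, not the hash-derived keystream below, and the key, nonce, and record contents here are invented. The point it demonstrates is that the ciphertext on disk is useless without the key, which lives outside the stolen device.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a pseudo-random keystream from key + nonce (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; applying it twice restores the original."""
    ks = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))
```

Note what this does *not* give you: anyone holding the key sees everything, which is why at-rest encryption is no substitute for per-user access controls.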
Data at rest can also be secured by obscuring sensitive elements within files. This renders the data non-sensitive while still retaining its analytical value. Various third-party security vendors that work with Hadoop can provide such protection, including techniques known as masking, tokenisation, and format-preserving encryption.
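Masking and tokenisation can be sketched in a few lines. This is a simplified illustration, not any vendor’s product: `mask_card` hides all but the last four digits of a card number, and `tokenize` produces a deterministic surrogate via an HMAC, so the same input always yields the same token and joins and group-bys on the tokenised column still work. The secret and field values are invented for the example.

```python
import hashlib
import hmac

def mask_card(pan: str) -> str:
    """Mask a card number, keeping only the last four digits visible."""
    return "*" * (len(pan) - 4) + pan[-4:]

def tokenize(value: str, secret: bytes) -> str:
    """Replace a sensitive value with a deterministic 16-character token.

    Deterministic tokens preserve analytical value: equal inputs map to
    equal tokens, so counts, joins, and distinct-value analysis survive.
    """
    return hmac.new(secret, value.encode(), hashlib.sha256).hexdigest()[:16]
```

Format-preserving encryption goes one step further, producing ciphertext that keeps the original format (e.g. 16 digits in, 16 digits out) so downstream schemas need not change.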
Of course, security is a complicated subject that cannot be fully addressed in a single article. But expertise on Hadoop security is certainly available in the market, and that talent pool continues to grow.
Adding advanced security to meet business needs
It’s important that businesses take the necessary steps to protect themselves while harnessing Hadoop’s many benefits.
The right security processes differ between deployment models based on your business requirements. For some, firewalls and network protection schemes are a sufficient start to secure a Hadoop cluster and ensure only trusted users can access it. This is implementation at its most basic, not dependent on any security capabilities specific to Hadoop. Extensions of this model could also prevent direct logins to cluster servers, with users instead given access via edge nodes that incorporate Hadoop’s fundamental security controls.
Organisations looking for a more sophisticated approach can use Hadoop’s native security controls to provide access to a greater number of users, while ensuring the data itself is only made available to those who are authorised. For yet more advanced environments, Hadoop’s security capabilities can be deployed alongside analytics and monitoring tools on the cluster to detect and stop intrusions or other rogue activities.
Companies operating with sensitive data are already deploying Hadoop, legitimising its role in secure environments. But as with all technologies, there is no one deployment method for every IT environment.
Instead, IT teams must consider what is most important to them and which of Hadoop’s features best support those priorities. Hadoop vendors and third-party security providers can also offer further insight to help organisations conduct a smooth roll-out suited to their requirements.
Hadoop standards: make it what you want
While there are many different options for handling access control in Hadoop, there is no universal standard for security, so professionals have the opportunity to investigate and devise the best process for their individual environment. There are a number of approaches to take, whether build-as-you-go, a follow-the-data path that takes an application-centric approach, or something altogether more data-centric.
This lack of standards should not deter organisations considering Hadoop. With Hadoop already deployed across a vast range of production environments handling sensitive data, not only is Hadoop secure enough for the enterprise, but its capabilities are set to flourish as adoption grows.
Dale Kim is director of industry solutions at MapR Technologies