Jun 19 / 2017

Big Data Insights You’ll Want From #DataWorksSummit17: Part 2 (James Anderson, Principal Consultant)

While attending Hortonworks’ DataWorks Summit 2017 in San Jose, CA this past week, I was struck by some key differentiators in various breakout sessions and keynotes. Particularly notable was that the shift in Hortonworks’ toolset has important implications for developers and executives alike. This is Part 2 in a series of insights for technologists and decision-makers at companies with Big Data on their minds. Let’s take a look:

Hadoop: The Application Platform (NOT Just the Data Platform)

In the years since YARN was introduced with Hadoop 2, there have been enormous developments in containerization, orchestration, microservices, Everything-as-a-Service (EaaS) and cloud computing/storage. With the preview of Hadoop 3 YARN assemblies and Hortonworks Data Cloud 2, it is clear that those developments are inducing serious change in the Hadoop platform.

Rather than merely accommodating non-co-located object storage in the cloud, Hadoop is growing to actively exploit the benefits of ephemeral compute and permanent storage to enable cost savings and new styles of multi-tenancy. Rather than just being a data platform with lightweight support for applications, YARN assemblies will allow it to become a full-fledged application platform akin to DC/OS, AWS ECS and GKE/Kubernetes.

Rather than being run on the platform, the goal of the Hadoop community is clearly to become the platform itself, even going so far as to enable running Hadoop-as-a-Service on Hadoop-as-a-Platform. These developments bode well for the ongoing vitality of the Hadoop platform, as various big data and cloud services providers continue to mutually push and adapt to each other’s capabilities.

What this means to you:

We're seeing a proliferation of big data tools and service providers (not to mention novel approaches like serverless/lambda), which in many cases can facilitate cloud-based workflows identical to those that could only have been run on-premises with Hadoop just a few years ago. Suddenly, you have access to flexibility and scalability you never had before.

Hadoop distribution providers like Hortonworks and Cloudera are putting in the features and capabilities necessary to make sure that their products can be the command center for managing the entirety of an organization's big data needs, both on-premises and in the cloud.

Now, your IT organization has an ever-increasing number of choices for how to compose, and where to host, its big data infrastructure. If your organization is relying on Hadoop, your choice of Hadoop vendor will become increasingly important, as cross-cutting operational and PaaS/IaaS capabilities are consolidated directly into the Hadoop platform. Additionally, Hortonworks customers using Data Cloud 2 may soon find that their choice of cloud provider has become completely transparent and swappable. Hadoop vendors are working to make sure that your choice of Hadoop distribution is the first and most important choice you make when embarking on a big data journey.

If you're ready to investigate your options for Big Data planning, reach out. We'd love to talk.