Hortonworks has teamed up with Hewlett Packard Labs, the central research organization of Hewlett Packard Enterprise (HPE), in a new collaboration to enhance Apache Spark, one of the most active Apache big data projects.
The collaboration will center around an entirely new class of analytic workloads that benefit from large pools of shared memory. Hortonworks and Hewlett Packard Enterprise plan to contribute the new technologies to the Apache Spark community.
Early results of the collaboration will include enhanced shuffle engine technologies for faster sorting and in-memory computations, which have the potential to improve Spark performance. They will also include better memory utilization, which improves performance and supports broader scalability, helping to enable new large-scale use cases.
HPE and Hortonworks already have a partnership in place to help organizations realize a modern data architecture. The vendors give users scalable, secure and deployable solutions for Apache Hadoop to solve challenging data storage and processing requirements and to build new kinds of analytic applications. A key component of the partnership is a reseller agreement under which HPE can resell Hortonworks Data Platform (HDP), Hortonworks' fully open source, Apache Hadoop-based platform.
HP has also partnered with Hortonworks to develop YARN Labels, which allow users to create pools of compute nodes where applications run, making it possible to dynamically provision clusters without repartitioning data. Most notably, with labels users can choose to deploy YARN containers onto compute nodes that are optimized and accelerated for each workload.
This results in a more performant Hadoop cluster, one that was able to deliver twice the performance in 50 percent of the datacenter space. Clients are also able to scale compute and storage independently, allowing them to design Hadoop clusters intelligently, based on whether their data is accessed frequently (hot data) or rarely (cold data).
Because this architecture is disaggregated, the YARN Labels feature can dynamically allocate only storage nodes or only compute nodes, and can specify which nodes workloads are allocated to.
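As a rough illustration of label-based placement, the Python sketch below models a small disaggregated cluster and shows how a label narrows the set of nodes eligible to run containers. It is a toy model, not the YARN scheduler: the node names, the label names and the round-robin policy are assumptions made for the example. The one real setting it refers to is `spark.yarn.executor.nodeLabelExpression`, the Spark-on-YARN property that restricts executors to nodes carrying a given label.

```python
from collections import defaultdict

# Hypothetical cluster: node name -> labels. All names are illustrative only.
NODE_LABELS = {
    "compute-01": {"compute"},
    "compute-02": {"compute", "gpu"},
    "storage-01": {"storage"},
    "storage-02": {"storage"},
}


def nodes_for_label(node_labels, label):
    """Return the sorted names of nodes that carry the given label."""
    return sorted(name for name, labels in node_labels.items() if label in labels)


def place_containers(node_labels, label, count):
    """Spread `count` containers round-robin over nodes carrying `label`.

    Round-robin is an assumption for this sketch; the real scheduler
    also weighs queue capacity, locality and available resources.
    """
    eligible = nodes_for_label(node_labels, label)
    if not eligible:
        raise ValueError(f"no node carries label {label!r}")
    placement = defaultdict(list)
    for i in range(count):
        placement[eligible[i % len(eligible)]].append(f"container-{i}")
    return dict(placement)


def spark_submit_args(label, app_path):
    """Build a spark-submit command that pins executors to labeled nodes."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--conf", f"spark.yarn.executor.nodeLabelExpression={label}",
        app_path,
    ]


if __name__ == "__main__":
    print(place_containers(NODE_LABELS, "compute", 4))
    print(" ".join(spark_submit_args("compute", "app.py")))
```

In this model, asking for the `storage` label never places a container on a compute node, and vice versa. That is the property that lets compute and storage pools be sized and scaled independently.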
“This collaboration indicates our mutual support of and commitment to the growing Spark community and its solutions,” said Scott Gnau, chief technology officer, Hortonworks. “We will continue to focus on the integration of Spark into broad data architectures supported by Apache YARN as well as enhancements for performance and functionality and better access points for applications like Apache Zeppelin.”
“We’re hoping to enable the Spark community to derive insight more rapidly from much larger data sets without having to change a single line of code,” said Martin Fink, EVP and CTO, Hewlett Packard Enterprise and Hortonworks Board Member. “We’re very pleased to be able to work with Hortonworks to broaden the range of challenges that Spark can address.”