Aerospike Connect for Spark offering helps meet growing connected environment needs

Businesses are getting smarter by leveraging as much data as they can, as fast as they can. To do this, they’re using artificial intelligence and machine learning techniques to automate decision-making and predict behavior.

Payment processors are slashing fraud, online retailers are increasing purchase rates and financial services organizations are delighting their online customers. These enterprise industry applications have some common data architecture principles that can be adopted by virtually any industry to create smarter, more innovative businesses.

Companies looking to deploy AI for real-world use cases often get stuck at data ingest. Hyperscale and extreme-scale workloads alike require an entirely new data architecture that delivers scale and performance without breaking the bank.

To meet these requirements, Theresa Melvin, an AI-driven big data solutions architect at Hewlett Packard Enterprise who also runs the open-source solutions R&D lab in Fort Collins, Colorado, has been able to persist millions of events per second, enabling millions of I/Os per second.

The only combination of hardware and software that comes close is Aerospike running on Intel persistent memory (pmem) DIMMs, at about 280,000 sustained read and write operations per second, which is about 2,000 percent more than anything else out there.

The designs that Melvin has put together have to be able to write as fast as they read. She has often arrived at a 1-to-10 write-to-read ratio: for every terabyte that is inserted, 10 terabytes have to be read out. That requires a very special type of NoSQL database, and every database tested over 20 months failed in that regard, with the exception of Aerospike.

Aerospike Connect for Spark addresses the challenges of real-time business insights by leveraging the speed of Aerospike to accelerate complex event processing. It provides closed-loop business insights by operating on both transactional and streaming datasets, and it lowers TCO for real-time analytics because it can operate on larger datasets with a smaller cluster size.

Through the use of Aerospike’s Spark connector, users can load data already in Aerospike into Spark for processing. As Aerospike supports queries and indexes, one can load just the data needed for an immediate job.

A practical example would be a high-velocity user profile store that supports a personalization application and uses Spark to generate or update models. By adding a secondary index on an Aerospike “bin” (column) that holds a “last modified time”, users can pull only the profiles that have been modified recently. Because Aerospike supports batch reads and efficient in-memory, row-based access, you won’t have to scan through mountains of HDFS data to find the few profiles that have been updated.
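The profile example can be sketched in PySpark. This is a minimal illustration rather than the connector's documented usage: the "aerospike" format name and the "aerospike.seedhost", "aerospike.namespace" and "aerospike.set" option keys are assumptions to check against the connector version in use, and the bin name and its integer epoch-seconds encoding are hypothetical. Whether the predicate is actually served by the secondary index also depends on the connector.

```python
import time


def aerospike_read_options(seed_host, namespace, set_name):
    """Build reader options; the option keys are assumed connector settings."""
    return {
        "aerospike.seedhost": seed_host,
        "aerospike.namespace": namespace,
        "aerospike.set": set_name,
    }


def recently_modified_filter(bin_name, window_seconds, now=None):
    """Return a SQL predicate selecting rows modified within the window.

    Assumes the bin stores the modification time as integer epoch seconds.
    """
    now = int(time.time()) if now is None else int(now)
    return f"{bin_name} >= {now - window_seconds}"


def load_recent_profiles(spark, options, predicate):
    """Load only matching profiles; needs Spark plus the Aerospike connector."""
    reader = spark.read.format("aerospike").options(**options)
    return reader.load().where(predicate)
```

With a live SparkSession named spark, a call such as load_recent_profiles(spark, aerospike_read_options("localhost:3000", "test", "profiles"), recently_modified_filter("last_modified", 3600)) would return a DataFrame of profiles touched in the last hour. The host, namespace and set names here are placeholders.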

Using Aerospike as the backing store for a DataFrame will improve deployability and enable more efficient work over more data. Aerospike supports flash (SSD) storage, which allows inexpensive access to terabytes of real-time storage. Because this would be a shared DataFrame that supports atomic in-database increments, as well as more complex user-defined data structures through atomic in-database user-defined functions, it can be used for multiple computations simultaneously.

A common need for Spark jobs is counters. Whether it’s counting packets by IP address for fraud detection or counting in-game clicks for monetization optimization, you’ll need counters. Counters are simple in Aerospike: there is an in-database increment feature, and you can either keep a number of counters grouped together under one key or keep a single 64-bit integer under a single key (use “data in memory” and “data in index” for higher performance).
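The grouped-counter layout can be sketched as follows, assuming the Aerospike Python client's increment(key, bin, offset) call and its (namespace, set, primary key) key tuples. The namespace, set and bin names are hypothetical, and InMemoryClient is only a local stand-in that mimics that one method so the sketch runs without a server.

```python
from collections import defaultdict


def count_packets(client, namespace, set_name, packets):
    """Keep one record per IP address, with one counter bin per protocol.

    `client` is any object exposing increment(key, bin, offset), which is
    assumed to match the Aerospike Python client's method of the same name.
    """
    for ip_address, protocol in packets:
        key = (namespace, set_name, ip_address)
        client.increment(key, protocol, 1)


class InMemoryClient:
    """Local stand-in for a real client; not part of any Aerospike library."""

    def __init__(self):
        # key tuple -> {bin name -> integer counter}
        self.records = defaultdict(lambda: defaultdict(int))

    def increment(self, key, bin_name, offset):
        self.records[key][bin_name] += offset


if __name__ == "__main__":
    client = InMemoryClient()
    count_packets(client, "test", "ip_counters",
                  [("10.0.0.1", "tcp"), ("10.0.0.1", "udp"), ("10.0.0.1", "tcp")])
    print(dict(client.records[("test", "ip_counters", "10.0.0.1")]))
```

Against a real cluster the increment is applied inside the database, so concurrent Spark tasks can bump the same counter without a read-modify-write cycle in the job itself.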

Last week, Aerospike displayed its solutions at Hewlett Packard Enterprise (HPE) Discover 2019 in Las Vegas. Enterprises looking to deploy artificial intelligence and machine learning-based applications that consume extreme scale amounts of data from edge and core-based systems often get stuck at data ingest. 

Legacy NoSQL database solutions are not designed for hyperscale, and require impractical amounts of data center infrastructure in order to keep up with ever-growing data ingest. They also struggle to deliver predictable performance and cannot meet the strict uptime required for hyperscale workloads.

Aerospike’s patented Hybrid Memory Architecture unlocks the full potential of modern hardware from companies like HPE and Intel to deliver previously unimaginable value from hyperscale amounts of data at the edge, to the core and in the cloud. 

The Aerospike Database powers data-rich applications in the moments that matter. Aerospike data solutions make it possible for customers to instantly fight fraud, dramatically increase online shopping cart size, deploy hyperscale digital payment networks and deliver instant, one-to-one personalization for millions of customers, to name a few.

“Aerospike and HPE are helping a new generation of business innovators rewrite the rules of data infrastructure, eliminating the traditional tradeoffs between speed, scalability and cost for data-intensive, hyperscale applications,” said Bill Odell, chief marketing officer, Aerospike. “Together we are delivering tremendous – almost unfair – competitive advantages to some of the world’s most forward-looking companies who are extracting exceptional value from data at the edge, to the core, and in the cloud at a fraction of the infrastructure cost of legacy solutions,” added Odell.
