“Running the biggest baddest workloads on the Internet”
Apache Cassandra, the distributed NoSQL database, ranks highly in the “most dreaded” database category of Stack Overflow’s annual developer survey.
That’s despite the open source database’s undeniable utility and resilience, as well as widespread adoption by companies including Apple and Netflix.
(Unlike many databases built on a primary/secondary architecture, in which secondaries can only serve reads, every Cassandra node can handle both reads and writes. That makes it easier to scale and replicate workloads across geographies or hybrid environments by adding clusters.)
Now an Apache Cassandra 4.0 beta has landed (the last major release was in 2015) with over 1,000 bug fixes that may just drive it into the sunlit uplands of “most loved”, or at least stop it keeping company with IBM DB2 and Couchbase. More importantly, its data streaming is up to five times faster, says Netflix, and it comes with a host of welcome new features.
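To make the "every node can read and write" idea concrete, here is a toy sketch in Python. It is not Cassandra's code or API: the names ToyNode and build_cluster, the replication factor of 2 and the simple hash-based placement are all assumptions made for illustration.

```python
import hashlib


class ToyNode:
    """One peer in a toy masterless cluster (hypothetical, not Cassandra code)."""

    def __init__(self, name):
        self.name = name
        self.store = {}   # this node's local copy of the keys it replicates
        self.peers = []   # every node in the cluster, including this one

    def _replicas(self, key, replication_factor=2):
        # Hash the key to pick a starting node, then take the next nodes in order.
        digest = hashlib.md5(key.encode()).hexdigest()
        start = int(digest, 16) % len(self.peers)
        return [self.peers[(start + i) % len(self.peers)]
                for i in range(replication_factor)]

    def write(self, key, value):
        # Any node can coordinate a write: it forwards the value to the replicas.
        for replica in self._replicas(key):
            replica.store[key] = value

    def read(self, key):
        # Any node can coordinate a read: it returns the first replica's copy.
        for replica in self._replicas(key):
            if key in replica.store:
                return replica.store[key]
        return None


def build_cluster(names):
    # Assumption: every node knows about every other node from the start.
    nodes = [ToyNode(name) for name in names]
    for node in nodes:
        node.peers = nodes
    return nodes


nodes = build_cluster(["a", "b", "c"])
nodes[0].write("user:1", "alice")   # write coordinated by node "a"
print(nodes[2].read("user:1"))      # read coordinated by node "c"
```

There is no primary in this sketch: a client can hand a request to whichever node it reaches, and that node works out which peers hold the data.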
The Cassandra community describes it as “battle-tested” and says there will be no breaking changes before it goes GA.
(Cassandra 4.0 has seen software, hardware, and QA testing donations from the likes of Amazon, DataStax, Instaclustr and iland.)
Patrick McFadin, who heads up developer relations at DataStax, a Cassandra specialist and lead contributor to the open source database, told Computer Business Review: “The past few years weren’t spent waiting and watching. This is the product of running the biggest baddest workloads on the Internet. The primary goal is to make Cassandra allergic to data loss under any circumstance.
“Cassandra 4.0 release will be the most stable database ever. Many large companies will be running 4.0 in production before it goes GA most likely. Why? Because they want to believe in it before they put their name on it.”
He added: “This is what a real OSS database looks like.”
Cassandra 4.0: What’s New?
“Globally distributed systems have unique consistency caveats and Cassandra keeps the data replicas in sync through a process called repair. Many of the fundamentals of the algorithm for incremental repair were rewritten to harden and optimize incremental repair for a faster and less resource intensive operation to maintain consistency across data replicas,” DataStax notes.
The beta release includes “Zero Copy” streaming functionality, which the DB’s contributors say makes streaming data between nodes up to 5x faster without vnodes compared to previous versions. That means a more elastic architecture, particularly in cloud and Kubernetes environments.
As one Netflix contributor puts it on the Cassandra blog: “[When it comes to] Mean Time to Recovery (MTTR) — a KPI that is used to measure how quickly a system recovers from a failure — Zero Copy Streaming has a very direct impact here with a five fold improvement on performance.
“Zero Copy Streaming is [also] ~5x faster. This translates directly into cost for some organizations primarily as a result of reducing the need to maintain spare server or cloud capacity.
“In other situations where you’re migrating data to larger instance types or moving AZs or DCs, this means that instances that are sending data can be turned off sooner saving costs. An added cost benefit is that now you don’t have to over provision the instance. You get a similar streaming performance whether you use a i3.xl or an i3.8xl provided the bandwidth is available to the instance.”
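The general idea behind zero-copy transfer can be sketched in a few lines of Python. This illustrates the operating-system technique only, not Cassandra's implementation; the helper name stream_file, the temporary file and the payload are hypothetical stand-ins.

```python
import os
import socket
import tempfile


def stream_file(path, sock):
    """Send a whole file over a connected socket (hypothetical helper).

    socket.sendfile() uses os.sendfile() where the platform supports it, so
    the kernel moves bytes from the file to the socket without copying them
    through this process. Elsewhere it falls back to ordinary send() calls.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)


# Demo with a small stand-in "data file" and a local socket pair.
payload = b"toy data file bytes " * 100
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)

sender, receiver = socket.socketpair()
sent = stream_file(tmp.name, sender)
sender.close()  # signals end-of-stream to the receiver

received = b""
while chunk := receiver.recv(4096):
    received += chunk
receiver.close()
os.remove(tmp.name)

print(sent == len(payload), received == payload)
```

Because the file's bytes never pass through application buffers, the sender spends less CPU and memory per byte, which fits the quote's point that throughput ends up limited by available bandwidth rather than instance size.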
Other improvements include a new audit logging feature and a new fqltool that allows the capture and replay of production workloads for analysis. The release has also been put through replay, fuzz, property-based, fault-injection, and performance tests on clusters as large as 1,000 nodes, and hundreds of real-world use cases and schemas have been tested.
The curious can visit the Apache Cassandra downloads site or pull the Docker image.