What is Cassandra and why are big tech companies using it?

Tags: Database, NoSQL

This article was last updated 2 years ago.


(Canonical recently added Apache Cassandra to its Managed Apps portfolio, based on customer demand. Want any cloud app managed? Reach out to Canonical now. You can also watch this webinar on how Managed Apps will help you maintain focus on your business, for more information.)

It’s no secret that organisations have a love-hate relationship with data. Decision making can be unguided and market insights can be lost when organisations collect too little data. On the other hand, with large and active datasets, where requests number in the hundreds of thousands, maintaining database performance is increasingly difficult.

One open source application, Apache Cassandra, enables organisations to process large volumes of fast moving data in a reliable and scalable way. That’s why companies like Facebook, Instagram and Netflix use Apache Cassandra for mission-critical features. Let’s look at three major benefits, challenges and use cases of Apache Cassandra, and the easiest way to get it running in production.

What is Apache Cassandra? 

To start with, a short overview – Apache Cassandra is a database that focuses on reliable performance, speed and scalability. It quickly stores massive amounts of incoming data and can handle hundreds of thousands of writes per second.

Cassandra allows organisations to manage large amounts of data quickly – enabling the below benefits for its users.

Top 3 benefits of using Cassandra

Performance – Speed

Specific architectural choices let Cassandra process data faster than many database alternatives. It achieves this speed in two ways:

  1. It makes quick decisions on where to store data using a hashing algorithm.
  2. It lets any node make data storage decisions. This eliminates the need for a centralised “master node” that has to be consulted on every storage decision.
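
To make these two points concrete, here is a minimal Python sketch of hash-based placement. It is an illustration under stated assumptions, not Cassandra's actual code: the `Ring` class, the `token` helper and the node names are hypothetical, and SHA-256 simply stands in for whichever hash function a real cluster uses. The point it shows is that every node can run the same calculation and reach the same answer without asking a master.

```python
import hashlib
from bisect import bisect_right


def token(value: str) -> int:
    """Hash a string to a position on the ring (SHA-256 is an illustrative stand-in)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Hypothetical hash ring: a key belongs to the first node clockwise from it."""

    def __init__(self, nodes):
        # Sort nodes by their own token so lookups can use binary search.
        self._entries = sorted((token(name), name) for name in nodes)
        self._positions = [position for position, _ in self._entries]

    def owner(self, key: str) -> str:
        # Find the first node positioned after the key's token;
        # the modulo wraps around from the last node to the first.
        index = bisect_right(self._positions, token(key)) % len(self._entries)
        return self._entries[index][1]


nodes = ["node-a", "node-b", "node-c", "node-d"]

# Two nodes that each build their own view of the ring agree on placement,
# so neither has to consult a central coordinator.
view_on_a = Ring(nodes)
view_on_d = Ring(reversed(nodes))
print(view_on_a.owner("user:42") == view_on_d.owner("user:42"))  # True
```

Because the answer depends only on the key and the list of nodes, the lookup costs one hash and one binary search, which is where the "quick decision" comes from in this sketch.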

Scalability

Cassandra is highly scalable and you can increase performance just by adding a new rack. First of all, there is no “master” that needs to be super-sized to handle orchestrating and managing data. This means all the nodes can be cheaper, commodity servers.

Second, it achieves scalability by putting less emphasis on data consistency. Strict consistency typically requires a master node to track and enforce it, based either on rules or on previously stored data.

Finally, it uses peer-to-peer communication, with the brilliantly named “gossip protocol”. This lets nodes communicate and pass metadata among themselves, which makes adding new nodes very easy.
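
The idea behind gossip can be sketched in a few lines of Python. This is a toy model under our own assumptions, not Cassandra's implementation: the `Node` class, the `exchange` and `gossip_round` functions and the heartbeat counter are all hypothetical names chosen for illustration. Each node keeps its own view of the cluster and swaps it with one randomly chosen peer per round.

```python
import random


class Node:
    """Hypothetical cluster member that tracks what it has heard about its peers."""

    def __init__(self, name):
        self.name = name
        # This node's view of the cluster: node name -> latest heartbeat seen.
        self.view = {name: 0}

    def beat(self):
        # Bump our own heartbeat so peers can tell our entry is fresh.
        self.view[self.name] += 1

    def merge(self, other_view):
        # Keep whichever entry is newer for every node we hear about.
        for name, heartbeat in other_view.items():
            if heartbeat > self.view.get(name, -1):
                self.view[name] = heartbeat


def exchange(a, b):
    """Two-way swap: afterwards both nodes hold the newest entry for every member."""
    snapshot_of_a = dict(a.view)
    a.merge(b.view)
    b.merge(snapshot_of_a)


def gossip_round(nodes, rng):
    """Every node bumps its heartbeat and gossips with one random peer."""
    for node in nodes:
        node.beat()
        peer = rng.choice([n for n in nodes if n is not node])
        exchange(node, peer)


cluster = [Node(name) for name in ("a", "b", "c")]
rng = random.Random(0)
for _ in range(5):
    gossip_round(cluster, rng)

# A new node only needs to talk to one existing member to learn about the rest.
newcomer = Node("d")
exchange(newcomer, cluster[0])
print(sorted(newcomer.view))
```

In this sketch no node holds the authoritative membership list. Knowledge spreads from peer to peer, which is why a newcomer can join by contacting any single member.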

Reliability – data replication and HA

Finally, Cassandra is a robust data store. As well as storing data at the location chosen by the hashing algorithm, it makes copies and stores them on other nodes. This means that if a node goes down (and Cassandra makes the realistic assumption that, at some point, a node will go down) a copy of its data is still available elsewhere.

Relaxing consistency is what makes this replication cheap. Traditional databases have to replicate data carefully (and slowly) because they need a plan for keeping every copy up to date.
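
As a rough illustration of how copies can be placed, the Python sketch below stores each key on several neighbouring nodes of a hash ring. It is a simplified model under our own assumptions: the `replicas` function, the `replication_factor` parameter name, the node names and the use of SHA-256 are illustrative choices, not a description of Cassandra's internals.

```python
import hashlib
from bisect import bisect_right


def token(value: str) -> int:
    """Hash a string to a ring position (SHA-256 is an illustrative stand-in)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas(key, nodes, replication_factor=3):
    """Return the nodes that hold a copy of `key`: the owner plus its successors."""
    ring = sorted(nodes, key=token)
    positions = [token(name) for name in ring]
    # The first node clockwise from the key is the primary owner.
    start = bisect_right(positions, token(key)) % len(ring)
    # Never ask for more copies than there are nodes.
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
owners = replicas("user:42", nodes)

# Simulate the primary owner going down: the remaining copies are still
# exactly where the surviving nodes expect to find them.
survivors = [name for name in nodes if name != owners[0]]
after_failure = replicas("user:42", survivors)
print(after_failure[:2] == owners[1:])  # True
```

Losing one node in this sketch does not move the surviving copies. The next node on the ring simply becomes the first place to look.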

Fast, scalable and reliable – Cassandra can modernise your cloud

Challenges of using Apache Cassandra

Speed, scalability and robustness come at a price. Apache Cassandra chooses availability over consistency, so copies of the same data can contradict one another. While it attempts to reconcile the data over time, it can be slow to do so. This can also slow down reads of data that is already stored, because the database may have to go through every entry it holds for the requested data, including multiple entries that contradict each other.
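
The extra work on reads can be sketched as follows. This Python example is a toy model, loosely based on the timestamp-based "last write wins" idea; the `Version` class, the `reconcile` and `read` functions and the use of plain dictionaries as replicas are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One stored copy of a value, stamped with the time it was written."""
    value: str
    timestamp: int


def reconcile(copies):
    """Resolve contradicting copies by keeping the most recently written one."""
    return max(copies, key=lambda copy: copy.timestamp)


def read(key, replicas):
    """Collect every copy of `key`, pick the newest, and repair stale replicas."""
    copies = [replica[key] for replica in replicas if key in replica]
    if not copies:
        return None
    winner = reconcile(copies)
    # Bring out-of-date or missing copies in line with the winner.
    for replica in replicas:
        if replica.get(key) != winner:
            replica[key] = winner
    return winner.value


# Three replicas (plain dicts here) that disagree about the same key.
replica_1 = {"status": Version("offline", timestamp=1)}
replica_2 = {"status": Version("online", timestamp=2)}
replica_3 = {}

print(read("status", [replica_1, replica_2, replica_3]))  # online
```

The read has to gather and compare several copies before it can answer, which is the cost described above. In exchange, writes never have to wait for every replica to agree.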

Why use Apache Cassandra – modernise your cloud

We have outlined some benefits and challenges of Apache Cassandra, but how does it fit into your infrastructure? Here are three common use cases:

  • Time-series data: Cassandra excels at storing time-series data, where old data does not need to be updated. An example is log files from cloud infrastructure and apps. There is little need to change a log once it has been stored. If it’s incorrect, it is simpler to read the new, correct version and store that with a newer timestamp.
  • Globally-distributed data: Cassandra suits geographically distributed data, where a local Cassandra cluster can store data and then reach consistency with other locations at a later point. As it has no “master node” and can be scaled using commodity storage, this allows cheap geographic expansion of the database.
  • Network costs are high: Cassandra can be cost effective when network costs (e.g. for moving data between data centres) are high, because it does not need to keep sending data to a far-away master node.

Organisations can modernise their cloud and adapt the way data is processed and stored with Cassandra. This lets you manage massive amounts of data worldwide.

Standardised, secure and scalable deployment with Managed Cassandra

In our experience deploying and managing thousands of OpenStack and Kubernetes-based clouds for customers, we found organisations organically adopted Cassandra, on an ad-hoc basis. Passionate, self-taught users usually implemented the database.

Organisations with ad-hoc Cassandra deployments may have multiple Cassandra clusters maintained to varying standards, creating a fragmented view and usage of this technology. Updates, security patches and bug fixes may not be applied to ad-hoc deployments, and so IT security may be compromised.

Canonical can provide a unified and robust deployment of Apache Cassandra across an organisation. With our Managed Cassandra service your cloud is secure because security patches, updates and bug fixes are applied across the IT estate.

Further, deployments by self-taught enthusiasts may not be appropriate for enterprise scale and production use cases. Any mistake made in a production deployment is a potential attack vector, so organisations need a security-first approach.

Given the fast-moving nature of open source, it is difficult to ensure a robust deployment. Canonical’s Managed Cassandra lets organisations consume open source with the lowest risk. Our experience and contributions across the open source community let you remain cutting edge and secure.

Apps running on any conformant Kubernetes, the public clouds, on-premise or OpenStack, can use Canonical’s Managed Apps. By offering Managed Cassandra and 10 other apps, and supporting any cloud, we can be the single vendor who ensures your cloud is working correctly.

Summary

Apache Cassandra lets your cloud reach “hyper-scale”. It provides practical solutions to get the speed, scale and availability needed for hundreds of thousands of data writes per second. To make sure your deployment is secure, always up to date, and consistent across your organisation, Canonical and Ubuntu app engineers can help.

To learn more about Canonical’s Managed Apps offerings, check out our latest webinar.
