Blog posts tagged
"big_data"

2 posts


robgibbon
15 October 2024

Apache Spark 4.0 beta release – try it now

Article Data Platform

Apache Spark is a popular framework for developing distributed, parallel data processing applications. Our solution for Apache Spark on Kubernetes has made significant progress in the year since we launched, adding support for Apache Iceberg, a new GPU-accelerated image using the NVIDIA Spark-RAPIDS plugin, and...


robgibbon
15 July 2024

Deploying and scaling Apache Spark on Amazon AWS EKS

Article Data Platform

Move over Hadoop, it’s time for Spark on Kubernetes. Apache Spark, a framework for parallel distributed data processing, has become a popular choice for building streaming applications, data lakehouses and big data extract-transform-load (ETL) processing. It is horizontally scalable, fault-tolerant, and performs...
