Big data engines like Hadoop and Spark work well on homogeneous clusters: the underlying resource manager can place tasks optimally across identical nodes, and users can tune their jobs for the configuration of a single machine type. While this setup has performance benefits, in many use cases it also raises total cost of ownership (TCO), because users cannot pick and choose instances for a given cluster based on the workload and the availability or cost of different instance types. Heterogeneous clusters automatically mix and match instance types based on availability and cost, yielding significant cost savings and, in some cases, performance benefits as well.
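
To make the mix-and-match idea concrete, here is a minimal, hypothetical sketch in Python: given a catalog of instance types with assumed current prices and availability, it greedily builds a fleet that covers a target vCPU count from the best-value types. The instance names, prices, and the `pick_fleet` helper are all illustrative, not tied to any specific cloud provider or resource manager API; real cluster managers implement far more sophisticated versions of this selection.

```python
from dataclasses import dataclass

@dataclass
class InstanceType:
    name: str
    vcpus: int
    hourly_price: float   # assumed current (e.g. spot) price in USD
    available: int        # assumed number of instances currently obtainable

def pick_fleet(catalog: list[InstanceType], target_vcpus: int) -> dict[str, int]:
    """Greedily choose a heterogeneous mix that covers target_vcpus at low cost."""
    fleet: dict[str, int] = {}
    remaining = target_vcpus
    # Cheapest price per vCPU first, so the fleet skews toward the best value.
    for itype in sorted(catalog, key=lambda t: t.hourly_price / t.vcpus):
        if remaining <= 0:
            break
        count = min(itype.available, -(-remaining // itype.vcpus))  # ceiling division
        if count > 0:
            fleet[itype.name] = count
            remaining -= count * itype.vcpus
    return fleet

if __name__ == "__main__":
    # Hypothetical catalog; a homogeneous cluster would be limited to one of these rows.
    catalog = [
        InstanceType("m5.2xlarge", vcpus=8,  hourly_price=0.15, available=10),
        InstanceType("r5.4xlarge", vcpus=16, hourly_price=0.40, available=4),
        InstanceType("c5.9xlarge", vcpus=36, hourly_price=0.70, available=2),
    ]
    print(pick_fleet(catalog, target_vcpus=128))
```

The point of the sketch is the trade-off it encodes: when one instance type is scarce or expensive, the fleet simply draws more capacity from the next-best type, which is exactly what a homogeneous cluster cannot do.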