Understanding HPCC Systems® and Spark - A Comparative Analysis
Since its inception, HPCC Systems has given its users a platform built around a single homogeneous data pipeline. This significantly reduces the effort users spend on platform installation, management, and maintenance. Well suited to both data lakes and data warehouses, HPCC Systems processes large amounts of data efficiently thanks to an architectural design that divides the platform’s functions between two specialized clusters: Thor, which handles batch data processing, and Roxie, which handles rapid query delivery.
This paper compares the architectures and feature support of Spark and HPCC Systems with regard to data lake capabilities.