How different SQL-on-Hadoop engines satisfy BI workloads
According to a new benchmark, the three leading SQL-on-Hadoop engines -- Apache Impala 2.3, Apache Spark 1.6 and Apache Hive 1.2 -- each have distinct strengths and weaknesses that make them well suited to some Business Intelligence (BI) use cases and less suited to others.
"The conclusions really are that one engine does not meet all requirements," says Dave Mariani, CEO and founder of AtScale, a startup specializing in enabling BI on Hadoop. "What we have done in our deployments, for our customers, is plug in multiple engines."
For the Business Intelligence on Hadoop benchmark, AtScale set out to help technology evaluators select the best SQL-on-Hadoop technology for their BI use cases. AtScale's testing team used the Star Schema Benchmark (SSB) data set, which is based on the widely used TPC-H data and modified to more accurately represent a typical BI-oriented data layout. The data set allowed the team to run queries across large tables: the lineorder table contains close to 6 billion rows, and the large customer table contains over a billion rows.
