Measuring Impala performance using Apache JMeter

5 May '16

Data

With the continuing rollout of access to our data warehouse via Impala, it is becoming more important to understand how it performs under different loads. So we decided to quickly put together some JMeter scripts to inject load. This turned out to be surprisingly easy, and it works surprisingly well.

First, like every JMeter project, we add a thread group. We configure it with the number of threads, which is the number of concurrent queries we want to run. We’ll keep this low to begin with so that nobody shouts at us.

Thread Group
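To make the idea of a thread group concrete, here is a minimal Python sketch of what it does conceptually: a fixed number of workers each run the same query a set number of times and record how long each run took. This is not how JMeter is implemented, and `run_query` is a hypothetical stand-in that only sleeps rather than talking to Impala.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(sql):
    """Hypothetical stand-in for a real Impala query; it only sleeps briefly."""
    time.sleep(0.01)
    return sql


def run_thread_group(num_threads, loops, sql):
    """Run `loops` queries on each of `num_threads` workers.

    Returns one elapsed time in seconds per query executed.
    """

    def worker(thread_index):
        timings = []
        for _ in range(loops):
            start = time.perf_counter()
            run_query(sql)
            timings.append(time.perf_counter() - start)
        return timings

    # One worker per simulated concurrent user.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        per_thread = list(pool.map(worker, range(num_threads)))

    # Flatten the per-thread lists into a single list of samples.
    return [t for timings in per_thread for t in timings]


timings = run_thread_group(num_threads=2, loops=3, sql="SELECT 1")
print(len(timings))  # 6 samples: 2 threads x 3 loops
```

Starting with a small `num_threads` mirrors the advice above: the load on the cluster grows with the number of workers, so it is safer to ramp up gradually.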

Below that we configure the JDBC connection by adding a JDBC Connection Configuration config element. The maximum number of connections needs to match the number of concurrent queries we set in the thread group. We also need to give this configuration a name by filling in the Variable Name field.

JDBC Connection Configuration
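The reason for matching the pool size to the thread count can be illustrated with a toy connection pool. The sketch below is an assumption-laden illustration in Python, not JMeter's own pooling code: `ConnectionPool` is a hypothetical class and its "connections" are just placeholder strings. With fewer connections than workers, a worker may end up waiting for a free connection instead of running its query.

```python
import queue


class ConnectionPool:
    """Hypothetical fixed-size pool; connections are placeholder strings."""

    def __init__(self, max_connections):
        self._free = queue.Queue()
        for i in range(max_connections):
            self._free.put(f"conn-{i}")

    def acquire(self, timeout=None):
        # Blocks until a connection is free, or raises queue.Empty on timeout.
        return self._free.get(timeout=timeout)

    def release(self, conn):
        self._free.put(conn)


# A pool of one connection shared by two would-be concurrent queries.
pool = ConnectionPool(max_connections=1)
first = pool.acquire()
try:
    pool.acquire(timeout=0.1)
    second_got_connection = True
except queue.Empty:
    # The second query could not start because the pool was exhausted.
    second_got_connection = False
pool.release(first)

print(second_got_connection)  # False
```

If the pool were sized to two, both acquisitions would succeed immediately, which is the situation we want when measuring query time rather than time spent queuing for a connection.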

Then we add the query we want to run by adding a JDBC Request sampler and pointing it at the connection configuration from the previous step, using the same Variable Name.

And the final element we need to add is the Summary Report Listener to give us some timings.

Summary Report
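For readers who want to see what such a report boils down to, the following Python sketch computes the kind of figures a summary listener shows from a list of response times. The field names and the choice of population standard deviation are assumptions made for illustration, and the input timings are invented sample values, not measurements from our cluster.

```python
import statistics


def summarise(timings_ms, elapsed_s):
    """Aggregate per-query response times (ms) collected over `elapsed_s` seconds."""
    return {
        "samples": len(timings_ms),
        "average_ms": statistics.mean(timings_ms),
        "min_ms": min(timings_ms),
        "max_ms": max(timings_ms),
        # Population standard deviation; an assumption for this sketch.
        "std_dev_ms": statistics.pstdev(timings_ms),
        # Completed queries per second of wall-clock time.
        "throughput_per_s": len(timings_ms) / elapsed_s,
    }


# Invented example data: four queries completed within two seconds.
report = summarise([120, 80, 100, 100], elapsed_s=2.0)
for name, value in report.items():
    print(f"{name}: {value}")
```

Comparing these figures across runs with different thread counts is what shows how response time and throughput change as load increases.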

Assuming the Impala JDBC JARs are on the classpath, we can now hit run and wait for the queries to return. If the JARs aren’t already on the classpath then they can be added individually from the root Test Plan element.

If you’re using DNS round robin to distribute traffic across multiple Impala daemons then you’ll probably notice that all of your queries are going to the same daemon. Further testing will show that this isn’t too much of a problem, and it’s something we can easily prove by running HAProxy locally to spread the queries across daemons.

The numbers we’ve got out of this so far aren’t terribly interesting in themselves, but we have managed to show that an HDFS Namenode issue was directly affecting Impala performance.

Meet the author

Alice Kaerast

Senior Automation Engineer · Infrastructure Tribe

Worked at Sky Betting & Gaming 2012–2019

Alice played a key role in building our Hadoop-based data platform as a DevOps Engineer and then Architect, and then applied this engineering experience in the infrastructure tribe on Apache Kafka and other shared platforms.
