Extend OctoPerf Results With Instana

Today we have a look at the added value you get by combining load testing and APM. Our tool of choice at OctoPerf is Instana, because we share a lot of common values. In short, we both have a huge focus on ease of use and Docker-oriented platforms. I think it makes this collaboration even more relevant for our users.

Anyway, as you probably know, OctoPerf is oriented toward running realistic tests as easily as possible. And Instana gives you live insight into your entire platform, allowing you to instantly understand the consequences of your load test. This blog post is a collaboration with the folks at Instana, and you can find the second part, with a detailed analysis of the test, on their blog.

Test script

We will be conducting a small test on the robot shop platform provided by Instana:

Robot shop

I recorded a quick test script adding two products to the cart, and named the transactions this way:

Virtual user

Runtime configuration

Load policy

Regarding the runtime, we will be launching 200 users in total, split between 2 cloud locations:

Test description

EU West is an Amazon Web Services zone in Paris, and EU is the Amsterdam zone from DigitalOcean. Since the robot shop is hosted on Amazon as well, but not in Paris, we expect to see different results from both zones. In particular, requests coming from outside Amazon's network should take longer.

Instana integration

Before launching the test, we activated the Instana header in OctoPerf:

Instana header

That way we send additional information about the test we run. The following configuration was done on Instana to catch it:
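The general idea is to tag every request generated by the load test so the APM side can single it out. As a rough sketch of the principle (the header names below are hypothetical; OctoPerf and Instana use their own, so check the integration documentation for the actual values):

```python
def tag_request_headers(base_headers, run_id):
    """Add load-test marker headers to a request so the APM
    can filter traffic generated by a given test run.
    The header names here are made up for illustration."""
    tagged = dict(base_headers)
    tagged["x-loadtest-source"] = "octoperf"  # assumed marker name
    tagged["x-loadtest-run-id"] = run_id      # assumed marker name
    return tagged

headers = tag_request_headers({"Accept": "application/json"}, "demo-run-42")
print(headers["x-loadtest-run-id"])  # demo-run-42
```

On the Instana side, the matching configuration simply tells the agent to capture those headers so they show up on every traced call.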

Instana configuration

Test results

Overview

Overview

If we focus on the hit rate, in light green, we clearly see something's wrong just before 10:20. At the same time, the error rate is increasing along with response times.

APDEX

The APDEX graph shows the same issue:

APDEX

It drops at the same time as the hit rate, indicating a decrease in quality of service, all the way down to 0 for a short time.
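The APDEX score itself is straightforward to compute from raw response times. A minimal sketch, assuming a 500 ms satisfaction threshold (the threshold used in the actual test may differ):

```python
def apdex(response_times_ms, threshold_ms=500):
    """APDEX = (satisfied + tolerating / 2) / total.
    Satisfied: <= T, tolerating: <= 4*T, frustrated: beyond 4*T."""
    if not response_times_ms:
        return 1.0
    satisfied = sum(1 for t in response_times_ms if t <= threshold_ms)
    tolerating = sum(
        1 for t in response_times_ms if threshold_ms < t <= 4 * threshold_ms
    )
    return (satisfied + tolerating / 2) / len(response_times_ms)

# 2 satisfied, 1 tolerating, 1 frustrated -> (2 + 0.5) / 4 = 0.625
print(apdex([100, 300, 900, 3000]))
```

A score of 1 means every sample was satisfying; 0 means every user was frustrated, which is exactly what the short dip in the graph shows.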

Results prior to failure

Using a time range filter I first focused on the period before the issue:

Before failure

It is interesting to see that even before the critical failure, response times are already starting to increase. The hit rate mostly follows the number of users running, meaning the test is still OK so far. But as we know, this will not be the case for long.

Error details

Looking at the details of one error we can see the application is not responding anymore (timeout):

Error detail

That certainly explains why the response times are getting higher at that time.
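From the client's point of view, such an error is simply a request that never gets an answer within the allotted time. This can be reproduced locally with a socket that accepts connections but never replies, mimicking an overloaded application (a toy illustration, not the OctoPerf implementation):

```python
import socket
import urllib.request

def request_times_out(timeout_s=0.5):
    """Accept TCP connections on a local port but never answer HTTP,
    then check that a client request fails with a timeout."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)  # the kernel completes the handshake; we never read
    port = srv.getsockname()[1]
    try:
        urllib.request.urlopen(f"http://127.0.0.1:{port}/", timeout=timeout_s)
        return False
    except Exception:
        return True  # the read timed out: no response within timeout_s
    finally:
        srv.close()

print(request_times_out())
```

Each virtual user blocks for the full timeout before reporting the error, which is why timeouts inflate the measured response times so dramatically.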

Result tree

A look at the result tree gives us valuable information:

Result tree

First we can see that we only get errors from AWS in Paris, whereas the application is not available anymore after 10:20. What is going on is that users coming from DigitalOcean in Amsterdam have a longer timeout. In fact, the test had time to finish before they could get any answer. We clearly see that the network path taken is not the same, since the behavior is very different. It only underlines that coming from outside the application's network, like a real user would, is critical to realistic tests.

AWS versus DO

On this table, we have AWS on the left and DO on the right:

AWS VS DO

We clearly see that the users coming from outside AWS have a longer response time.

Response time breakdown

Looking at the response time breakdown we see mostly server time:

Response time breakdown

Since latency and response time increase together we can tell that the server is overloaded.
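The reasoning relies on the usual decomposition of a response time into connect time, time to first byte (the server-side part), and download time. With made-up numbers in the same spirit as the graph:

```python
# Made-up breakdown of a single request during the overload (ms),
# following the usual decomposition:
# response time = connect + time to first byte + download.
sample = {"connect": 15, "time_to_first_byte": 1850, "download": 35}

total = sum(sample.values())
server_share = sample["time_to_first_byte"] / total
print(f"{server_share:.0%} of the response time is server-side")
```

When the time to first byte dominates and grows while connect and download stay flat, the bottleneck is the server, not the network.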

Throughput

A quick look at the throughput is always interesting:

Throughput

Since it's mostly images (PNG and JPG), we can tell that the text content is optimized. But looking at the list of bandwidth-hungry resources, we see one in particular:

Throughput per request

The image graph.png is using 1.3 GB of bandwidth out of the 1.7 GB used for this test. It’s probably worth investigating this further.

Instana analysis

While the test is running in OctoPerf, we quickly get a warning on Instana: SYSTEM LOAD TOO HIGH. We then get another message, SYSTEM MEMORY EXHAUSTED, telling us the server is out of memory.

Instana overview

Looking into the details we can also see the performance dropping quickly for each layer:

Instana details

Then the database stops answering around 10:20, which explains the timeouts we get later. But this is just a quick overview of all you can get out of Instana. Again, you can find a more detailed analysis in the second part of this blog post on their website.

Conclusion

A load test, however realistic it might be, is only worth as much as you can get out of its analysis. Since Instana makes this process even easier, it is a perfect match for a load testing tool like OctoPerf. Plus, it's been a real pleasure working with the folks at Instana on this blog post, so you can expect more collaboration in the future.

By - Support and performance eng. Director.
Tags: Integrations Analysis Report Instana Methodology
