Something's Majorly Wrong with Stats - Thoughts / Advice

To give an overview, we use JMeter, and I stumbled upon k6. After spending a few days with it, I felt it could be a good replacement for JMeter. So to test the two out, I generated a simple request:

  • Post Request to generate data
  • Check that there is a 201 returned (JMeter and K6)
  • Check that Id and FirstName are present in the JSON response body (k6 only)

Run Time:

20 Threads for 2 mins (JMeter)
20 VUs for 2 mins (k6)

The results are not just slightly different: in k6 they are nearly 3x different.

[Screenshots: JMeter and k6 results]

Can anyone explain how this can be please? I need to have trust in my stats, and this isn’t giving me any confidence.

Hi! I totally understand the need to be able to trust results before committing to switch tools, and a 3x difference in response times is definitely worth looking into. Here are a few reasons why this might have occurred, and what you can do to rule each one out:

Resource utilization

k6 is written in Go, and JMeter is written in Java. Besides the difference in language, this also means that they fundamentally differ in how they handle virtual users. We have found, in our testing, that k6 is significantly more performant than JMeter in that it requires fewer resources to achieve the same level of load. Here is a blog post on this topic, along with links to the script so that you can confirm the results for yourself.

Higher resource utilization can lead to inaccurate load testing results as it can make the load generator the performance bottleneck, instead of the application under test.

Here are some ways to rule this out as the cause of the discrepancy:

  • Monitor your load generator’s resource utilization while you run both tests.
  • Verify the JVM settings. JMeter requires you to tune the JVM it’s running on as well as that of the load generator.
  • Run the JMeter test in CLI mode. JMeter’s GUI is an extra overhead that is not present with k6, so it’s best to run JMeter headlessly during the test.

Throughput

20 threads in JMeter != 20 VUs in k6. You can see this in your screenshot. Within a similar duration, your JMeter test sent 8,317 requests and your k6 test sent 25,794 requests. Even though the number of “users” is the same, the load each test generated and the resources each test required were clearly not the same.
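To put rough numbers on that, here is a small back-of-the-envelope calculation in plain JavaScript. It assumes both tests ran for exactly 120 seconds with 20 threads/VUs and no think time, which is only an approximation of your runs; the function name `summarize` is made up for illustration.

```javascript
// Rough throughput comparison using the request counts from the two runs.
// Assumption: both tests ran for exactly 120 s with 20 threads/VUs each.
function summarize(totalRequests, durationSeconds, users) {
  const rps = totalRequests / durationSeconds;
  // With no think time, each thread/VU starts its next request as soon as the
  // previous iteration finishes, so this approximates the time per iteration.
  const msPerIteration = (users * durationSeconds * 1000) / totalRequests;
  return { rps, msPerIteration };
}

const jmeter = summarize(8317, 120, 20);
const k6 = summarize(25794, 120, 20);

console.log(`JMeter: ${jmeter.rps.toFixed(1)} req/s, ~${jmeter.msPerIteration.toFixed(0)} ms per iteration`);
console.log(`k6:     ${k6.rps.toFixed(1)} req/s, ~${k6.msPerIteration.toFixed(0)} ms per iteration`);
console.log(`k6 sent ${(k6.rps / jmeter.rps).toFixed(2)}x as many requests`);
```

Under those assumptions, JMeter works out to roughly 69 requests per second and k6 to roughly 215, about a 3.1x gap. When each user loops with no think time, requests per user is just the inverse of how long each iteration takes, so a ~3x gap in request count and a ~3x gap in response times are likely two views of the same difference.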

Test duration

The short duration of both tests (~2 mins) may be a bigger contributor than is apparent. Very short tests increase the likelihood of outliers skewing the results heavily. For example, in the JMeter test, the 99th percentile response time (over 1s) is significantly higher than the 95th percentile response time (478ms). Did the k6 results show a similar distribution?

To rule this out, run both tests over a longer period of time.

Scripting differences

What’s in the script can also affect how long it takes to execute, such as:

  • think time: Did you use any timers (JMeter) or sleep (k6)? If yes, were they the same type (Gaussian, uniform random, constant)? JMeter applies timers to every sample within the scope of the timer.
    If you’re not using think time, I’d suggest you try adding think time to both. Sending requests repeatedly, without think time, could do more harm than good with regard to load testing results since it’s very resource-intensive.
  • Embedded resources: Did you record embedded resources in one script but not the other?
  • JMeter log configuration: Verify your JMeter configuration to see what’s being logged. You can click on the Configure button of the listener you’re using to verify this. JMeter’s default log settings record more than k6’s log results settings do.

Give those suggestions a try and let us know how it goes! It’s very reasonable to question these differences, but it may take some testing to figure out. Good job testing your test tools. I wish more people did that!

Thanks for the above, this is a great start for me to figure out why and what is going on :slight_smile: :slight_smile:

Regarding:

Is there a way to figure out, if JMeter uses X threads, how many VUs k6 needs to use, given that they are not 1:1?

Re:

No. To try and reduce as many variables as possible, I made sure that things like sleeps etc. are not present, as that would just confuse things more.

Re:

100% agree. I did a check on the database via Count, and can confirm that all 25K records, to the exact number, were created after the endpoint was hit.

@nicole

Also, on a side note, great to meet you. The YouTube videos you have done are what led me to use the tool :slight_smile:

Is there a way to figure out, if JMeter uses X threads, how many VUs k6 needs to use, given that they are not 1:1?

Unfortunately not! That conversion rate is affected by so many factors that there’s no direct conversion that will hold true for everyone. You may need to play around, run a few tests, and then re-do your benchmarks/baselines with k6 (should you decide to explore it further).

However, virtual users are actually a pretty ambiguous way to measure test throughput anyway. 10 users streaming a video are going to have a drastically different effect on the application compared to 10 users fetching some short snippet of JSON. So one thing you could do to improve consistency would be to fix the request rate (RPS) or the number of iterations instead. This works kind of like the throughput timer in JMeter. In k6, you can do this by using executors, specifically the constant and ramping arrival rate executors.
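As a rough sketch of what that could look like, here is an options object for the constant-arrival-rate executor. The scenario name `create_record`, the rate of 70 iterations per second (picked to sit close to what the JMeter run achieved, assuming it lasted 120 seconds), and the VU numbers are all assumptions for illustration, not recommended values.

```javascript
// Sketch of k6 options using the constant-arrival-rate executor.
// All names and numbers here are illustrative assumptions.
const options = {
  scenarios: {
    create_record: {                     // hypothetical scenario name
      executor: 'constant-arrival-rate',
      rate: 70,                          // iterations started per timeUnit
      timeUnit: '1s',
      duration: '2m',
      preAllocatedVUs: 20,               // VUs initialized before the test starts
      maxVUs: 100,                       // upper bound k6 may use if iterations run long
    },
  },
};
```

In an actual k6 script you would write this as `export const options = { ... }` and put the POST request in the exported default function. With this executor, k6 starts iterations at the fixed rate and uses as many VUs as it needs (up to `maxVUs`) to keep up, so the VU count stops being the number you have to match between the two tools.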

Also, on a side note, great to meet you. The YouTube videos you have done are what led me to use the tool

Wow, that’s awesome! It’s cool to hear a voice from the internet void. :slight_smile: I’m glad I could help!