K6 Steady VU Load

Priya · July 31, 2020, 4:59am

Hello,

We ran a k6 test against our APIs from two machines at the same time, each with a steady VU load:
Machine 1: 5m:5000
Machine 2: 5m:5000

We then plotted the RPS across the two machines. We expected to see an approximately constant load. Instead, as you can see here, it keeps going up and down at intervals. We ran it for 5 minutes straight.

So when k6 runs with 5000 VUs for 5 minutes straight, shouldn’t the load be constant? Does k6 periodically release some VUs or something similar?
As you can see in the screenshot, the RPS ranges from 9879 down to 3295. To be clear, we are not asking for a perfectly steady RPS (I have already read that article), but we do expect the load to be fairly steady: something like 9879 to 9000 would be fine. Instead, it keeps dipping down and coming back up.

[Screenshot, 2020-07-30: RPS plotted across both machines during the test]

nedyalko · July 31, 2020, 9:14am

When you have looping VUs, each new iteration has to wait for the previous iteration in that VU to finish. So, if requests to the target system start taking longer, your whole test will start making fewer requests, since you’re waiting for the previous requests to finish. This is especially true if you’re not using http.batch() to make concurrent requests in the VU, and are instead using http.get() / http.post() / etc., and if you have long timeout values (k6’s default request timeout is 60s, which is probably a bit excessive).
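A back-of-the-envelope way to see this effect: with looping VUs, throughput is roughly the number of VUs divided by the iteration duration. The sketch below is plain JavaScript (it runs in Node, no k6 APIs involved) and assumes one request per iteration and that an iteration takes about as long as its request; the 0.5s and 1.5s response times are hypothetical, not measured from the screenshot.

```javascript
// Closed model: each looping VU starts its next iteration only after the
// previous one finishes, so throughput is capped by iteration duration.
// Ignores ramp-up, think time and k6 overhead; all numbers are illustrative.
function closedModelRps(vus, iterationSeconds, requestsPerIteration) {
  const iterationsPerSecond = vus / iterationSeconds;
  return iterationsPerSecond * requestsPerIteration;
}

// Hypothetical: 5000 VUs, one request per iteration.
const fast = closedModelRps(5000, 0.5, 1); // responses take ~0.5s
const slow = closedModelRps(5000, 1.5, 1); // responses slow down to ~1.5s

console.log('fast server: ' + fast + ' RPS');             // fast server: 10000 RPS
console.log('slow server: ' + Math.round(slow) + ' RPS'); // slow server: 3333 RPS
```

Same VUs, same script: only the server’s response time changed, and the request rate dropped to a third.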

This is called a closed system model, since the system under test can influence the requests per second it receives when it slows down. And it’s the reason why in the recently released k6 v0.27 we introduced arrival-rate based executors. They allow you to specify the load in terms of how many new iterations you want started each second/minute/whatever. This way you’d generate a consistent load even if the target system starts responding more slowly. You can find more details in the following links:

Scenarios documentation, especially the arrival-rate section
Release v0.27.0 · grafana/k6 · GitHub
Arrival-rate based VU executor · Issue #550 · grafana/k6 · GitHub (original issue describing the problem)
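For a steady load like the one asked about here, k6 also has a constant-arrival-rate executor alongside ramping-arrival-rate. A minimal sketch of its options, assuming a hypothetical scenario name `steady_load` and illustrative numbers. In a k6 script the object would be exported with `export const options = ...`; it is a plain constant here so the snippet also runs outside k6.

```javascript
// Open model: k6 starts `rate` iterations per `timeUnit`, regardless of how
// quickly previous iterations finish (as long as free VUs are available).
// Scenario name and numbers are hypothetical.
const options = {
  scenarios: {
    steady_load: {
      executor: 'constant-arrival-rate',
      rate: 10000,            // iterations to start...
      timeUnit: '1s',         // ...per second
      duration: '5m',
      preAllocatedVUs: 10000, // VUs initialized before the test starts
      maxVUs: 20000,          // ceiling k6 may grow to if iterations slow down
    },
  },
};

console.log(JSON.stringify(options.scenarios.steady_load));
```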

Just keep in mind that, if you want to keep the same load you get out of 5000 looping VUs while the server is responding quickly (~10k RPS, if I’m reading your screenshot correctly), you’d need to configure potentially significantly more than 5000 preAllocatedVUs in the ramping-arrival-rate executor. This is needed because, when the server starts responding more slowly, previous iterations won’t have finished (and freed up their VUs) by the time k6 wants to start the next iteration, so it would have no VU to start it on.
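One way to ballpark “significantly more than 5000” is Little’s law: the number of busy VUs is roughly the iteration start rate multiplied by the iteration duration. The helper `estimateVUs` and its `marginFactor` below are hypothetical (they are not k6 options), and the durations are made up; real ones have to be measured.

```javascript
// Little's law estimate for an arrival-rate executor:
// VUs busy at the same time ≈ iterations started per second × seconds per iteration.
// `marginFactor` is a hypothetical safety factor, not a k6 option.
function estimateVUs(iterationsPerSecond, iterationSeconds, marginFactor = 1.5) {
  return Math.ceil(iterationsPerSecond * iterationSeconds * marginFactor);
}

// Hypothetical target of ~10k iterations/s, one request per iteration.
console.log(estimateVUs(10000, 0.5, 1)); // 5000: enough only while responses stay fast
console.log(estimateVUs(10000, 1.5, 1)); // 15000: needed if responses slow to ~1.5s
console.log(estimateVUs(10000, 0.5));    // 7500: fast case with the default 1.5x margin
```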

It’s a process of trial and error, since iteration duration can vary greatly between scripts or even between test runs, but leave a good margin of VUs to handle slowdowns in the target system if you want a good load test. k6 emits a dropped_iterations metric every time it tries to start an iteration but there are no free VUs to run it on, and, in general, you want that to be 0 or very close to 0.
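To make a run fail loudly when VUs run out, one option is a threshold on dropped_iterations. Since it is a counter metric, a `count`-based expression applies. The helper name and the tolerance of 10 are assumptions, and it is worth checking that your k6 version accepts thresholds on this metric.

```javascript
// Adds a dropped_iterations threshold to an existing options object.
// dropped_iterations is a counter metric, so a `count` expression is used.
// The helper and the tolerance value are hypothetical.
function withDroppedIterationsLimit(options, maxDropped) {
  const thresholds = Object.assign({}, options.thresholds, {
    dropped_iterations: ['count<=' + maxDropped],
  });
  return Object.assign({}, options, { thresholds: thresholds });
}

// Existing thresholds are kept; in a k6 script, export the result as `options`.
const options = withDroppedIterationsLimit(
  { thresholds: { http_req_duration: ['p(95)<500'] } },
  10
);
console.log(JSON.stringify(options.thresholds));
// {"http_req_duration":["p(95)<500"],"dropped_iterations":["count<=10"]}
```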

Priya · July 31, 2020, 8:26pm

Thanks for this. We’ll try it out. Are there CLI options for specifying executors and all of the scenarios described above? Or can we pass this configuration as JSON? For example, stages can be expressed in the CLI like this: 1m:5000,2m:10000. Is there anything similar for scenarios and the new executors?

nedyalko · July 31, 2020, 9:09pm

Unfortunately not. Because of the configuration complexity of the new executors, or even the new advanced options like gracefulRampDown and gracefulStop for old executor types, and the possibility to launch multiple scenarios simultaneously, there wasn’t a generic way to add new CLI flags or environment variables to cover all of them. Or we couldn’t think of one that wasn’t overly complicated, anyway. We’ve maintained backwards compatibility with most of the ways you could specify execution options in the old k6 versions via CLI flags or env vars, but we haven’t added any new ones for the new executors or scenarios.

That said, you could easily roll your own, so to speak. By using k6 environment variables, you can specify any data you want via the CLI --env flag (or an actual environment variable) and access it in the script. The important part is that these __ENV variables are available when the k6 script’s init context is evaluated for the first time, so they can be used in the exported script options, and thus, can be used to configure scenarios…

The simplest way this can be used is to pre-configure various scenarios in your script and then just enable them with k6 run --env SCENARIOS=scenario_name_1,scenario_name_2 script.js, as I demonstrated here: Execution of new scenarios from CLI - #2 by nedyalko
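A minimal sketch of the “pre-configure, then enable via --env” idea (not the code from the linked post). The scenario names, numbers and the `selectScenarios` helper are hypothetical; in k6 you would pass `__ENV.SCENARIOS`, but here it is a parameter so the function can also run outside k6.

```javascript
// Pre-configured scenarios; names and numbers are hypothetical.
const allScenarios = {
  smoke: { executor: 'constant-vus', vus: 1, duration: '30s' },
  steady: {
    executor: 'constant-arrival-rate',
    rate: 1000,
    timeUnit: '1s',
    duration: '5m',
    preAllocatedVUs: 1000,
    maxVUs: 3000,
  },
};

// Keeps only the scenarios named in a comma-separated list, e.g. "smoke,steady".
// With no list, every scenario is kept.
function selectScenarios(all, list) {
  if (!list) return all;
  const selected = {};
  list.split(',')
    .map((name) => name.trim())
    .filter((name) => name.length > 0)
    .forEach((name) => {
      if (!Object.prototype.hasOwnProperty.call(all, name)) {
        throw new Error('unknown scenario: ' + name);
      }
      selected[name] = all[name];
    });
  return selected;
}

// In a k6 script (run with: k6 run --env SCENARIOS=steady script.js):
//   export const options = { scenarios: selectScenarios(allScenarios, __ENV.SCENARIOS) };
console.log(Object.keys(selectScenarios(allScenarios, 'steady')).join(',')); // steady
```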

Though there’s nothing stopping you from cramming a whole JSON object in one such variable, and then JSON.parse()-ing it and exporting whatever it contains as options.scenarios. Or, with a few lines of code, having a simple syntax like k6 run --env myRampingArrivalRate=1m:5000,2m:10000 script.js and then parsing this value and constructing the stages of a ramping-arrival-rate executor from it.
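The second idea, parsing a compact stages string from an environment variable, could look roughly like this. `myRampingArrivalRate` comes from the example command above; the helper names, `startRate: 0`, `timeUnit: '1s'` and the VU numbers are assumptions for illustration. Note that in a ramping-arrival-rate stage the target means iterations per timeUnit, not VUs, so “5000” no longer means the same thing as in the old stages syntax.

```javascript
// Parses "1m:5000,2m:10000" into [{ duration: '1m', target: 5000 }, ...].
function parseStages(spec) {
  return spec.split(',').map((part) => {
    const [duration, target] = part.trim().split(':');
    if (!duration || !/^\d+$/.test(target || '')) {
      throw new Error('bad stage "' + part + '", expected duration:target');
    }
    return { duration: duration, target: Number(target) };
  });
}

// Builds a ramping-arrival-rate scenario from the parsed stages.
// Targets are iterations per timeUnit; VU numbers are hypothetical.
function rampingArrivalRateFromEnv(spec, preAllocatedVUs, maxVUs) {
  return {
    scenarios: {
      from_env: {
        executor: 'ramping-arrival-rate',
        startRate: 0,
        timeUnit: '1s',
        preAllocatedVUs: preAllocatedVUs,
        maxVUs: maxVUs,
        stages: parseStages(spec),
      },
    },
  };
}

// In a k6 script (run with: k6 run --env myRampingArrivalRate=1m:5000,2m:10000 script.js):
//   export const options = rampingArrivalRateFromEnv(__ENV.myRampingArrivalRate, 6000, 15000);
// The JSON variant would be, e.g., { scenarios: JSON.parse(__ENV.MY_SCENARIOS) } (hypothetical name).
const opts = rampingArrivalRateFromEnv('1m:5000,2m:10000', 6000, 15000);
console.log(JSON.stringify(opts.scenarios.from_env.stages));
// [{"duration":"1m","target":5000},{"duration":"2m","target":10000}]
```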

brandonh · August 5, 2020, 8:52pm

Here’s the behavior we saw:

Closed model, stage = 5m:5000,5m:5000 executing against the same backend from two identically-provisioned VMs
k6 ramps up and sustains 10k rps to our backend

[Screenshot: closed-model run]

(continued below)

brandonh · August 5, 2020, 8:55pm

(cont.)

Open model: ramping-arrival-rate executor with the same stages, executing from the same VMs against the same backend. Pre-allocated VUs: 6000, max VUs: 10,000.
k6 ramps up but only sustains ~7500 rps, sending over 700k fewer requests during the test and producing a much less constant load.

[Screenshot: open-model (ramping-arrival-rate) run]

So it seems that, for sustaining a constant load on a system over time, the “old” closed model achieves the desired load on the backend while the “new” open model simply doesn’t.

mstoykov · August 6, 2020, 1:40pm

The more “stable” iteration rate in this case comes from k6 not waiting for the SUT (system under test) to finish each iteration before starting new ones. Whether that is possible, or makes sense for a given case, is a different question.
This is a screenshot from k6 Cloud, where this script was run.

I ran everything in one script for better visualization, so all the scales are the same. Also, the script has an error: the last stage shouldn’t start scaling down instantly, but that doesn’t matter here.

You can see that the ramping-vus scenarios have a lot of variation, because the SUT slows down on certain calls, which makes k6 wait for the response before it ends the iteration and starts a new one.
Moving to ramping-arrival-rate, you can see that with 100 and 200 VUs it still doesn’t manage to keep making more iterations and requests: k6 runs out of available VUs to start new iterations. At 300 VUs, however, it gets through with minimal variation in the request rate and no dropped iterations.
The last scenario isn’t very interesting, since it isn’t what I intended, but it still shows how ramping-vus stops making more requests as the SUT slows down.

Important things to note:

this test hits more than 1 URL and all endpoints have different expected time to complete. For example, POSTs will take longer than GETs ;).
The test site is being overloaded. The graph doesn’t show it very well, but a lot of the calls fail even in the ramping-arrival-rate case; actually, MORE requests fail there, since we make more requests.
You can see that the first ramping-vus scenario skyrockets to far more requests/s, but only for a short period before it stops completely due to the SUT getting overloaded.
There is obviously a need for more VUs as we try to complete more iterations.

Without more data, I can’t really tell you what is going on and why ramping-arrival-rate works worse for you than ramping-vus, but here are some questions:

Are you sure iterations actually finish? You could have an exception in the middle that short-circuits the iteration. With ramping-vus, a new iteration then starts right away, possibly making a few more requests at the beginning of the script before hitting the exception again. With ramping-arrival-rate, the next iteration is started only when it is scheduled, instead of right away.
If there is a sleep() at the end of the script used with the arrival-rate executor, you should remove it.
Are any dropped_iterations emitted? Can you share the options for both scenarios? Maybe something is misconfigured, or there aren’t enough VUs.
Building on the previous point: arrival-rate is about iterations, not requests. It starts new iterations to match the configured rate. So if an iteration contains a lot of code that doesn’t make requests, or requests of varying duration, the iteration rate is expected to differ from the RPS. The two will never be exactly equal, as there is some overhead either way.
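To relate the last two points with rough numbers: if every started iteration runs to completion and makes the same number of requests, the request rate is the started-iteration rate times the requests per iteration. The sketch below (hypothetical helper and numbers) subtracts dropped iterations, which never start; exceptions partway through an iteration, varying request counts and k6’s own overhead will push the real RPS further from this estimate.

```javascript
// Rough request rate for an arrival-rate test: only iterations that actually
// start make requests, and each is assumed to make the same number of requests.
// Helper name and numbers are hypothetical.
function estimatedRps(iterationRate, requestsPerIteration, droppedPerSecond) {
  const startedRate = Math.max(0, iterationRate - droppedPerSecond);
  return startedRate * requestsPerIteration;
}

console.log(estimatedRps(2500, 4, 0));   // 10000: iteration rate and RPS differ by 4x
console.log(estimatedRps(2500, 4, 500)); // 8000: dropped iterations never send requests
```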