Ramping-arrival-rate #VU oscillations problem

Hello,

I’m using the ‘ramping-arrival-rate’ executor on a daily basis, and I very often encounter oscillations in the VU/rate regulation. What I call hiccups is that the sent requests/second will fall, as if k6 were freezing for a short time; then, as if it were trying to compensate, there will be a peak way higher than the current rate goal, causing a response time degradation.

Example:

As the test is running, I can see the current VU usage for my UC falling from 600, for example, to 100, then going up to 800, then stabilizing again at 600… Those oscillations can be seen in the CLI at runtime… They also seem to be more frequent as the load increases: they appear every two minutes when the load reaches 75 rps, and every minute by the time it reaches 600 rps…

At first, I noticed that it happened when k6 had to initialize VUs on the fly, so now I always make sure that preAllocatedVUs == maxVUs. k6 initialization takes longer, but the problem happens less frequently.
But still, even when k6 doesn’t have to init VUs on the fly, those hiccups appear and prevent a correct load injection…
I made sure that my k6 instance had enough CPU and memory to run. I tried to increase the CPU, or the memory, but it didn’t change anything.
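For reference, the preAllocatedVUs == maxVUs setup I mean looks roughly like this (scenario name, function name, and numbers are illustrative, not my actual script):

```javascript
// Sketch of a ramping-arrival-rate scenario where preAllocatedVUs equals
// maxVUs, so k6 never has to initialize VUs mid-test.
export const options = {
  scenarios: {
    my_scenario: {                    // hypothetical name
      executor: 'ramping-arrival-rate',
      startRate: 0,
      timeUnit: '1s',
      preAllocatedVUs: 800,           // init everything up front...
      maxVUs: 800,                    // ...so no on-the-fly allocation
      stages: [
        { target: 600, duration: '10m' }, // ramp to 600 iterations/s
        { target: 600, duration: '20m' }, // hold
      ],
      exec: 'myFunc',                 // hypothetical exported function
    },
  },
};
```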

Am I the only one having this problem? Is there anything I could do to prevent that?

Thanks.
Nicolas.

But still, even when k6 doesn’t have to init VUs on the fly, those hiccups appear and prevent a correct load injection…

Are you sure that the VUs you have initialized are enough to run the iterations/s you have specified? You should see a warning like “Insufficient VUs, reached 800 active VUs and cannot initialize more” if they are not.

Also, keep in mind that a request is counted only after it finishes, not when it starts - we can only measure how long it takes when it’s done. So, if the system under test (or the network) temporarily slows down, requests will suddenly start taking longer to return, but k6 will still continue making them. When that SUT/network bottleneck is resolved, a lot of requests will return at the same time, leading to exactly the trough followed by a spike you’re observing. This is corroborated by the spikes in response_time (bottom chart) you see exactly at the same time as the drops of req/s.

Also, the ramping-arrival-rate executor shouldn’t compensate or catch up - if it can’t start an iteration, it should emit a dropped_iterations metric and skip it.

Mind you, this might still be a k6 issue, but I can’t tell just by looking at the chart. To debug if the slowdown is caused by k6, you can add a new custom metric. If you add a Counter and you .add(1) to it just before you make your HTTP request, you should be able to tell if the problem is caused by k6 or the network/system-under-test. If the counter chart follows the specified ramping exactly, without any dips or spikes, then the problem is not with k6.
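A minimal sketch of what I mean (the metric name and URL are placeholders, not from your script):

```javascript
// Count iteration *starts* rather than finished requests, so the metric
// is unaffected by how long the SUT takes to respond.
import http from 'k6/http';
import { Counter } from 'k6/metrics';

const startedRequests = new Counter('started_requests'); // hypothetical name

export default function () {
  // Incremented just BEFORE the request: if this counter follows the
  // configured ramp smoothly, the dips/spikes come from the SUT/network,
  // not from k6's scheduling.
  startedRequests.add(1);
  http.get('https://example.com/endpoint'); // placeholder URL
}
```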

Yes, I initialized about 800 VUs, and in fact 600 were enough to get to 600 rps.
I understand your remark, but if the system under load were suddenly slowing down, k6 would compensate by adding more VUs, and we would see a sudden increase, but NOT a slight decrease beforehand, as we can see on my charts… and as I can see in real time in the CLI VU count of my scenario… no?

Thanks for your suggestion, I will add a counter and relaunch the test.
I will also try to inject with two different k6/InfluxDB stacks against multiple instances of the system under load…

I would also recommend graphing some/all of the other http_request_* metrics and seeing which one of those spikes as well.

Here are all the metrics:



zooming in on an oscillation:

Did you add the custom Counter before the request? I’m not seeing anything in these screenshots that contradicts the possibility of the SUT/network being slightly slower and requests piling up a bit. The only alternative I can think of is some internal pause in k6 (or the testing system), for garbage collection or something like that.

@Nicolas Can you run k6 as in
GODEBUG=gctrace=1 k6 run script.js (obviously add your other options)
It will print one line each time a GC happens, with the fields explained here (search for gctrace). You can paste some of it here; this will also show whether the GC is taking too long. From what I am seeing, it looks like upwards of 100ms, which is not normal.

/home $ GODEBUG=gctrace=1 k6 run --system-tags proto,subproto,status,method,url,group,check,error,error_code,tls_version,scenario --out influxdb=http://influxdb:8086/k6_db ./main_yaam.js


          /\      |‾‾| /‾‾/   /‾‾/
     /\  /  \     |  |/  /   /  /
    /  \/    \    |     (   /   ‾‾\
   /          \   |  |\  \ |  (‾)  |
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: ./main_yaam.js
     output: influxdb=http://influxdb:8086/k6_db (http://influxdb:8086)

  scenarios: (100.00%) 2 executors, 450 max VUs, 30m30s max duration (incl. graceful stop):
       * UC_revoke: Up to 0.00 iterations/s for 30m0s over 1 stages (maxVUs: 10-50, exec: UC_revoke, gracefulStop: 30s)
       * UC_use_RP: Up to 150.00 iterations/s for 30m0s over 2 stages (maxVUs: 400, exec: UC_use_RP, gracefulStop: 30s)

UC_revoke ✗ [--------------------------------------] 10/10 VUs    00m00.0s/30m0s  0 iters/s
UC_use_RP ✗ [====>---------------------------------] 400/400 VUs  04m13.8s/30m0s  043 iters/s

gc 1 @0.008s 2%: 0.007+0.45+0.20 ms clock, 0.063+0.11/0.30/0.48+1.6 ms cpu, 4->4->1 MB, 5 MB goal, 8 P
gc 2 @0.010s 4%: 0.005+0.54+0.19 ms clock, 0.040+0.12/0.22/0.47+1.5 ms cpu, 4->4->2 MB, 5 MB goal, 8 P
gc 3 @0.018s 3%: 0.012+1.6+0.022 ms clock, 0.096+0/0.67/0.96+0.17 ms cpu, 6->6->4 MB, 7 MB goal, 8 P
gc 4 @0.051s 2%: 0.007+4.3+0.025 ms clock, 0.058+0.19/6.4/2.6+0.20 ms cpu, 8->8->8 MB, 9 MB goal, 8 P
gc 5 @0.106s 3%: 0.007+21+0.026 ms clock, 0.063+1.1/24/10+0.21 ms cpu, 15->17->15 MB, 16 MB goal, 8 P
gc 6 @0.165s 2%: 0.010+55+0.067 ms clock, 0.080+0.25/10/22+0.54 ms cpu, 29->31->20 MB, 30 MB goal, 8 P
gc 7 @0.273s 2%: 0.016+6.9+0.023 ms clock, 0.12+0.31/7.5/21+0.18 ms cpu, 39->40->18 MB, 41 MB goal, 8 P
gc 8 @0.366s 2%: 0.010+5.1+0.023 ms clock, 0.086+0.084/8.9/18+0.18 ms cpu, 36->36->20 MB, 37 MB goal, 8 P
gc 9 @0.507s 2%: 0.011+18+0.028 ms clock, 0.094+0/32/21+0.22 ms cpu, 40->40->22 MB, 41 MB goal, 8 P
gc 10 @0.722s 1%: 0.010+6.8+0.025 ms clock, 0.081+0.21/8.6/18+0.20 ms cpu, 43->43->19 MB, 44 MB goal, 8 P
gc 11 @0.894s 2%: 0.008+28+0.028 ms clock, 0.065+0.13/55/135+0.22 ms cpu, 38->39->26 MB, 39 MB goal, 8 P
gc 12 @1.115s 2%: 0.009+12+0.038 ms clock, 0.073+0.19/23/53+0.30 ms cpu, 50->51->28 MB, 52 MB goal, 8 P
gc 13 @1.381s 2%: 0.009+40+0.026 ms clock, 0.073+0.18/79/183+0.21 ms cpu, 56->56->29 MB, 57 MB goal, 8 P
gc 14 @1.697s 2%: 0.008+26+0.007 ms clock, 0.068+0.25/51/129+0.061 ms cpu, 57->57->29 MB, 58 MB goal, 8 P
gc 15 @1.932s 2%: 0.007+5.8+0.028 ms clock, 0.062+2.6/11/22+0.23 ms cpu, 58->59->22 MB, 59 MB goal, 8 P
gc 16 @2.023s 3%: 0.63+97+0.006 ms clock, 5.0+98/192/0+0.049 ms cpu, 44->47->32 MB, 45 MB goal, 8 P
gc 17 @2.224s 5%: 0.94+195+0.028 ms clock, 7.5+290/213/0+0.23 ms cpu, 62->69->49 MB, 65 MB goal, 8 P
gc 18 @2.629s 7%: 0.088+391+0.053 ms clock, 0.71+116/606/0+0.42 ms cpu, 88->103->76 MB, 99 MB goal, 8 P
gc 19 @3.236s 12%: 83+509+0.029 ms clock, 670+474/930/0+0.23 ms cpu, 134->160->118 MB, 153 MB goal, 8 P
gc 20 @4.335s 14%: 0.19+987+0.031 ms clock, 1.5+456/1886/0+0.25 ms cpu, 197->246->184 MB, 236 MB goal, 8 P
gc 21 @6.029s 15%: 0.77+1498+0.037 ms clock, 6.1+191/2994/0+0.30 ms cpu, 300->382->288 MB, 369 MB goal, 8 P
gc 22 @8.731s 17%: 3.8+2486+0.043 ms clock, 30+817/4971/0+0.34 ms cpu, 466->593->444 MB, 577 MB goal, 8 P
gc 23 @12.822s 17%: 0.11+3998+0.043 ms clock, 0.91+495/7907/0.048+0.34 ms cpu, 711->913->688 MB, 889 MB goal, 8 P
gc 24 @19.625s 16%: 2.2+3796+0.005 ms clock, 17+99/7591/4841+0.046 ms cpu, 1101->1265->916 MB, 1376 MB goal, 8 P
GC forced
gc 25 @143.424s 2%: 0.10+1599+0.067 ms clock, 0.82+1.5/3104/9075+0.54 ms cpu, 1110->1116->828 MB, 1833 MB goal, 8 P
GC forced
gc 26 @265.035s 1%: 0.11+1590+0.005 ms clock, 0.88+0.84/3181/8818+0.042 ms cpu, 1370->1376->830 MB, 1657 MB goal, 8 P
^C
running (04m14.2s), 000/410 VUs, 5491 complete and 86 interrupted iterations

Oscillations occur exactly on GC #25, #26, and so on…

I was running k6 inside a 1-core pod on OpenShift, and increasing the CPU resources to 2 cores seems to lessen the impact of the recurrent GC…
The percentage of time spent in GC since program start dropped from 3% to 0% for each GC, and even if it still causes oscillations, they are smaller, so it seems to do the trick. :slight_smile:

Would more frequent GC also help in my case?
Is there any easy way to try this?

BTW, I’m quite eager to see distributed execution added to k6…
Being able to horizontally scale k6 pods would be a wonderful feature!

Thanks for your help! I really love using k6 on a daily basis! :slight_smile:

Go’s GC (k6 is written in Go) is designed to run concurrently and, for the most part, not stop the execution of the program (unlike some other GCs), but in order for that to work, you need more than 1 core ;).

So this should be enough. You can also set GOGC to something less than 100 to make it GC more frequently.

I am actually more impressed by the fact that, with 600 rps and an output, k6 went without needing a GC for long enough that one had to be forced…

What output are you using?

Also, you can look at this Guide on how to run big tests without the need for horizontal scaling

To set GOGC, I have to recompile k6, right? I can’t set it at runtime like for the gctrace parameter?

My script is simulating basic REST calls through GET & POST HTTP requests. Metrics are sent to my local InfluxDB. Each VU loads, at the init stage, a 1000-line text file containing a list of valid cookies, which they use randomly.
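That cookie setup looks roughly like this (file name and structure are assumptions on my part):

```javascript
// open() runs in the init context, so each VU reads the file once
// at initialization, not on every iteration.
const cookies = open('./cookies.txt').split('\n').filter(Boolean);

export default function () {
  // Pick a random valid cookie for this iteration.
  const cookie = cookies[Math.floor(Math.random() * cookies.length)];
  // ...then use it in the request headers, e.g.:
  // http.get(url, { headers: { Cookie: cookie } });
}
```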

As we are using k6 on an OpenShift platform, we would like to keep it running as pod(s) on this platform, and not use any external VM/hardware.
And on OpenShift, we want to keep the pods as small as possible, especially in terms of CPU sizing, hence the need for horizontal scaling. I don’t want to use a whole node to deploy something like an 8-core k6 pod… :s

Ok, I can set GOGC at runtime, cool… :slight_smile:
I will try to fine-tune GOGC to get more frequent GCs.

You can just set GOGC=50 k6 run ... and that means that when the memory used grows by 50% since the previous GC, a new GC will be started… overly simplified. I am not really certain it’s worth playing with it if you don’t have any problems, although it can’t hurt, so :slight_smile:

Given how golang was designed I would argue that it is probably a very bad idea to run k6 on a single core machine as … you’ve found out first-hand :).

Also … no matter when distributed execution is coming (and … let’s say I don’t expect it within the next 12 months), it will definitely not be less computationally expensive than just having 1 big k6 instance :wink: . Yes, we are likely (and somewhat required) to optimize some parts of it in order to fix some things on the way to distributed execution, but it will still mean that a single instance will be much more performant … and easier to set up :).

You can try to roll your own distributed system using the “fairly” new execution segments, but you will need to coordinate the instances and lose any kind of thresholds as … well, there are multiple instances, as well as probably other things that I currently don’t remember. Also, again … it is much more likely that you will get better performance from bigger instances than from smaller ones with the same combined size, at least up to some … limit.
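Roughly, splitting a test in two with execution segments would look like this (a sketch; the instances still have to be started and stopped in a coordinated way by you):

```shell
# Instance 1 runs the first half of the workload:
k6 run --execution-segment "0:1/2" --execution-segment-sequence "0,1/2,1" script.js
# Instance 2 runs the second half:
k6 run --execution-segment "1/2:1" --execution-segment-sequence "0,1/2,1" script.js
```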

Also, the performance optimizations in the guide I linked work on both small and big instances.

Have you had problems with much bigger instances? Are they not priced … linearly? Sorry I haven’t used OpenShift since I tried it out 5-6 years ago or something like that.

I am just trying to figure out … why you want horizontal scaling when in reality vertical scaling will probably be enough for you (given your comments) and will be much easier to do.

Testing different GOGC values didn’t help in the end… :s

We are running our own OpenShift platform, and our resource usage strategy is to favor small burstable pods deployed with node anti-affinity. The idea is to leverage horizontal scaling to fill in the nodes only when required, based on the incoming load. Our nodes have 8 cores, and using multi-core pods comes with the risk of not being able to deploy them, depending on the current node filling status.

But everything is possible; we could also use a dedicated node, a dedicated VM outside of OpenShift, or dedicated hardware, but deploying k6 as small burstable pods was comfortable for us, and these recurrent GC-caused pauses/oscillations are not problematic with small loads.

In fact, we even packaged k6 to be deployable ‘as-a-service’ directly from our OpenShift Marketplace, with its InfluxDB, Grafana, custom provisioned dashboards, and a solution to sync the local injection scripts from our GitLab repos… :slight_smile:

So higher loads will mean different solutions and different sizing/hosting for k6 then… we can live with that. :slight_smile:

I also tried to transpile my ES6 script using webpack, as proposed in the Guide, but I only get an obscure error:

$ webpack ./main.js
Hash: 6acc90d677e721b2d7dc
Version: webpack 4.43.0
Time: 1277ms
Built at: 10/23/2020 2:12:53 PM
 2 assets
Entrypoint main = main.es5.js main.es5.js.map
[0] external "k6" 42 bytes {0} [built]
[1] external "k6/http" 42 bytes {0} [built]
[2] ./main.js 8.36 KiB {0} [built]

ERROR in main.es5.js from UglifyJs
Invalid assignment [main.es5.js:179,28]
error Command failed with exit code 2.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

And I can’t find main.es5.js to try to understand what could be problematic at line 179…

It seems like the Guide doesn’t explain this, but you should try using https://github.com/k6io/template-es6 to make the transition … Depending on how big your script is … it is not very hard to rewrite it by hand :wink:

Edit: which also seems to have been reused in the repo linked by the guide.

Thanks! I rewrote it by hand, and the improvement using --compatibility-mode=base is really amazing.
k6 memory usage has been divided by 6, and there’s no more noticeable oscillation during GC; that’s perfect! :slight_smile:
Any chance that we could achieve this kind of performance using ES6 someday?