k6 stops generating load for short periods during a long test

Hello to the k6 community and devs!
I'll try to be quick about my issue.
I need to run a long test (around 8 hours), so I prepared a scenario that worked perfectly fine over a shorter duration and launched it.

What I observed after some time:

[Screenshot: graph of k6 requests per second showing frequent, short drops in RPS]

I suspected the issue might be related to sending metrics to InfluxDB, and indeed I got a warning like this:
The flush operation took higher than the expected set push interval. If you see this message multiple times then the setup or configuration need to be adjusted to achieve a sustainable rate. output=InfluxDBv1 t=4.53481258s

Unfortunately, increasing the push interval (K6_INFLUXDB_PUSH_INTERVAL) in the k6 environment variables didn't make the problem disappear: I still see those warnings, whether the interval is 1 second or 3 seconds.

So I have two questions:

  1. About that screenshot: I also checked the system metrics to see whether I was right that it's just a metrics issue. Exactly at the moments of those RPS drops, the system under test received less load (lower CPU utilization on the system side). As you can see, it happened quite frequently. What could stop k6 from generating load for a second or two?
    To clarify, CPU consumption on the load generator machine (where the k6 Docker container runs) was around 20-25%, and I'm not using the same InfluxDB for system metrics; it only stores k6 metrics. System metrics come from Prometheus, to rule out a metrics issue.

  2. What can I do about that annoying warning? How can I fix the issue?

Thanks a lot in advance!

Hi @Nesodus,
I need more details to understand your actual problem and suggest a solution.

Can you share the following points, please?

  • Which executor are you using? Ideally, share more about your test code.
  • Does the graph you shared show k6's RPS as collected in InfluxDB, or RPS measured on your server?
  • Do you see any dropped iterations?
  • Are you setting other options for the InfluxDB output?
  • Are you running k6, InfluxDB, and the service under test on dedicated machines? If not, how are they distributed?

This is expected. Increasing K6_INFLUXDB_PUSH_INTERVAL only helps if the problem is the raw number of HTTP requests InfluxDB receives, and most of the time that isn't the real problem. What puts pressure on InfluxDB is the number and complexity of the metrics it has to handle per request, so a longer push interval produces HTTP requests that each carry more metrics, which probably means more load for InfluxDB per request.
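A back-of-the-envelope sketch (TypeScript, with made-up numbers) of why a longer push interval doesn't reduce the work InfluxDB has to do. It assumes a hypothetical rate of 1000 requests/s and about 10 metric samples per request; k6 records several http_req_* timings for every request, but the exact count depends on the script, so treat 10 as an assumption.

```typescript
// Rough model of how many metric samples a single InfluxDB flush carries.
// All inputs are hypothetical; samplesPerRequest in particular is an assumption.
function samplesPerFlush(rps: number, samplesPerRequest: number, pushIntervalSec: number): number {
  // Everything collected during one push interval is sent in one flush.
  return rps * samplesPerRequest * pushIntervalSec;
}

function samplesPerSecond(rps: number, samplesPerRequest: number): number {
  // Total ingest rate InfluxDB must sustain, independent of the push interval.
  return rps * samplesPerRequest;
}

const rps = 1000;             // hypothetical request rate
const samplesPerRequest = 10; // hypothetical samples emitted per HTTP request

for (const interval of [1, 3]) {
  const perFlush = samplesPerFlush(rps, samplesPerRequest, interval);
  const perSecond = samplesPerSecond(rps, samplesPerRequest);
  console.log(`interval=${interval}s -> ${perFlush} samples per flush, ${perSecond} samples/s overall`);
}
```

Going from 1s to 3s triples the size of each write, while the number of samples per second InfluxDB must ingest stays the same.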

You can check the forum post "The flush operation took higher than the expected set push interval" (#3 by codebien) for the available options to improve the performance of the InfluxDBv1 output.

Hello @codebien!
Here's the information you asked for:

  1. ramping-arrival-rate (because I have multiple constant-load steps (plateaus) and multiple ramps to reach those plateaus)
  2. Collected with InfluxDB
  3. Yes, I do see a few; unfortunately, I lost the stdout of that test.
  4. Only these two: K6_INFLUXDB_CONCURRENT_WRITES=6 and K6_INFLUXDB_PUSH_INTERVAL=3s
  5. Yes. k6 has its own machine (a pretty big one: 36 physical cores with hyperthreading, so 72 logical cores) and runs inside a Docker container. InfluxDB is dockerized as well, but I moved it to a different host (even bigger than k6's) and mounted its data directory as a volume, so my metrics are stored on the host's drive (currently NVMe).
    They are on the same network, and network-wise there are no timeouts or slow operations (nothing above 3-5 ms) between the k6 container and the InfluxDB container. Under load there might be, and I guess the timings are different then.
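For readers unfamiliar with this setup, a ramping-arrival-rate scenario with ramps and plateaus looks roughly like the options object below. This is a hypothetical sketch, not the poster's script: the scenario name, rates, durations, and VU counts are placeholders. It is written in TypeScript here; k6 scripts are usually JavaScript, and the options object is the same in both.

```typescript
// Hypothetical k6 options: alternating ramps and plateaus with ramping-arrival-rate.
// Scenario name, targets, durations and VU counts are placeholders.
export const options = {
  scenarios: {
    steps: {
      executor: "ramping-arrival-rate",
      startRate: 0,
      timeUnit: "1s",        // targets below are iterations started per second
      preAllocatedVUs: 200,  // VUs created up front
      maxVUs: 1000,          // if even this is not enough, k6 reports dropped_iterations
      stages: [
        { target: 500, duration: "10m" },  // ramp to the first plateau
        { target: 500, duration: "2h" },   // first plateau (constant rate)
        { target: 1000, duration: "10m" }, // ramp to the second plateau
        { target: 1000, duration: "2h" },  // second plateau
      ],
    },
  },
};

export default function (): void {
  // The request under test would go here (e.g. http.get from "k6/http");
  // it is omitted so this snippet also runs outside k6.
}
```

With this executor, iterations that k6 cannot start at the requested rate (for example because no VU is free) are counted in the dropped_iterations metric, which appears in the end-of-test summary on stdout.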

Hope that helps!

Hi @Nesodus,
sorry for the late response.

Do you have the metric in InfluxDB? Can you also share it so we can compare it with the previous graph?

Considering the machine size, you should try increasing the concurrency (K6_INFLUXDB_CONCURRENT_WRITES). You can try values up to 36 and find the best balance for your InfluxDB setup. A push interval of around 3s is too long; you should decrease it (the 1s default should be fine).
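As a rough way to pick a starting value for K6_INFLUXDB_CONCURRENT_WRITES, the sketch below estimates how many writes may be in flight at once. It rests on an assumption, not on k6's source: a new flush starts every push interval and each one takes about as long as the t= value in the warning. The 4.53 s input comes from the warning quoted earlier; since flush time grows with batch size, the real figure will differ per interval. Treat the result as a lower bound and tune upward from there.

```typescript
// Rough heuristic (an assumption, not k6's internal logic): if one flush takes
// flushSec and a new flush starts every pushIntervalSec, about
// ceil(flushSec / pushIntervalSec) writes overlap at any moment.
function minConcurrentWrites(flushSec: number, pushIntervalSec: number): number {
  if (pushIntervalSec <= 0) {
    throw new Error("push interval must be positive");
  }
  // At least one write is always needed.
  return Math.max(1, Math.ceil(flushSec / pushIntervalSec));
}

// 4.53 s is the flush time reported in the warning from the original post.
console.log(minConcurrentWrites(4.53, 1)); // 5
console.log(minConcurrentWrites(4.53, 3)); // 2
```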

Hello there o/
I appreciate your tip to increase the number of concurrent writes; it helped a ton, and now everything is working just fine.
I also lowered the push interval to 1 second.
Thank you!
