k6 cannot execute JS scripts if the CSV file is too large

sunnini · June 28, 2022, 10:33am

When iterating over a CSV file, k6 cannot execute JS scripts if the file is too large. How can I resolve this?

mstoykov · July 5, 2022, 7:26am

Hi @sunnini, sorry for the slow response, it appears this fell through the cracks last week.

At 350MB this CSV file is really large. I have seen people use 30-50MB files, and even that is complete overkill: given how many entries that is, you would likely need days, if not months or years, to go through that many users.

In this case we have to load it in memory - which will be fast enough, but then a JS library needs to parse that 350MB string into multiple entries. I would expect this to take a very long time for a 350MB string and that amount of entries.

Arguably going with a JSON file that won’t need to go through papaparse will be better, but I would argue that you should significantly reduce the amount of data you are trying to use. As mentioned above, this is likely complete overkill compared to what you will actually be able to use.

Hope this helps

sunnini · July 5, 2022, 7:41am

This CSV file actually contains 10 million user entries. The 10 million entries were prepared for the load test and calculated according to the formula: QPS(10000) * Duration(900s). So I think this figure is reasonable
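For reference, the formula above can be checked in a couple of lines. This is only a sketch of the arithmetic, and the variable names are made up for illustration. The product comes out at 9 million, so a 10 million entry file would leave some margin.

```javascript
// Sizing from the post: target request rate times test duration.
const targetQps = 10000;      // requests per second, from the post
const durationSeconds = 900;  // test duration in seconds, from the post

// Unique entries needed if every request uses a different user.
const entriesNeeded = targetQps * durationSeconds;

console.log(entriesNeeded); // 9000000
```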

sunnini · July 5, 2022, 7:46am

Arguably going with a JSON file that won’t need to go through papaparse will be better. Can you provide an example that shows how to use it?

mstoykov · July 5, 2022, 7:58am

So I think this figure is reasonable

The figure might be reasonable if you will actually hit this. I would argue that if you haven’t load tested so far, you will find out that your system under test (SUT) can’t handle anything close to 10k QPS.

Even if it could, it likely doesn’t need all of these requests to use different user entries.

We always highly recommend starting somewhere (usually small) and going up from there.

Can you provide an example that shows how to use it?

here

sunnini · July 5, 2022, 8:11am

If I use a JSON file, can this problem be avoided when the file is too large? The example shows that a JSON file also needs to be loaded and parsed when k6 runs the JS code.

mstoykov · July 5, 2022, 9:08am

From my local experiments - it does finish. It takes around 1 minute and needs 14.5GB to do it, but it managed. Take these numbers with a grain (or three) of salt, as I was using simple usernames and passwords. I also removed the outer object and just have the inner array as the JSON, which likely saved some processing.

When I tried with just having inner arrays as in

[
["test", "qwerty"],
["test", "qwerty"],
["test", "qwerty"],
["test", "qwerty"],
["test", "qwerty"],
["test", "qwerty"],
["test", "qwerty"],
["test", "qwerty"],
["test", "qwerty"],
....
["test", "qwerty"]
]

It got down to 13GB.
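A rough sketch of the difference between the two layouts, using the same placeholder credentials. The field names `username` and `password` are assumptions, since the original object layout is not shown in this thread, and the comparison only covers the size of the JSON text, not the memory k6 needs to parse it.

```javascript
// Two ways to encode the same row. The object field names are assumptions.
const asObjects = [{ username: 'test', password: 'qwerty' }];
const asArrays = [['test', 'qwerty']];

// The array form drops the repeated key names, so the JSON text is shorter.
const objectChars = JSON.stringify(asObjects).length;
const arrayChars = JSON.stringify(asArrays).length;

// Rows in array form are read by position instead of by key.
const [username, password] = asArrays[0];

console.log(objectChars, arrayChars, username, password);
```

The trade-off is that the script has to know which position holds which field.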

Most of this memory seems to be reusable afterwards, as it is mostly only needed while parsing the whole JSON into JavaScript objects. After that, storing the data should take a lot less memory.

Again I would recommend just starting with a lot smaller data sample and then increasing.

I would also recommend not using any data file if:

you can just use random data - just generate it in the script.
you can have the username be username1 to username1m and the password password1 to password1m - you don’t need to have that in a file.

The above can be a bit more complicated but the point is that you don’t need the data in a file that you load if you can generate it in the script. And it will be a lot faster and it will be a lot mor
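A minimal sketch of the second idea, assuming the credentials really do follow the username1/password1 pattern. The helper name `credentialsFor` is made up for illustration. The k6 wiring is left in comments because the `k6/execution` module only exists inside k6.

```javascript
// Derive credentials from a zero-based iteration number instead of reading a file.
// The "username"/"password" prefixes follow the naming suggested in the post.
function credentialsFor(iteration) {
  const n = iteration + 1; // users are numbered from 1
  return { username: `username${n}`, password: `password${n}` };
}

// Inside a k6 script the number could come from the k6/execution module:
//   import exec from 'k6/execution';
//   export default function () {
//     const user = credentialsFor(exec.scenario.iterationInTest);
//     // ... use user.username and user.password in the request
//   }

console.log(credentialsFor(0).username); // username1
```

Nothing has to be loaded or parsed up front this way, so startup time and memory no longer depend on the number of users.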

PlayStay · July 7, 2022, 4:31pm

Hi @sunnini, I agree with @mstoykov that a 350MB datafile is enormous, but I can see the need for that much unique data if you do not want your system under load to cache your synthetic transactions for the duration of the test.

I kind of have the same problem, and here is an example of what is working for me so far. I switched from a CSV to a JSON datafile, loaded through SharedArray.

Add column headers to your CSV datafile.
Install a CSV to JSON converter (I use the “dasel” CLI) - “dasel -r csv -w json < data.csv > data.json”
Follow these instructions - Data Parameterization for SharedArray use
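The last step could look roughly like the sketch below. The `pickRow` helper and the sample data are made up for illustration, and the k6 part is kept in comments because `k6/data` and `k6/execution` only exist inside k6. The file name `data.json` is the one produced by the dasel command above.

```javascript
// Pick one entry per iteration, wrapping around if the test outlasts the data.
function pickRow(rows, iteration) {
  return rows[iteration % rows.length];
}

// k6 wiring, shown as comments because these modules only exist inside k6:
//   import { SharedArray } from 'k6/data';
//   import exec from 'k6/execution';
//
//   // The parsed data is shared between VUs instead of copied per VU.
//   const users = new SharedArray('users', function () {
//     return JSON.parse(open('./data.json')); // must return an array
//   });
//
//   export default function () {
//     const user = pickRow(users, exec.scenario.iterationInTest);
//     // ... use the fields of user in the request
//   }

// Stand-in data for running the helper outside k6.
const sample = [{ id: 1 }, { id: 2 }, { id: 3 }];
console.log(pickRow(sample, 4).id); // 2
```

Wrapping around with the modulo is a design choice; if every request must use a unique entry, the data has to cover the whole run instead.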

While SharedArray might allow your script to ingest that large a datafile, you may find that in practice your load generator will croak/crash due to open file handles or out of memory errors on the host. This might also cause issues depending on the type of executors you use or the load profile you create - VUs, iterations, duration etc. There are a number of community posts where folks have split their datafiles. Also, mstoykov wrote an article showing the sweet spot for SharedArray use - SharedArray

good luck.

mstoykov · July 11, 2022, 8:06am

You can also decide to not actually have a file but put all of the entries in a different service, for example Redis, and use xk6-redis to retrieve them when needed.

You can also build your own extension that loads the file more effectively.

Although for 10k QPS I would probably provision more than 14GB anyway, so it should be fine.

Although again - I would try to go with less data to start and then go up with the amount of data to see how the system handles it. So if you have still not tried with less data - I seriously recommend going with that option first, it is by far the cheapest one (in all regards).

sunnini · July 11, 2022, 9:19am

I tried loading a JSON file in the script. k6 can execute the script and generates the aggregate report, but why does the progress bar show a failure?

mstoykov · July 11, 2022, 9:28am

@sunnini maybe you are hitting “k6 shows default with red cross during simple test instead of green checkmark” (Issue #2500 · grafana/k6 · GitHub), which was not of any real consequence, except for confusing users. You can update to the latest version and see if it goes away.

mstoykov · July 11, 2022, 9:30am

Although in your particular case it seems like you are just nowhere near the expected amount of requests that you will be making.

You make around 2.6k RPS and you want 10k, so I guess in this case this is what the red cross is for.

I would also recommend using an arrival-rate executor if you want a specific request rate, as that makes it more obvious that something went wrong.
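A sketch of what such a scenario could look like with the constant-arrival-rate executor. The scenario name and the VU numbers are assumptions that need tuning, and the duration is taken from the 900s formula earlier in the thread. In a real k6 script the object is exported as `export const options = {...}`; the `export` is left out here so the snippet also runs outside k6.

```javascript
// Scenario that starts iterations at a fixed rate, independent of response times.
const options = {
  scenarios: {
    fixed_rate: {                        // hypothetical scenario name
      executor: 'constant-arrival-rate',
      rate: 10000,                       // iterations started per timeUnit
      timeUnit: '1s',
      duration: '15m',                   // 900s, from the formula earlier in the thread
      preAllocatedVUs: 400,              // assumption, tune for your test
      maxVUs: 2000,                      // assumption, tune for your test
    },
  },
};

console.log(options.scenarios.fixed_rate.executor);
```

If k6 cannot start iterations fast enough with the available VUs, it should report them in the `dropped_iterations` metric, which makes a shortfall easier to spot than a slow progress bar.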

sunnini · July 11, 2022, 9:40am

I installed the latest version. If the script does not load a file for parameterization, the progress bar displays normally (green checkmark).

sunnini · July 11, 2022, 10:01am

The final checks were ok, however the progress bar just started after the script has been executed for a period of time. So it finally showed the red cross ✗.

mstoykov · July 11, 2022, 10:05am

The final checks were ok,

This isn’t about checks. It’s about the fact that you told k6 to execute 5m iterations over 5 minutes, and it executed 793k in those 5 minutes before the time ran out and it needed to stop.
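To put those numbers side by side, here is a small sketch of the arithmetic. It reads “5m” as 5 million iterations, which is an assumption, and the variable names are made up for illustration.

```javascript
const plannedIterations = 5000000;  // "5m", read here as 5 million
const completedIterations = 793000; // "793k" from the post
const durationSeconds = 5 * 60;     // 5 minutes

// Rate needed to finish in time versus the rate that was actually reached.
const requiredRate = plannedIterations / durationSeconds;   // about 16667 per second
const achievedRate = completedIterations / durationSeconds; // about 2643 per second

console.log(Math.round(requiredRate), Math.round(achievedRate));
```

So under that reading the test ran at roughly a sixth of the rate it needed to complete all iterations.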

however the progress bar just started after the script has been executed for a period of time.

I would expect this is because it first needed to load and parse the json - for me that took nearly a minute for the 10m record file.

sunnini · July 11, 2022, 10:31am

I found that the progress bar updated very slowly during the script execution.

mstoykov · July 11, 2022, 12:22pm

@sunnini as in the ===> moving slowly? Or something else?

This is likely because we have ~30 characters’ worth of progress and 500k iterations. With the shared-iterations executor, the bar tracks the iterations that have been completed out of all of them, so it only moves forward each time another 1/30 of the iterations is done. This means that over 16k iterations are needed for one character.
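The same estimate as a quick sketch. The bar width of 30 is approximate, as noted above, and the names are made up for illustration.

```javascript
const barWidth = 30;            // "~30 characters", approximate
const totalIterations = 500000; // "500k iterations" from the post

// Completed iterations needed before the bar advances by one character.
const iterationsPerCharacter = totalIterations / barWidth;

console.log(Math.round(iterationsPerCharacter)); // 16667
```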

Given what you are currently seeing I would argue you need more than 100 VUs - at least 400, probably more.

sunnini · July 12, 2022, 2:24am

I set VUs to 400, but 400 VUs and 100 VUs seem to have the same effect on progress bar movement: ===> is moving very slowly.

sunnini · July 12, 2022, 4:07am

With VUs set to 1000, the result was the same

mstoykov · July 12, 2022, 7:26am

, the result was the same

No it isn’t - the http_req_duration went from a median of 32.54ms at 100 VUs to 132.96ms at 400 VUs and to 330.69ms at 1000 VUs.

Given that you don’t seem to be bandwidth limited (doing under 2MB/s), I would expect that either the local machine you are running from doesn’t have enough resources or the limit of your system under test is around 2660 QPS.

There are definitely other explanations but those are the simplest.
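One rough way to read these numbers: if each VU sends one request at a time, it can complete at most about one request per response time, so VUs divided by response time gives a ceiling on throughput. The sketch below uses the medians quoted above in place of averages, which is an approximation, and assumes no sleep or other work between requests. The helper name is made up for illustration.

```javascript
// Upper bound on requests per second for a fixed number of VUs,
// assuming each VU sends one request at a time with nothing in between.
function maxRequestsPerSecond(vus, responseTimeMs) {
  return vus / (responseTimeMs / 1000);
}

// Median http_req_duration values quoted above.
const runs = [
  { vus: 100, medianMs: 32.54 },
  { vus: 400, medianMs: 132.96 },
  { vus: 1000, medianMs: 330.69 },
];

for (const run of runs) {
  const ceiling = maxRequestsPerSecond(run.vus, run.medianMs);
  console.log(`${run.vus} VUs: at most about ${Math.round(ceiling)} requests/s`);
}
```

All three ceilings come out at roughly 3000 requests per second, in the same range as the ~2660 QPS mentioned above. That is the pattern one would expect when adding VUs only increases latency while total throughput stays flat.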

So I would now go look at both systems and see how they behave while the load is happening. Are they running out of CPU? Is a DB that is used not returning fast enough? Is it just that the system running the test is out of CPU and can’t do more?

Hope this helps you and good luck