When parameterizing data, how do I not use the same data more than once in a test?

Depending on the system under test (SUT), it’s often a requirement not to use the same data more than once in a test, or at least to ensure that Virtual Users (VUs) are not concurrently using the same data, such as login credentials.

k6 now has a built-in property that always returns a unique number, allowing you to retrieve a unique row from your data source every time. The answer is included in a reply below.


This answer is deprecated. Saved for posterity.

As mentioned above, in order to prevent collisions between VUs accessing data from an external source, we need to calculate a unique number per VU iteration. Luckily, we have a few pieces of data we can use to calculate this. k6 provides the ID of each VU per load generator (k6 instance) as well as an iteration counter for each VU.

__VU is the ID of a VU. It is 1-based and assigned sequentially as VUs ramp up. Every VU on a load generator will have a unique VU ID.
__ITER is 0-based and increases sequentially each time a VU completes the default function. Every VU has its own iteration count.

Consider this example, with my script uniqueNum.js:

import http from "k6/http";
import { sleep } from "k6";

export default function () {
  http.get("http://test.loadimpact.com");
  console.log(`VU: ${__VU}  -  ITER: ${__ITER}`);
  // VU 1 produces 0, 1, 2, ...; VU 2 produces 100, 101, 102, ...; and so on
  let uniqueNumber = __VU * 100 + __ITER - 100;
  console.log(uniqueNumber);
  sleep(1);
}

If I run the above script with k6 run uniqueNum.js -i 10 -u 10, I will see that each VU starts logging unique numbers in my console window, separated by 100, resulting in no collisions. I can use this number to select a position from an external file.

// Separate file where contents of data.json follow this pattern:
{
  "users": [
    { "username": "test1", "password": "qwerty1" },
    { "username": "test2", "password": "qwerty2" }
  ]
}

Then, if our script looked something like:

const data = JSON.parse(open("./data.json"));

export default function () {
    // same formula as above: VU 1 starts at index 0, VU 2 at index 100, and so on
    let uniqueNumber = __VU * 100 + __ITER - 100;
    let user = data.users[uniqueNumber];
    console.log(user.username);
}

We are now accessing a unique position per iteration and per virtual user.

Some things to keep in mind regarding the above method:

  • In our calculation of uniqueNumber, 100 is the maximum number of iterations a VU can make before collisions occur. My choice of 100 was arbitrary here; you can decrease or increase it based on need. You may know your test will never exceed 20 iterations per VU. See the sketch after this list for making this value configurable per run.

  • The higher the above number, the larger your source file must be. A test with 200 VUs would need 20k lines/rows of unique data if all iterations were completed

  • In the Load Impact Cloud, a maximum of 200 VUs are assigned per load generator. So you would need to take LI_INSTANCE_ID into account when calculating unique values. More info on Load Impact env variables here
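
If you want to adjust the block size without editing the script, here is a minimal sketch using an environment variable (MAX_ITER is a hypothetical name, passed with k6 run -e MAX_ITER=100 uniqueNum.js):

// read the per-VU block size from the environment, defaulting to 100
const maxIter = parseInt(__ENV.MAX_ITER || "100", 10);
const data = JSON.parse(open("./data.json"));

export default function () {
  // VU 1 starts at index 0, VU 2 at maxIter, VU 3 at 2 * maxIter, ...
  let uniqueNumber = (__VU - 1) * maxIter + __ITER;
  let user = data.users[uniqueNumber];
  console.log(user.username);
}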


This answer is deprecated. Saved for posterity.

I’ve been assisting some users with using unique data across multiple k6 instances in the LoadImpact cloud, since each instance will have overlapping __VU IDs. This adds some complexity to determining our unique number, so I will share some of the thinking required to solve this issue. As mentioned in my last point above, we can use LI_INSTANCE_ID to help here. However, you’ll need to do some testing to determine how many rows each VU will consume during your test (or at least a maximum). I think in most cases this should be equal to the number of iterations you expect each virtual user to make, which will vary based on test duration.

Let’s consider the following: we expect each virtual user to need up to 400 rows from our JSON or CSV file. To make our script reusable, define this in the init context. If you want to get fancy, maybe you’ll set it as an ENV variable so you can adjust it per run.

let maxIter = 400 // you'll want to define this in the init context!!!

Previously with a single instance of k6, we could do something like this in our default function to generate a unique value on each run:

let uniqueNum = ((__VU * maxIter) - (maxIter) + (__ITER));

However, as stated earlier, when dealing with multiple k6 instances in the LoadImpact cloud we will encounter some collisions as each instance will have overlapping __VU IDs.

If we make the following adjustment, we can ensure each VU gets a unique value by using their LI_INSTANCE_ID in the equation.

First, you need to know that the LoadImpact cloud currently puts a maximum of 300 VUs per load generator (previously 200). With that in mind, __VU 300 in the above case at __ITER 400 will be at line 120,000 in our source file.

With that in mind, we can rather simply do the following, so that each instance will start in its appropriate “block” of the source data:

let uniqueNum = ((__VU * maxIter) - (maxIter) + (__ITER) + (120000 * __ENV["LI_INSTANCE_ID"]));
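
To sanity-check the arithmetic, here are a few illustrative values with maxIter = 400 (the results are 0-based indices into the data source):

// instance 0, VU 1,   __ITER 0   -> (1 * 400) - 400 + 0   + (120000 * 0) = 0
// instance 0, VU 300, __ITER 399 -> (300 * 400) - 400 + 399 + (120000 * 0) = 119999
// instance 1, VU 1,   __ITER 0   -> (1 * 400) - 400 + 0   + (120000 * 1) = 120000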

Edits: updated due to changes in load generator limits in LI Cloud. 200 ->300 max VUs


@mark thank you for this topic. I have already arranged VUs in my test to be unique as was required for our platform performance testing.

The last point that I can’t get is how to distribute VU uniqueness when multiple instances (e.g. load generators) are spun up. :thinking:

Initially, we need to mimic 3K unique users, but currently we started from 500. So, in the context of 500 users, as I understand it, as soon as my cloud test reaches 300 VUs, a second instance will be raised, which will start generating duplicated VU IDs.

How do I correctly make the calculation? I can’t relate this formula to my test:
let uniqueNum = ((__VU * maxIter) - (maxIter) + (__ITER) + (80000 * __ENV["LI_INSTANCE_ID"]));

For instance, we have to run 500 VUs (unique accounts, where the email has an index from 1 to 500) for 1 hour. I don’t know how many iterations that will be, and I also don’t know the value of LI_INSTANCE_ID.

This answer is deprecated. Saved for posterity.

@Alexander I missed an edit when updating recently. Let’s start with the 80k/120k number. That’s the amount of data needed per load generator: 300 VUs on one load gen, each making 400 iterations, would reach line 120k. As you add more load gens, this number could grow. You need to do some guesstimating at first.

How many total sessions are you looking to generate, or how large is your data source? We need to solve for something to set this in our test. You could probably get fancy later on and read the length of the file to calculate this inline dynamically (a rough sketch is below), but initially you’ll probably need/want to work through the math.
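
As a rough sketch of that “read the length of the file” idea, assuming you know (roughly) the total number of VUs across all load generators (VUS_TOTAL is an assumed value you would tune, not a built-in):

const users = JSON.parse(open("./data.json")).users;
const VUS_TOTAL = 600; // assumed total VUs across all load generators
// rows each VU can safely consume before colliding with the next VU's block
const maxIter = Math.floor(users.length / VUS_TOTAL);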

Maybe it would be helpful for you and others to step through the formula a bit (there may very well be a more efficient way to do this):

let uniqueNum = ((__VU * maxIter) - (maxIter) + (__ITER) + (120000 * __ENV["LI_INSTANCE_ID"]));

(__VU * maxIter) - (maxIter) // Sets the current unique number to 0 for VU 1, to the value of maxIter for VU 2, etc. This way they all start at a unique point in the data source (no collisions)

+ (__ITER) // Adds 1 per iteration

+ (120000 * __ENV["LI_INSTANCE_ID"]) // for instances > 1, lets those VUs start at a higher row as their "0"

Hope that helps clear things up a bit!

This answer is deprecated. Saved for posterity.

@Alexander I spoke to one of my colleagues with a stronger math background than myself. He came up with another solution that might be less confusing when dealing with even distributions:

let VUsTotal = 1000 // Set the script's total number of VUs here

let VUsPerInstance = 250  // minimum VUs per instance in the cloud execution

let InstancesTotalUpperEstimate = Math.ceil(VUsTotal / VUsPerInstance)
let uniqNum = (__ITER * VUsTotal + (__VU - 1)) * InstancesTotalUpperEstimate + __ENV["LI_INSTANCE_ID"] 

Note that VUsPerInstance requires some thinking on your part and the number above is representative of this example. 1000 VUs / 300 max VUs = 3.33 instances required. As we can’t have .33 of an instance, we round up to 4. 1000 VUs across 4 instances would be 250 per instance. This also assumes even distribution! If you start to have uneven distribution it gets a bit more complex.
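
To make that concrete, a few illustrative values with VUsTotal = 1000 and InstancesTotalUpperEstimate = 4:

// instance 0, VU 1, __ITER 0 -> (0 * 1000 + 0) * 4 + 0 = 0
// instance 1, VU 1, __ITER 0 -> (0 * 1000 + 0) * 4 + 1 = 1
// instance 0, VU 2, __ITER 0 -> (0 * 1000 + 1) * 4 + 0 = 4
// instance 0, VU 1, __ITER 1 -> (1 * 1000 + 0) * 4 + 0 = 4000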

As you can see there are multiple ways to go about this, I hope this clears things up a bit though!

@mark thank you for the update! I will consider it a bit later. Right now we are working intensively on performance issues with the existing load, but we will need more users eventually, so this formula will come in handy.

One thing I can say right now is that there is no way to verify that it works correctly, since not even console.log output is available while running in the cloud…

@mark - I have a scenario in which I’m trying to assign each VU a unique user account from a list of 5000 unique user accounts. The issue I’m having is that I cannot determine how the VUs are split across instances. I’ve done some testing, and when I run my test with 500 VUs I can see that the users are split across 2 instances with 250 VUs each, which makes sense. However, if I run my test with 1000 VUs, I can see that the VUs are only utilizing 1 instance. I’m making this determination by running console.log(__ENV["LI_INSTANCE_ID"]), and it is never greater than 0. It seems as if how the VUs are split is a factor of how many VUs the test is running. Is there any way to reliably determine the splits? Any help would be appreciated!

@scott We actually had a recent breaking change, with regard to the cloud, that we should have documented here. I’m going to remove the solution tag, as the answer now depends entirely on test size. We do plan to introduce completely unique VU IDs that would remove all this messy math. I am not sure what that timeline is, however.

That said, we’ve introduced some tiering of hardware for cloud tests to improve spin up times, data processing and general stability. This tiering doesn’t impact the resources per VU as we linearly increase instance size (I know you didn’t ask about this, but I’m sure someone will read this in the future and will have that question). Anyway, here is how it goes:

We have 3 tiers of hardware for load-generation. The tier we choose depends on the number of VUs allocated to a load zone.

Tier 1 is used when there are 1-999 VUs in a load zone
Tier 2 is used when there are 1000-4001 VUs in a load zone
Tier 3 is used when there are more than 4001 VUs in a load zone.

The tier 1 server handles up to 300 VUs.
The tier 2 server handles up to 1200 VUs.
The tier 3 server handles up to 5000 VUs.

For example, if you start a test with 900VUs, we will use 3x Tier 1 servers.
If you start a test with 1000VUs in a single load zone, we will use 1x Tier 2 server. If the same test is started in 2 load zones, there will be 500VUs per load zone and 4x Tier 1 servers will be used.

So you will need to determine, based on test size and config, what size machines your test will use; then you can plug the correct max VUs per load gen into the formula (a rough helper is sketched below).
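
As a rough helper based on the tiers described above (cloud limits can change, so treat these numbers as assumptions to verify against the current docs):

// pick the per-load-generator VU cap to plug into the uniqueness math
function maxVUsPerLoadGen(vusInLoadZone) {
  if (vusInLoadZone <= 999) return 300;   // Tier 1 hardware
  if (vusInLoadZone <= 4001) return 1200; // Tier 2 hardware
  return 5000;                            // Tier 3 hardware
}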

@mark - Thank you for the info, I can make this work for my test. Thanks for the speedy response!

@mark Thanks for your help! After two years your answer still helps a lot! I made my own logic to get data directly from a database (a dedicated one for performance tests).
We first create the data, then run the test scripts afterwards; the same logic we use to get data in k6 is used to create data in the data load script.


@fernandoveras Glad to hear it’s still helping people! I’ve edited my top level post to hopefully direct people to the most important information. If your logic is different enough - please feel free to share a code snippet as it may help someone else.

Hello!

Could you share please how you do it?

Hello, sorry for the late response.

I made a few small changes: I took the main logic that @mark described and added a constant name to it, ending up with something like this:

${CONSTANT_DEFAULT_USER_NAME}${__VU + 100}

We use the code above to log in to the system with dynamically generated users across VUs.
We only log in to the system on the first iteration (__ITER === 0).

To get data, we use the same logic @mark described here, but add the constant name to it, like this:

${CONSTANT_DEFAULT_DATA_NAME}${__VU * 100 + __ITER}

Like I said, they are only small changes.
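
Putting that together, a minimal sketch of the pattern described above (the constant values and the login step are illustrative, not from the original post):

const CONSTANT_DEFAULT_USER_NAME = "perfuser";
const CONSTANT_DEFAULT_DATA_NAME = "perfdata";

export default function () {
  if (__ITER === 0) {
    // log in only on the first iteration, with a username unique per VU
    const username = `${CONSTANT_DEFAULT_USER_NAME}${__VU + 100}`;
    // ... perform the login request with `username` here
  }
  // pick a data record unique per VU and per iteration
  const dataName = `${CONSTANT_DEFAULT_DATA_NAME}${__VU * 100 + __ITER}`;
  // ... use `dataName` in the test request
}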


BIG NEWS! Remember all that messy math we’ve spoken about for months, above? Well, throw it away because a new era has arrived! :smile: :tada:

As of k6 0.34.0 we have released k6/execution which includes a property scenario.iterationInTest. This will return a unique value across all VUs, even in cloud tests!!!

For example:

import http from "k6/http";
import exec from "k6/execution";
import { SharedArray } from "k6/data";

const data = new SharedArray("my dataset", function () {
  return JSON.parse(open("my-large-dataset.json"));
});

export const options = {
  scenarios: {
    "use-all-the-data": {
      executor: "shared-iterations",
      vus: 100,
      iterations: data.length,
      maxDuration: "1h",
    },
  },
};

export default function () {
  // this index is unique across all VUs, even in the cloud
  var item = data[exec.scenario.iterationInTest];
  http.post("https://httpbin.test.k6.io/anything?endpoint=amazing", item);
}

Full release notes are here: Release v0.34.0 · grafana/k6 · GitHub
Docs on k6/execution: k6/execution


Hello, how does this work in a multi-scenario model? Is exec.scenario.iterationInTest for the overall test or separate for each scenario?

Hi @Rama,

exec.scenario.iterationInTest is unique per scenario, not the overall test, so you’ll need to split your data per scenario as well.

We’ll consider exposing more properties in the future that will help with this use case, but for now this is the best approach, sorry.
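
For anyone wondering what splitting the data per scenario can look like, here is a minimal sketch (the scenario names and offsets are illustrative; it assumes each scenario’s iterations stay within its own slice):

import exec from "k6/execution";
import { SharedArray } from "k6/data";

const data = new SharedArray("my dataset", function () {
  return JSON.parse(open("my-large-dataset.json"));
});

// hypothetical starting offsets into the dataset, one per scenario
const offsets = { scenarioA: 0, scenarioB: 5000 };

export default function () {
  // iterationInTest restarts per scenario, so offset by the scenario's slice
  const item = data[offsets[exec.scenario.name] + exec.scenario.iterationInTest];
  // ... use item in your requests
}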


Referring to this solution, I used SharedArray and papaparse in my code to load and parse a large CSV file, but when executing the JS I found that k6 used too much memory, which led to my script not being able to run at all. It finally aborted midway when system resources were exhausted. Is there a better solution for loading and parsing large files?


Hi @sunnini, I think your SharedArray is OK; maybe the problem is the crypto function in the test? However, you should create a new post for this question.

Greetings.
Gino.
