When parameterizing data, how do I not use the same data more than once in a test?

Depending on the system under test (SUT), it's often a requirement not to use the same data more than once in a test, or at least to ensure that Virtual Users (VUs) are not concurrently using the same data, such as login credentials. To do this, we must calculate a unique number for each VU to use during the test. I'll share one example as a reply, but please add any methods you use for your own use case.

As mentioned above, in order to prevent collisions between VUs accessing data from an external source, we need to calculate a unique number per VU iteration. Luckily, we have a few pieces of data we can use to calculate this. k6 provides the ID of each VU per load generator (k6 instance) as well as an iteration counter for each VU.

__VU is the ID of a VU. It is 1-based and assigned sequentially as VUs ramp up. Every VU on a load generator will have a unique VU ID.
__ITER is 0-based and increases sequentially each time a VU completes the default function. Every VU has its own iteration count.

Consider this example, with my script uniqueNum.js:

import http from "k6/http";
import { sleep } from "k6";

export default function() {
  http.get("http://test.loadimpact.com");
  console.log(`VU: ${__VU}  -  ITER: ${__ITER}`);
  // VU 1 starts at 0, VU 2 at 100, VU 3 at 200, and so on.
  let uniqueNumber = __VU * 100 + __ITER - 100;
  console.log(uniqueNumber);
  sleep(1);
}

If I run the above script with k6 run uniqueNum.js -i 10 -u 10, I will see each VU logging unique numbers in my console window, separated by 100, resulting in no collisions. I can use this number to select a position from an external file.
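To make the spacing concrete, here is roughly what that formula works out to for the first few VUs and iterations (illustrative arithmetic only, not actual k6 output):

// uniqueNumber = __VU * 100 + __ITER - 100
// VU 1, ITER 0 ->   0    VU 1, ITER 1 ->   1    VU 1, ITER 2 ->   2
// VU 2, ITER 0 -> 100    VU 2, ITER 1 -> 101    VU 2, ITER 2 -> 102
// VU 3, ITER 0 -> 200    VU 3, ITER 1 -> 201    VU 3, ITER 2 -> 202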

/*
    Separate file where contents of data.json follows this pattern:
*/
    {
        "users": [
            { "username": "test1", "password": "qwerty1" },
            { "username": "test2", "password": "qwerty2" }
        ]
    }

Then, if our script looked something like:

// open() is only available in the init context, so the file is read here, once per VU.
const data = JSON.parse(open("./data.json"));

export default function() {
    // Same calculation as before: VU 1 starts at index 0, VU 2 at 100, and so on.
    let uniqueNumber = __VU * 100 + __ITER - 100;
    let user = data.users[uniqueNumber];
    console.log(user.username);
}

We are now accessing a unique position per iteration and per virtual user.
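Putting both pieces together, a minimal end-to-end sketch might look like the following. Note that the modulo wrap-around guard is my own addition here, purely to avoid reading past the end of the array; if it ever triggers, data will be reused, so in practice you would size data.json so that it never does:

import http from "k6/http";
import { sleep } from "k6";

// open() only works in the init context; the file is read once per VU.
const data = JSON.parse(open("./data.json"));

// Block size: the maximum iterations a VU can make before it would collide
// with the next VU's block of rows.
const maxIter = 100;

export default function() {
  // Zero-based unique index: VU 1 starts at 0, VU 2 at maxIter, and so on.
  let uniqueNumber = (__VU - 1) * maxIter + __ITER;

  // Wrap around rather than returning undefined if we run out of rows.
  let user = data.users[uniqueNumber % data.users.length];

  console.log(`VU ${__VU}, ITER ${__ITER} -> ${user.username}`);
  http.get("http://test.loadimpact.com");
  sleep(1);
}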

Some things to keep in mind regarding the above method:

  • In our calculation of uniqueNumber, 100 is the maximum number of iterations a VU can make before collisions occur. My choice of 100 here was arbitrary; you can decrease or increase it based on need. For example, you may know your test will never exceed 20 iterations per VU.
  • The higher that number, the larger your source file must be. A test with 200 VUs would need 20k lines/rows of unique data if all iterations were completed.
  • In the Load Impact Cloud, a maximum of 200 VUs are assigned per load generator, so you would need to take LI_INSTANCE_ID into account when calculating unique values. More info on Load Impact environment variables is available in the documentation.

I’ve been assisting some users with using unique data across multiple k6 instances in the LoadImpact cloud, since each instance will have overlapping __VU IDs. This adds some complexity to determining our unique number, so I will share some of the required thinking to solve this issue. As mentioned in my last point above, we can use LI_INSTANCE_ID to help here. However, you’ll need to do some testing to determine how many rows each VU will consume during your test (or at least a maximum). In most cases this should be equal to the number of iterations you expect each virtual user to make, which will vary based on test duration.

Let’s consider the following: we expect each virtual user to need up to 400 rows from our JSON or CSV file. To make our script reusable, define this in the init context. If you want to get fancy, you could set it as an environment variable so you can adjust it per run.

let maxIter = 400 // you'll want to define this in the init context!!!
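If you do go the environment variable route, a small sketch could look like this (MAX_ITER is a hypothetical variable name; pass it with k6 run -e MAX_ITER=400 ..., and remember that __ENV values are strings):

let maxIter = parseInt(__ENV.MAX_ITER || "400", 10) // defaults to 400 when the variable isn't set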

Previously with a single instance of k6, we could do something like this in our default function to generate a unique value on each run:

let uniqueNum = ((__VU * maxIter) - (maxIter) + (__ITER));
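With maxIter = 400, that works out to the following (illustrative arithmetic only):

// uniqueNum = (__VU * maxIter) - maxIter + __ITER, with maxIter = 400
// VU 1, ITER 0 ->   0    VU 1, ITER 399 -> 399
// VU 2, ITER 0 -> 400    VU 2, ITER 399 -> 799
// VU 3, ITER 0 -> 800    ...and so on, 400 rows per VU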

However, as stated earlier, when dealing with multiple k6 instances in the LoadImpact cloud we will encounter some collisions as each instance will have overlapping __VU IDs.

If we make the following adjustment, we can ensure each VU gets a unique value by including the instance's LI_INSTANCE_ID in the equation.

First, you need to know that the LoadImpact cloud currently puts a maximum of 300 VUs per load generator (previously 200). With that in mind, __VU 300 in the above case, after completing its 400 iterations, will have reached roughly line 120,000 (300 × 400) in our source file.

So we can rather simply do the following, so that each instance will start in its appropriate “block” of the source data:

let uniqueNum = ((__VU * maxIter) - (maxIter) + (__ITER) + (120000 * __ENV["LI_INSTANCE_ID"]));
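Stepping through that with maxIter = 400 and 300 VUs per instance (and assuming LI_INSTANCE_ID starts at 0, as the formula implies):

// Instance 0: VU 1, ITER 0 ->      0   ...   VU 300, ITER 399 -> 119999
// Instance 1: VU 1, ITER 0 -> 120000   ...   VU 300, ITER 399 -> 239999
// Instance 2: VU 1, ITER 0 -> 240000   ...and so on, 120,000 rows per instance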

Edit: updated due to changes in load generator limits in the LI Cloud (200 -> 300 max VUs).


@mark thank you for this topic. I have already arranged VUs in my test to be unique as was required for our platform performance testing.

The last point that I can’t get is how to keep VUs unique when multiple instances (i.e. load generators) are spun up. :thinking:

Initially we need to mimic 3K unique users, but for now we have started with 500. In the context of 500 users, as I understand it, as soon as my cloud test reaches 300 VUs, a second instance will be started, which will begin generating duplicate VU IDs.

How do I make the calculation correctly? I can’t relate this formula to my test:
let uniqueNum = ((__VU * maxIter) - (maxIter) + (__ITER) + (80000 * __ENV["LI_INSTANCE_ID"]));

For instance, we have to run 500 VUs (unique accounts, where the email has an index from 1 to 500) for 1 hour. I don’t know how many iterations that will be, and I also don’t know the value of LI_INSTANCE_ID.

@Alexander I missed an edit when updating recently. Let’s start with the 80k/120k number: that’s the amount of data needed per load generator. 300 VUs on one load gen, each making 400 iterations, would reach line 120k. As you add more load gens, the total amount of data you need grows. You need to do some guesstimating at first.

How many total sessions are you looking to generate, or how large is your data source? We need to solve for something to set this in our test. You could probably get fancy later on and read the length of the file to calculate it dynamically, but initially you’ll probably need/want to work through the math.
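As a rough illustration of the “calculate it dynamically” idea, a sketch like the one below could derive the per-VU block size from the file length in the init context. The VUs-per-instance and instance-count values are assumptions you would have to supply for your own test:

const data = JSON.parse(open("./data.json"));

// Assumed values for this sketch; adjust to your own test configuration.
const vusPerInstance = 300;  // current LI Cloud maximum per load generator
const instanceCount = 2;     // how many load generators your test will use

// Rows each VU may consume before the data runs out, assuming even distribution.
const maxIter = Math.floor(data.users.length / (vusPerInstance * instanceCount));

export default function() {
  let uniqueNum = (__VU - 1) * maxIter + __ITER
    + vusPerInstance * maxIter * Number(__ENV["LI_INSTANCE_ID"] || 0);
  let user = data.users[uniqueNum];
  // ...use user.username / user.password here
}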

Maybe it would be helpful for you and others to step through the formula a bit (there may very well be a more efficient way to do this):

let uniqueNum = ((__VU * maxIter) - (maxIter) + (__ITER) + (120000 * __ENV["LI_INSTANCE_ID"]));

(__VU * maxIter) - (maxIter) // Sets the unique number to 0 for VU 1, maxIter for VU 2, etc., so every VU starts at its own point in the data source (no collisions)

+ (__ITER) // Adds 1 per iteration

+ (120000 * __ENV["LI_INSTANCE_ID"]) // for instances > 1, lets those VUs start at a higher row as their "0"

Hope that helps clear things up a bit!

@Alexander I spoke to one of my colleagues with a stronger math background than mine. He came up with another solution that might be less confusing when dealing with even distributions:

let VUsTotal = 1000 // set the script's total number of VUs here

let VUsPerInstance = 250  // minimum VUs per instance in the cloud execution

let InstancesTotalUpperEstimate = Math.ceil(VUsTotal / VUsPerInstance)
let uniqNum = (__ITER * VUsTotal + (__VU - 1)) * InstancesTotalUpperEstimate + Number(__ENV["LI_INSTANCE_ID"]) // __ENV values are strings, so convert before adding

Note that VUsPerInstance requires some thinking on your part; the number above is just for this example. 1000 VUs / 300 max VUs per instance = 3.33 instances required. As we can’t have 0.33 of an instance, we round up to 4. 1000 VUs across 4 instances is 250 per instance. This also assumes even distribution! If you start to have uneven distribution, it gets a bit more complex.
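To see how this avoids collisions, here are the first few values it produces with VUsTotal = 1000 and InstancesTotalUpperEstimate = 4 (illustrative arithmetic only):

// uniqNum = (__ITER * VUsTotal + (__VU - 1)) * InstancesTotalUpperEstimate + LI_INSTANCE_ID
// Instance 0: VU 1, ITER 0 -> 0    VU 2, ITER 0 -> 4    VU 1, ITER 1 -> 4000
// Instance 1: VU 1, ITER 0 -> 1    VU 2, ITER 0 -> 5    VU 1, ITER 1 -> 4001
// Rows are interleaved across VUs and instances rather than handed out in contiguous blocks.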

As you can see there are multiple ways to go about this, I hope this clears things up a bit though!

@mark thank you for the update! I will look into it a bit later. Right now we are working intensively on performance issues with the existing load, but we will eventually need more users, so this formula will come in handy.

One thing I can say right now is that there is no way to verify that it works correctly, since console.log output, at least, is not available while running in the cloud…