Creating a map of shared arrays

Hi.

I would like to create a map of SharedArrays and was wondering if there was a good way to do so. I currently have a data set which is a map from site IDs to productIDs. The reason I want to use a SharedArray is because some sites have up to 100k product IDs.

{
  "site0": [
    1,
    2,
    3,
    4
  ],
  "site1": [
    3,
    4,
    5,
    6
  ],
  ...
}

I tried creating an object like so:

var productMap = JSON.parse(open("productsBySite.json"));
var productsBySite = {};
var sites = Object.keys(productMap);
for (let i = 0; i < sites.length; i++) {
  productsBySite[sites[i]] = new SharedArray(sites[i], function () {
    return productMap[sites[i]];
  });
} 

I have 2 questions:

1. When retrieving an object in VU code, will just the productID be called into the VU’s memory, or is the entire map first loaded into memory?

For example:

const siteName = randomItem(sites);
const productId = randomItem(productsBySite[siteName]);
// is the whole of productsBySite copied to VU memory, or just productId?

2. This has caused k6 to use an unsustainable amount of CPU and memory usage when starting a load test, and when using many VUs would take a very long time to start. Is there a more efficient way of performing this?

Here is the output from top when trying to initiate 100VUs using the script above. It was starting the VUs very slowly, maybe 2-3 per minute.

Thanks!

Hi @pearsone, welcome to the community forum :confetti_ball:

1. When retrieving an object in VU code, will just the productID be called into the VU’s memory, or is the entire map first loaded into memory?

The map itself is always entirely in the VU's memory, as it is just a normal JS object. Each SharedArray, though, will not take up much memory in each VU. It only takes more memory when you take an item from it, as memory needs to be allocated for that item.
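
To make that access pattern concrete, here is a minimal sketch of the VU-side lookup. It assumes productsBySite is a plain object whose values are SharedArrays, as in the question. pickRandom and pickSiteAndProduct are hypothetical names used only for illustration, with pickRandom standing in for the randomItem helper from the question.

```javascript
// Hypothetical stand-in for the randomItem helper used in the question.
function pickRandom(items) {
  return items[Math.floor(Math.random() * items.length)];
}

// productsBySite: plain object mapping site name -> SharedArray of product IDs.
// siteNames: list of site names, assumed to be computed once in the init
// context (for example with Object.keys(productsBySite)).
function pickSiteAndProduct(productsBySite, siteNames) {
  const siteName = pickRandom(siteNames);
  // Only the single element that is indexed here gets allocated in the VU,
  // not the whole array for that site.
  const productId = pickRandom(productsBySite[siteName]);
  return { siteName, productId };
}
```

Computing the list of site names once in the init context, rather than on every iteration, avoids rebuilding the key list each time the default function runs.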

2. This has caused k6 to use an unsustainable amount of CPU and memory usage when starting a load test, and when using many VUs would take a very long time to start. Is there a more efficient way of performing this?

This is because

var productMap = JSON.parse(open("productsBySite.json"));

happens in each VU, and then you keep a reference to this object, so it never gets deallocated.

So you did everything right (from what I can see) on how to create a map of SharedArrays. But you also left the whole object you were trying to share between the VUs in memory - defeating the whole point :wink: .

You just need to either not load it at the top level, clean it up afterwards, or load it only in the one VU that will create the SharedArray.

The last option works because the function passed to the SharedArray constructor is only called once per k6 instance, and the result is reused for the rest of the VUs.

Unfortunately, you need the names of the sites to create the map, so you need to parse the whole JSON just for that.

Luckily, the list of site names can also be a SharedArray.

import { SharedArray } from "k6/data";
var productsBySite = {};
var productMap;
let populateMap = () => {
  if (productMap != null) {
    return;
  }
  productMap = JSON.parse(open("productsBySite.json"));
}
var sites = new SharedArray("sites", () => {
  populateMap() // we populate the map if needed
  return Object.keys(productMap);
})
for (let i = 0; i < sites.length; i++) {
  productsBySite[sites[i]] = new SharedArray(sites[i], function() {
    populateMap() // we populate the map if needed.
    return productMap[sites[i]];
  });
}
productMap = null; // we clean the map in the end
sites = null; // this also is no longer needed

export default () => {
  console.log(JSON.stringify(productsBySite, null, "  "));
}

I even extracted it into a function and posted it as a gist here.
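
The gist is not reproduced in this thread, so the following is only a minimal sketch of what such a helper could look like, not the original code. The name sharedArrayMap and its parameters are assumptions. The SharedArray constructor is passed in as an argument so the function does not depend on k6 imports.

```javascript
// Hypothetical helper (not the original gist): builds a plain object whose
// values are SharedArrays, parsing the source data at most once per VU.
// SharedArrayCtor: the SharedArray constructor from "k6/data" in a k6 script.
// name: prefix used to give every SharedArray a unique name.
// loadMap: function returning an object that maps keys to arrays.
function sharedArrayMap(SharedArrayCtor, name, loadMap) {
  let cached = null;
  const load = () => {
    if (cached === null) {
      cached = loadMap(); // parse only when a callback actually runs
    }
    return cached;
  };

  // The key list is itself a SharedArray, so VUs that reuse the shared data
  // never have to parse the source just to learn the keys.
  const keys = new SharedArrayCtor(name + ":keys", () => Object.keys(load()));

  const result = {};
  for (let i = 0; i < keys.length; i++) {
    const key = keys[i];
    result[key] = new SharedArrayCtor(name + ":" + key, () => load()[key]);
  }

  cached = null; // drop the parsed data so it can be garbage collected
  return result;
}
```

In a k6 script this would be called from the init context, for example as sharedArrayMap(SharedArray, "products", () => JSON.parse(open("productsBySite.json"))), with SharedArray imported from "k6/data".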

Hope this helps you!

Hello, cheers for your fast response.

This is very helpful and answers the questions I was having. I will try this out this week and report back.

Thanks!

Hi,

We tried this solution and it worked.

Just to confirm - despite being in the init code, the SharedArray constructor is run just once, similar to the setup code?

If this is the case why does the SharedArray constructor need to sit in the init and not the setup code? Do you have any links to documentation or source code to help me understand it please?

Thanks!

The function provided to the SharedArray is run only once. It needs to be in the init context because that was a restriction that was added originally. The init context also happens to be the only place you can use open(), which is what SharedArray was envisioned to be used with the most.

Also, in the cloud (and in the future distributed mode), k6 runs setup once for the whole test, not once per instance. And the data returned by setup is copied for each VU, so at the moment it does not work well as a way to avoid having multiple copies of the same data in memory.

You can see the original implementation here.
Some possible improvements are listed here.

Hope this helps you!