How do I load test the performance of a Minio server?

Hi guys, so I am performing a stress test on a Minio server that should expect download requests from 500 VUs. I am using k6’s stress test example, and I have a batch request of 9 URLs, like the one below:

import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  scenarios: {
    stress: {
      executor: 'ramping-arrival-rate',
      preAllocatedVUs: 500,
      timeUnit: '1s',
      stages: [
        { duration: '2m', target: 10 }, // below normal load
        { duration: '5m', target: 10 },
        { duration: '2m', target: 20 }, // normal load
        { duration: '5m', target: 20 },
        { duration: '2m', target: 30 }, // around the breaking point
        { duration: '5m', target: 30 },
        { duration: '2m', target: 40 }, // beyond the breaking point
        { duration: '5m', target: 40 },
        { duration: '10m', target: 0 }, // scale down. Recovery stage.
      ],
    },
  },
};

export default function () {
  // const url = 'http://localhost:4000/api/s3/object/download';

  const req1 = {
    method: 'POST',
    url: 'http://localhost:4000/api/s3/object/download',
    body: {
      s3Path: 'TEST/test.zip',
    },
    params: {
      headers: {
        'Content-Type': 'application/x-www-form-urlencoded',
      },
    },
  };

  ...

  const req9 = {
    method: 'POST',
    url: 'http://localhost:4000/api/s3/object/download',
    body: {
      s3Path: 'TEST/test20.zip',
    },
    params: {
      headers: {
        'Content-Type': 'application/x-www-form-urlencoded',
      },
    },
  };


  http.batch([req1, ..., req9]);
  sleep(1);
}

When I run this stress test, I usually get a “Request Failed” warning from k6, and the error says the POST method is unable to access the API after 80 VUs. However, the API is still receiving requests from k6 and downloading the objects from Minio. At this point, I’m thinking that either the Minio server is unable to handle this many users, or I’m doing something wrong with k6. The reason I think I’m doing something wrong with k6 is that even though I’m getting the “Request Failed” warning, I’m left with 0 interrupted iterations and the API is still receiving requests. Any advice will be much appreciated! :slight_smile: #k6isfun

Hi @ncbernar

Welcome to the community forum :wave:

This could be quite a load: each iteration launches 9 download requests, so the 40 iterations per second you ramp up to is roughly 360 requests per second, and k6 could potentially be using all 500 preAllocatedVUs to sustain that arrival rate.

What do you see on the load generator instance? Is there any bottleneck around the breaking point? It could be a network bottleneck when downloading the files, or some other issue on the k6 instance, if Minio is not the problem.

Can you share the complete test output, so we can have a look? How long does each iteration take?

How big are the zip files being downloaded from Minio?

Cheers!

Hi @eyeveebe ,

Thanks for the reply! Glad to be here. I actually changed up the scenario and made it a load test instead:

export const options = {
  scenarios: {
    load: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '10m', target: 100 },
        { duration: '15m', target: 200 },
        { duration: '10m', target: 100 },
        { duration: '10m', target: 0 },
      ],
      tags: { test_type: 'load' },
      gracefulRampDown: '5m',
    },
  },
  thresholds: {
    'http_req_duration{test_type:load}': ['p(99)<180000'],
  },
};

I lowered my requirements, and the most VUs we should be getting is 200. I also came to realize how unlikely it is, in a practical application, that this many users would send 9 requests to download the same files. But I still think it’s nice to test Minio’s performance.

To answer your question, the zip files are ~70 MB and the other non-zip files are ~8 MB. Each iteration takes around 53 s with 50 active VUs, 1.75 minutes with 100 active VUs, and 2.75 minutes with 150 active VUs. These metrics are specifically for the load test I provided in this comment. Around 175 active VUs, we get around 3 minutes per iteration. Here is a snapshot of the results in Grafana.

Another question I had in mind is: should I be using http.batch this way to simulate this kind of scenario?

Hi @ncbernar

Apologies for the delay in answering. One thing you can try with your current scenario, if you haven’t already, is to discard the response bodies. If the bottleneck is on the load generator instance, keeping all those downloaded bodies in memory is certainly contributing to it.
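In case it helps, here is a minimal sketch of what that could look like. I’m reusing your download endpoint, but the scenario below is just a placeholder, not your real stages:

import http from 'k6/http';

export const options = {
  // Don't keep response bodies in memory on the load generator.
  // k6 still downloads the data (so the network load stays the same),
  // but memory and CPU usage on the k6 side should drop noticeably.
  discardResponseBodies: true,
  scenarios: {
    load: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [{ duration: '1m', target: 10 }], // placeholder stage
    },
  },
};

export default function () {
  http.post(
    'http://localhost:4000/api/s3/object/download',
    { s3Path: 'TEST/test.zip' },
    { headers: { 'Content-Type': 'application/x-www-form-urlencoded' } }
  );
}

If some requests do need their bodies (e.g. for checks), you can keep discardResponseBodies: true globally and set responseType: 'text' in the params of just those requests.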

Have you observed any network bottleneck on the Minio server (which would tell you something about the server’s limits), or on the load generator instance? We were doing some numbers: a 10 Gbit/s link can transfer roughly 1250 MB per second, and 70 MB * 200 users = 14,000 MB, so even at full line rate one round of concurrent zip downloads would take more than 10 seconds per request.

That said, I think it would be best if we first establish what is worrying you about your Minio server and what your platform looks like, and then we can advise on the best approach for the test (executors, batch requests, etc.).

What are the specifications on the client side and on the server side (the network interface especially)? Are you monitoring the network, or any other resource, on both ends to see if there is an issue? I would first try to pin down where the current bottleneck is, and for that you need monitoring on both systems (load generator and Minio).

What’s the goal of this test? Do you have a certain rps (requests per second) or transfer rate you want Minio to serve? For that we would probably go for a load testing scenario, and depending on the bandwidth needed from the clients, you might need to distribute the test. That is, if the bottleneck is not on the server side/Minio.
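For reference, if you do land on a target request rate, a sketch along these lines could model it with k6’s constant-arrival-rate executor. The rate, duration and VU numbers below are placeholders, and I’m reusing your download endpoint:

import http from 'k6/http';

export const options = {
  discardResponseBodies: true,
  scenarios: {
    downloads: {
      executor: 'constant-arrival-rate',
      rate: 20,             // start 20 iterations per second, however long each one takes
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 200, // enough VUs so slow downloads don't cap the arrival rate
      maxVUs: 400,
    },
  },
};

export default function () {
  // One download per iteration, so rate: 20 means roughly 20 requests per second.
  http.post(
    'http://localhost:4000/api/s3/object/download',
    { s3Path: 'TEST/test.zip' },
    { headers: { 'Content-Type': 'application/x-www-form-urlencoded' } }
  );
}

The arrival-rate executors keep starting iterations at the configured rate even when responses slow down, which makes them a better fit for “Minio must serve N requests per second” than ramping-vus, where a slow server automatically lowers the request rate.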

If the idea is to find the breaking point, as you seem to have been doing, we are in the realm of stress testing. There the objective is probably to check when the system under test breaks and what causes the bottleneck (bandwidth, CPU, etc.). All systems have a limit, and it’s good to know where that limit is and what causes it, so that you know how to scale if more capacity is needed.

Whatever the scenario, you might reach a point where a single load generator is not realistic enough. If Minio has not reached its limits, it can be that the load generator has. In real life the requests would come from different clients/IPs, while with the current setup you are also limited by simulating all the load from just one generator.
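As a rough illustration, when you do distribute, k6’s execution segments let each load generator run a distinct slice of the same script (assuming a reasonably recent k6 version; how you provision the extra machines is up to you):

# On load generator 1: run the first half of the load
k6 run --execution-segment "0:1/2" --execution-segment-sequence "0,1/2,1" script.js

# On load generator 2: run the second half of the load
k6 run --execution-segment "1/2:1" --execution-segment-sequence "0,1/2,1" script.js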

Hi @eyeveebe ,

Thank you so much for the answer. This gives me a lot to think about on both sides. I will begin investigating which of the applications is causing the performance bottleneck.
