Max number of WebSocket connections

Hi,

I created a script on my Mac to run a k6 WebSocket stress test. When the number of VUs reaches 3000, connection errors start to appear. I want to create 5000 to 6000 WebSocket connections.

K6 version: k6 v0.41.0 ((devel), go1.19.3, darwin/amd64)

The errors look like one of the following:

WARN[0027] Attempt to establish a WebSocket connection failed  error="websocket: bad handshake"

Or

WARN[0050] Attempt to establish a WebSocket connection failed  error="context canceled"

Or

WARN[0020] Attempt to establish a WebSocket connection failed  error="dial tcp 13.195.206.8:443: connect: connection refused"

The k6 executor is “Ramping VUs”, configured with the stages option.

export const options = {
  stages: [
    { duration: '1m', target: 200 }, // below normal load
    { duration: '2m', target: 500 },
    { duration: '5m', target: 1000 }, // normal load
    { duration: '5m', target: 1500 },
    { duration: '5m', target: 2000 }, 
    { duration: '5m', target: 2500 }, // around the breaking point
    { duration: '5m', target: 3000 }, // beyond the breaking point
    { duration: '1m', target: 0 }, // scale down. Recovery stage.
  ],
};

I checked the service logs and found no related errors or exceptions. I suspect some OS limit is preventing k6 from creating more WebSocket connections.

How can I investigate what the bottleneck is?

Thanks,
Ian

Hi @ianjiang

Welcome to the community forum :wave:

If you can share the (sanitized) script, I can have a look and try to run a similar one locally.

In the meantime, since you have not seen any errors on the server side, what I’m thinking you could look at, if you haven’t yet, is:

  • Check for open file limits: On macOS, the maximum number of open file descriptors is set by the system’s ulimit. Use the command ulimit -n to check the current limit and ulimit -n [number] to increase it if necessary.
  • Monitor system resources: is there any bottleneck on your Mac in terms of CPU, memory, or network usage during the test? This will give you an idea of whether the system is reaching its resource limits.
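As a rough sanity check on the file descriptor point, here is a minimal sketch of the arithmetic. It assumes one descriptor per open WebSocket (each connection is one TCP socket) plus a reserve for everything else the process keeps open. The function name fdBudget and the reserve of 100 are hypothetical, and 256 and 10240 are just example limits.

```javascript
// Rough file-descriptor budget for a load generator.
// Assumption: one descriptor per open WebSocket (one TCP socket each),
// plus a hypothetical reserve for other files the process keeps open.
function fdBudget(targetConnections, fdLimit, reservedFds = 100) {
  const needed = targetConnections + reservedFds;
  return { needed, fdLimit, enough: needed <= fdLimit };
}

console.log(fdBudget(6000, 256));   // example: a low limit
console.log(fdBudget(6000, 10240)); // example: a raised limit
```

If enough comes out false for the limit reported by ulimit -n, raising the limit before the test is worth trying first.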

I hope this helps.

Cheers!

Hi @eyeveebe

Thanks for your information.

  • I tried the same script against my local WebSocket service. The VUs could easily increase to more than 3000.

  • I doubt that my Mac’s system resources are the bottleneck. The network is the most likely root cause, but I am not sure whether it is saturated.

While the script is running, the network quality stats are as follows:

   networkQuality
==== SUMMARY ====
Uplink capacity: 34.921 Mbps
Downlink capacity: 54.683 Mbps
Responsiveness: Low (17 RPM)
Idle Latency: 202.750 milli-seconds

  • The current ulimit -n is 256. I am not sure whether I should increase the file descriptor limit. I could try changing it to 512 to verify.

Below is my script (the wss URL is hidden for security reasons):

import { WebSocket } from 'k6/experimental/websockets';
import { setInterval } from 'k6/experimental/timers';
import encoding from 'k6/encoding';
import exec from 'k6/execution';
import { Counter } from 'k6/metrics';


const CounterErrors = new Counter('errors');


export const options = {
  stages: [
    { duration: '1m', target: 3000 }, // ramp up to 3000 VUs
    { duration: '5m', target: 3000 }, // hold at 3000 VUs
    { duration: '1m', target: 0 }, // ramp down
  ],
};


export default function () {
  const domain = 'mytest.wss.domain.com' // faked url
  const url = `wss://${domain}/webSocket?locale=en`
  // const domain = 'localhost:8888'
  // const url = `ws://${domain}/webSocket?locale=en&userId=loadTestName-${exec.instance.vusActive}`
 
  // const userName = `loadTestName-${Math.floor(Math.random() * 3000)}`
  const userName = `loadTestName-${exec.instance.vusActive}` 
  const user = encoding.b64encode(JSON.stringify({
    logonName: userName
  }))

  const params = {
    headers: {
      'Tenant': 'e77f3484',
      'User': user
    }
  };

  // console.log('user:', userName, user)


  try {
    const ws = new WebSocket(url, [], params);

    ws.onopen = () => {
      //ws.send('lorem ipsum');
      console.log(`${userName} WebSocket connection established!`);

      setInterval(function timeout() {
          ws.ping();  
          ws.send(`lorem ipsum ${userName}`); 
          // console.log('send message and ping every 10 sec');
        }, 1000 * 10);
    };

    ws.onpong = () => {
      // As required by the spec, when the ping is received, the recipient must send back a pong.
      // console.log('connection is alive');
    };

    ws.onerror = (e) => {
      CounterErrors.add(1);
      console.log(`${userName} websocket error:`, e);
    };

    ws.onmessage = (data) => {
      console.log('a message received');
      console.log(data);
    };

    ws.onclose = () => {
      console.log(`${userName} WebSocket connection closed!`);
    };

  } catch (error) {
    CounterErrors.add(1)
    console.error('[error_bad_handshake]:', error) // `log` is not defined in k6; use console
    throw error
  }

}

Thanks,

Hi @eyeveebe

Thanks for your advice.

  • With the same script, I tested my local WebSocket service. As a result, the VUs could easily be ramped to more than 3000.
  • CPU and memory usage look fine. I am not sure whether the network is saturated.

   networkQuality
==== SUMMARY ====
Uplink capacity: 44.798 Mbps
Downlink capacity: 357.742 Mbps
Responsiveness: Low (78 RPM)
Idle Latency: 156.833 milli-seconds
  • ulimit -n: changed to 10240

  • My script:

import { WebSocket } from 'k6/experimental/websockets';
import { setInterval } from 'k6/experimental/timers';
import encoding from 'k6/encoding';
import exec from 'k6/execution';
import { Counter } from 'k6/metrics';


const CounterErrors = new Counter('errors');


export const options = {
  stages: [
    { duration: '1m', target: 3000 }, // ramp up to 3000 VUs
    { duration: '5m', target: 3000 }, // hold at 3000 VUs
    { duration: '1m', target: 0 }, // ramp down
  ],
};


export default function () {
  //const domain = 'mytest.wss.domain.com' // faked url
  const url = `wss://test-api.k6.io/ws/crocochat/publicRoom/`
  // const domain = 'localhost:8888'
  // const url = `ws://${domain}/webSocket?locale=en&userId=loadTestName-${exec.instance.vusActive}`
 
  // const userName = `loadTestName-${Math.floor(Math.random() * 3000)}`
  const userName = `loadTestName-${exec.instance.vusActive}` 
  const user = encoding.b64encode(JSON.stringify({
    logonName: userName
  }))

  const params = {
    headers: {
      'Tenant': 'e77f3484',
      'User': user
    }
  };

  // console.log('user:', userName, user)


  try {
    const ws = new WebSocket(url, [], params);

    ws.onopen = () => {
      //ws.send('lorem ipsum');
      console.log(`${userName} WebSocket connection established!`);

      setInterval(function timeout() {
          ws.ping();  
          ws.send(`lorem ipsum ${userName}`); 
          // console.log('send message and ping every 10 sec');
        }, 1000 * 10);
    };

    ws.onpong = () => {
      // As required by the spec, when the ping is received, the recipient must send back a pong.
      // console.log('connection is alive');
    };

    ws.onerror = (e) => {
      CounterErrors.add(1);
      console.log(`${userName} websocket error:`, e);
    };

    ws.onmessage = (data) => {
      console.log('a message received');
      console.log(data);
    };

    ws.onclose = () => {
      console.log(`${userName} WebSocket connection closed!`);
    };

  } catch (error) {
    CounterErrors.add(1)
    console.error('[error_bad_handshake]:', error) // `log` is not defined in k6; use console
    throw error
  }

}

Thanks,

I ran another round of testing in the office, with better network bandwidth. The network is not the bottleneck.

[Screenshots: CPU, memory, and network usage during the test]

When the VUs increase to about 3000, errors start to occur, and network traffic is not high.

So with ulimit -n set to 10240, system resources (CPU, memory, network) are not the limitation. I have no idea how to move forward.

Thanks,

Hi @ianjiang

I’ve been discussing this with @imiric and he agrees you could investigate on two fronts here:

  1. Check if you are reaching a limit in your SUT (System Under Test).
    • Even if you don’t see any errors or logs, there might be a load balancer in between, rate limits on connections from the same IP, etc.
    • Could you try to use 2 load generators at the same time, each with a load that works independently (2500 VUs)? That would help determine whether the issue is in the SUT or in the load generators.
  2. There might still be a limit on your Mac, the load generator.
    • We have specific instructions for fine-tuning at the macOS level: Fine tuning OS. File descriptors (FDs) work a bit differently there, so please try following the guide and see if it helps.
    • If you have another OS available, such as any Linux, it would be great if you could test with that as the load generator, to rule out any Mac-related issues with the load generation.
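To compare your test against a possible per-IP rate limit or load balancer limit, it can help to know how fast new connections are opened during the ramp. Below is a minimal sketch, assuming each new VU opens exactly one WebSocket and keeps it open, and that VUs ramp linearly within a stage. The function name rampRates is hypothetical, and stage durations are given in seconds rather than as k6 duration strings.

```javascript
// Estimate how many new VUs (and so new connections) start per second in each stage.
// Assumption: each VU opens exactly one WebSocket and keeps it open.
function rampRates(stages, startVUs = 0) {
  let previous = startVUs;
  return stages.map(({ seconds, target }) => {
    // Only ramp-ups open new connections; ramp-downs count as 0.
    const perSecond = Math.max(0, target - previous) / seconds;
    previous = target;
    return { target, perSecond };
  });
}

// Example: ramp to 3000 VUs in 1 minute, hold for 5 minutes, ramp down.
const exampleStages = [
  { seconds: 60, target: 3000 },
  { seconds: 300, target: 3000 },
  { seconds: 60, target: 0 },
];
console.log(rampRates(exampleStages));
```

For this example the first stage works out to 50 new connections per second from a single IP, which is the figure to compare against any limits on the SUT side.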

Finally, unrelated to the issue at hand but a best practice, we would add to the script a check for the status code 101, as you can see in our basic WebSockets example or our WebSockets example.

check(res, { 'Connected successfully': (r) => r && r.status === 101 });

I hope this helps.

Cheers!

Hi @eyeveebe,

Thanks for your advice. I will look into the SUT and try tuning my Mac.

For the k6 experimental WebSockets API, how can I check the WebSocket status? I can’t find any API that returns the WebSocket connection response.

Hi @ianjiang

My apologies, I missed that you were using the experimental WebSockets extension. With the new API you will get an error, so there is no need to check for the 101 Switching Protocols status. You might want to use:

const ws = new WebSocket(url, null, params);
// Fires if the connection could not be established.
ws.onerror = (e) => console.log(e);
ws.onopen = () => {
  // Once connected, replace the handler to deal with later failures.
  ws.onerror = (e) => { /* something else */ };
};

This way, if the first ws.onerror handler fires, you’ll know the connection was never established; if the second one fires, something else failed after the connection was opened.
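The two-handler idea above can be wrapped in a small helper so each error is tagged with the phase it happened in. This is only a sketch: trackErrorPhase and report are hypothetical names, and the fake object stands in for a real socket so the snippet runs outside k6. It assumes only that the socket exposes onopen and onerror properties, as used in the script earlier in this thread.

```javascript
// Hypothetical helper: tags errors by phase on any object that exposes
// onopen / onerror handler properties.
function trackErrorPhase(ws, report) {
  // Until the connection opens, any error means it was never established.
  ws.onerror = (e) => report('handshake', e);
  ws.onopen = () => {
    // Once connected, swap in a handler for failures after opening.
    ws.onerror = (e) => report('after-open', e);
  };
}

// Stand-in object so the sketch runs outside k6.
const fake = {};
const seen = [];
trackErrorPhase(fake, (phase, e) => seen.push(`${phase}: ${e}`));
fake.onerror('bad handshake');    // simulated failure before open
fake.onopen();                    // simulated successful connection
fake.onerror('connection reset'); // simulated failure after open
console.log(seen);
```

In a k6 script you would pass the real socket instead of fake, and report could increment separate Counter metrics per phase. If you already set your own ws.onopen, do the handler swap inside it rather than letting the helper overwrite it.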

I would also point out that you don’t necessarily need 6000 VUs to open 6000 Websockets. The example in the v0.40.0 release notes shows how to achieve that:

import { randomString, randomIntBetween } from "https://jslib.k6.io/k6-utils/1.1.0/index.js";
import { WebSocket } from "k6/experimental/websockets"
import { setTimeout, clearTimeout, setInterval, clearInterval } from "k6/experimental/timers"

let chatRoomName = 'publicRoom'; // choose your chat room name
let sessionDuration = randomIntBetween(5000, 60000); // user session between 5s and 1m


export default function() {
  for (let i = 0; i < 4; i++) {
    startWSWorker(i)
  }
}

function startWSWorker(id) {
  let url = `wss://test-api.k6.io/ws/crocochat/${chatRoomName}/`;
  let ws = new WebSocket(url);
  ws.addEventListener("open", () => {
    ws.send(JSON.stringify({ 'event': 'SET_NAME', 'new_name': `Croc ${__VU}:${id}` }));

    ws.addEventListener("message", (e) => {
      let msg = JSON.parse(e.data);
      if (msg.event === 'CHAT_MSG') {
        console.log(`VU ${__VU}:${id} received: ${msg.user} says: ${msg.message}`)
      }
      else if (msg.event === 'ERROR') {
        console.error(`VU ${__VU}:${id} received:: ${msg.message}`)
      }
      else {
        console.log(`VU ${__VU}:${id} received unhandled message: ${msg.message}`)
      }
    })


    let intervalId = setInterval(() => {
      ws.send(JSON.stringify({ 'event': 'SAY', 'message': `I'm saying ${randomString(5)}` }));
    }, randomIntBetween(2000, 8000)); // say something every 2-8seconds


    let timeout1id = setTimeout(function() {
      clearInterval(intervalId)
      console.log(`VU ${__VU}:${id}: ${sessionDuration}ms passed, leaving the chat`);
      ws.send(JSON.stringify({ 'event': 'LEAVE' }));
    }, sessionDuration);

    let timeout2id = setTimeout(function() {
      console.log(`Closing the socket forcefully 3s after graceful LEAVE`);
      ws.close();
    }, sessionDuration + 3000);

    ws.addEventListener("close", () => {
      clearTimeout(timeout1id);
      clearTimeout(timeout2id);
      console.log(`VU ${__VU}:${id}: disconnected`);
    })
  });
}

This might not resolve your issue if the limitation is in the SUT or your Mac, but it is worth pointing out.
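To size such a test, the number of VUs follows from the target connection count and the number of sockets each VU opens (4 per iteration in the example above). A minimal sketch of that arithmetic, with vusNeeded as a hypothetical helper name:

```javascript
// Number of VUs required when each VU opens several WebSockets.
function vusNeeded(targetConnections, socketsPerVU) {
  // Round up so the target is reached even when it does not divide evenly.
  return Math.ceil(targetConnections / socketsPerVU);
}

console.log(vusNeeded(6000, 4)); // 1500
console.log(vusNeeded(5000, 3)); // 1667
```

Note that the load generator still holds the same number of open connections, so file descriptor and network limits apply as before. Only the number of VUs goes down.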

Also, if you find anything we should add or fix, kindly report it in our issue tracker. Since this is still experimental, we really appreciate any feedback.

Cheers!