Getting Service Unavailable Message

Hi Team,

While running a large test (>1k users), we see a “Service Unavailable” message in the console logs, but there are no errors on the web or app layers, and the timeouts configured on those layers are longer than the total duration of our test. It looks like a timeout issue on the k6 side. Can you explain this, or guide me on how to resolve it?

I’m running k6 in Docker on AWS CodeBuild, using a large instance for the test.

docker pull grafana/k6

Hi @Gerard !

It looks like a timeout issue on the k6 side. Can you explain this, or guide me on how to resolve it?

In most cases, 1k users is not a big deal for k6. However, it can be a big deal for the services under test. So honestly, it looks more like an issue with the service (your backends) :frowning_face:

I’d recommend you investigate further: check hardware metrics (CPU, memory), and maybe check the load balancer metrics. Basically, try to get as much as you can from your observability.

Hope that answers your question,
Cheers

@olegbespalov

We investigated but didn’t find any evidence of issues in the backend services (LB, NLB, web, and app). The error codes and message are a little strange, though. Please find them below.

The HTTP status code is 0, the k6 error code is 1000, and the error message is “Service unavailable”. The response body and headers are also null. It looks like the request didn’t hit any of the backend services.

Please guide me on how to troubleshoot this.

Hey @Gerard ,

The error is pretty explicit: “Service unavailable” means that k6 couldn’t reach your service because it’s unavailable.

HTTP status code 0 in this case means that k6 simply didn’t receive a status code. It’s not defined.
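To make that distinction concrete, here is a minimal sketch of how a script could separate “no response at all” from a real HTTP error. It only reads the `status`, `error`, and `error_code` fields of the k6 response object. The helper name `classifyResponse` and the URL in the usage comment are made up for illustration.

```javascript
// Hypothetical helper (not part of k6): classify a k6 HTTP response object.
// Reads only the response fields `status`, `error`, and `error_code`.
function classifyResponse(res) {
  if (res.status === 0) {
    // No HTTP status was received, so k6 never got a usable response.
    return `no-response (error_code=${res.error_code}, error="${res.error}")`;
  }
  if (res.status >= 400) {
    // A real HTTP response arrived, but it signals a client or server error.
    return `http-error (status=${res.status})`;
  }
  return 'ok';
}

// Usage inside a k6 script (only runs under k6):
//   import http from 'k6/http';
//   export default function () {
//     const res = http.get('https://example.com/'); // placeholder URL
//     const kind = classifyResponse(res);
//     if (kind !== 'ok') console.warn(kind);
//   }
```

With the values reported in this thread (status 0, error code 1000), the helper returns the `no-response` branch, which matches the observation that the body and headers are null.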

Unfortunately, I can’t give you any advice beyond what I’ve already provided. Just continue investigating on your side using the metrics that you have in your system. Try to scale the system up & down to see what load is causing your service unavailability.

Check the CPU & Memory, and try to scale up vertically if needed.
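One way to find the load level at which the errors start is a stepped ramp instead of going straight to the full user count. The sketch below builds a list for the k6 `stages` option; the helper name `buildStages` and the numbers in the usage comment are only an illustration, so adjust them to your test.

```javascript
// Hypothetical helper (not part of k6): build a stepped load profile
// for the k6 `stages` option. Each step ramps to a higher VU target
// and then holds it, so errors can be tied to a specific load level.
function buildStages(maxVUs, steps, rampDuration, holdDuration) {
  const stages = [];
  for (let i = 1; i <= steps; i++) {
    const target = Math.round((maxVUs * i) / steps);
    stages.push({ duration: rampDuration, target }); // ramp up to this step
    stages.push({ duration: holdDuration, target }); // hold at this step
  }
  stages.push({ duration: rampDuration, target: 0 }); // ramp back down
  return stages;
}

// Usage inside a k6 script (only runs under k6):
//   export const options = {
//     stages: buildStages(1000, 4, '30s', '2m'), // 250, 500, 750, 1000 VUs
//   };
```

Comparing the time at which “Service unavailable” first appears against the active step gives a rough idea of the load your system handles before it degrades.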

As I said, 1k users isn’t a significant number for k6, so it looks like you’ve found a bottleneck in your system.

Are there any metrics I can check for this issue other than “http_req_failed”?

Hi @Gerard

Are there any metrics I can check for this issue other than “http_req_failed”?

Sorry, I’m not sure if I follow :thinking: What exactly do you want to achieve by checking another metric?

I want to get more information about the failed requests. I checked the response field “response.remote_ip”, but it didn’t help a lot.

I simply want to trace the failed requests and find the cause. We didn’t see a single error or warning on the LB, web, and app layers.
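For tracing individual failures, a possible approach is to log the per-request fields that k6 exposes on the response object, rather than relying on the aggregate “http_req_failed” metric alone. The sketch below is only an illustration: `failureDetails` and the metric name `failed_requests_by_code` are hypothetical, while the response fields and the `Counter` class from `k6/metrics` come from k6 itself.

```javascript
// Hypothetical helper (not part of k6): collect the fields of a k6 HTTP
// response that can help trace a failed request.
function failureDetails(res) {
  const t = res.timings || {};
  return {
    url: res.url,
    status: res.status,
    error: res.error,
    error_code: res.error_code,
    remote_ip: res.remote_ip,
    remote_port: res.remote_port,
    // Per-phase timings in milliseconds, as reported by k6.
    connecting_ms: t.connecting,
    tls_handshaking_ms: t.tls_handshaking,
    waiting_ms: t.waiting,
    duration_ms: t.duration,
  };
}

// Usage inside a k6 script (only runs under k6):
//   import http from 'k6/http';
//   import { Counter } from 'k6/metrics';
//   const failures = new Counter('failed_requests_by_code'); // hypothetical name
//   export default function () {
//     const res = http.get('https://example.com/'); // placeholder URL
//     if (res.status === 0 || res.status >= 400) {
//       failures.add(1, { error_code: String(res.error_code) });
//       console.error(JSON.stringify(failureDetails(res)));
//     }
//   }
```

The timestamps of the logged lines can then be matched against the LB, web, and app logs to check whether the failing requests ever arrived there.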