Getting Service Unavailable Message

gerardlj · October 20, 2022, 6:17am

Hi Team,

While running a large test (>1k users), we see a “Service Unavailable” message in the console logs, but we don’t see any errors on the web or app layers. The timeouts configured on those layers are also longer than the total duration of our test. It looks like a timeout issue on the k6 side. Can you explain this, or guide me on how to resolve it?

I’m running k6 in Docker on AWS CodeBuild, using a large instance for the test.

docker pull grafana/k6

olegbespalov · October 20, 2022, 11:04am

Hi @Gerard !

It looks like a timeout issue on the k6 side. Can you explain this, or guide me on how to resolve it?

In most cases, 1k users is not a big deal for k6. However, it can be a big deal for the services under test. So honestly, it looks more like an issue with the service (your backends).

I’d recommend you investigate more, check hardware metrics (CPU, memory), and maybe check load balancer metrics. Basically, try to get as much as you can from your observability.

Hope that answers your question,
Cheers

gerardlj · October 21, 2022, 4:55pm

@olegbespalov

We investigated but didn’t find any evidence of issues in the backend services (LB, NLB, web, and app). The error codes and message are a little strange, though. Please find them below.

The HTTP status code is 0, the k6 error code is 1000, and the error message is “Service unavailable”. The response body and headers are also null. It looks like the request didn’t hit any of the backend services.

Please guide me on how to troubleshoot this.

olegbespalov · October 24, 2022, 7:59am

Hey @Gerard ,

The error is pretty explicit: “Service unavailable” means that k6 cannot reach your service because it’s unavailable…

The HTTP code 0, in that case, means that we simply didn’t get any code. It’s simply not defined.
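Alongside the status, k6 sets a numeric `error_code` on the response, and grouping it by range can narrow down where a request failed. The following is a minimal sketch, assuming the ranges listed in the k6 error codes documentation (as far as I recall, 1000 is the generic code used when nothing more specific applies). `errorCategory` is a hypothetical helper name, not part of k6, so please verify the ranges against the docs for your k6 version.

```javascript
// Coarse grouping of k6 `error_code` values (assumed ranges, taken from the
// k6 "Error codes" documentation; verify them for your k6 version).
// `errorCategory` is a hypothetical helper, not a k6 API.
function errorCategory(code) {
  if (!code) return 'no error';
  if (code === 1000) return 'generic error (no more specific code applied)';
  if (code === 1050) return 'HTTP request timeout';
  if (code >= 1100 && code < 1200) return 'DNS or blocked host';
  if (code >= 1200 && code < 1300) return 'TCP';
  if (code >= 1300 && code < 1400) return 'TLS';
  if (code >= 1400 && code < 1500) return 'HTTP 4xx response';
  if (code >= 1500 && code < 1600) return 'HTTP 5xx response';
  if (code >= 1600 && code < 1700) return 'HTTP/2';
  return 'other';
}

// Inside a k6 script this could be called as:
//   console.warn(errorCategory(res.error_code));
```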

Unfortunately, I can’t give you any advice other than what I’ve already provided. Just continue investigating on your side using the metrics that you have in your system. Try to scale the system up & down to see what load is causing your service unavailability.

Check the CPU & Memory, and try to scale up vertically if needed.

As I said, 1k users isn’t a significant number for k6, so you have found a bottleneck in your system.
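One way to see which load level triggers the unavailability is to raise the number of virtual users in steps and hold each step long enough to compare it with the backend metrics. A minimal sketch, assuming the standard `stages` option of k6; `buildStages` and the numbers in the usage comment are hypothetical and only illustrate the idea.

```javascript
// Build a stepped load profile for the k6 `stages` option.
// `buildStages` is a hypothetical helper, not part of k6.
// Each step ramps to the next target, then holds it for `holdSeconds`.
function buildStages(maxVUs, steps, holdSeconds, rampSeconds) {
  const stages = [];
  for (let i = 1; i <= steps; i++) {
    const target = Math.round((maxVUs * i) / steps);
    stages.push({ duration: `${rampSeconds}s`, target }); // ramp up to this step
    stages.push({ duration: `${holdSeconds}s`, target }); // hold the step
  }
  stages.push({ duration: `${rampSeconds}s`, target: 0 }); // ramp down at the end
  return stages;
}

// In a k6 script (values are illustrative assumptions):
//   export const options = { stages: buildStages(1000, 5, 120, 30) };
```

If the “Service unavailable” errors only start at, say, the fourth step, the timestamps of that step tell you which window to inspect in the load balancer and host metrics.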

gerardlj · November 16, 2022, 8:10am

Are there any metrics I can check for this issue other than “http_req_failed”?

olegbespalov · November 16, 2022, 9:22am

Hi @Gerard

Are there any metrics I can check for this issue other than “http_req_failed”?

Sorry, I’m not sure if I follow. What exactly do you want to achieve by checking another metric?

gerardlj · November 16, 2022, 10:01am

I want to get more information about the failed request. I checked the response field “response.remote_ip”, but it didn’t help a lot.

Simply put, I want to trace the failed request and find the cause. We didn’t see a single error or warning on the LB, web, and app layers.
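For tracing, one option is to log everything k6 records about a request whenever no HTTP response came back (status 0). This is a minimal sketch, assuming the documented k6 `Response` fields (`status`, `error`, `error_code`, `remote_ip`, `remote_port`, `url`, `timings`); `isTransportFailure` and `describeFailure` are hypothetical helper names, and the URL in the usage comment is a placeholder.

```javascript
// A status of 0 means k6 received no HTTP response for the request.
// Both helpers are hypothetical, not part of k6.
function isTransportFailure(res) {
  return res.status === 0;
}

// Collect the fields k6 exposes on the response into a single JSON log line.
function describeFailure(res) {
  const t = res.timings || {};
  return JSON.stringify({
    url: res.url,
    status: res.status,
    error_code: res.error_code,
    error: res.error,
    remote_ip: res.remote_ip,
    remote_port: res.remote_port,
    duration_ms: t.duration,
    connecting_ms: t.connecting,
    tls_handshaking_ms: t.tls_handshaking,
    waiting_ms: t.waiting,
  });
}

// In a k6 script (placeholder URL):
//   import http from 'k6/http';
//   export default function () {
//     const res = http.get('https://example.test/');
//     if (isTransportFailure(res)) console.warn(describeFailure(res));
//   }
```

An empty `remote_ip` together with near-zero connection timings would suggest that the request failed before a connection was established, which narrows the search to the path between the k6 host and the first hop.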