gRPC: Reuse connection?

The new gRPC support is great. Is there a way to reuse the client connection? It seems that connecting in setup() and closing in teardown() doesn’t work.

I’m trying to figure out some strange benchmarking results: I have a service exposed via gRPC and grpc-gateway, and I’m using k6 to benchmark both. The grpc-gateway results are slightly faster, which doesn’t really make sense: grpc-gateway calls the same gRPC endpoint as the native gRPC benchmark, and it’s all on localhost, so grpc-gateway should be at least as slow. I’m thinking the reason is that Go’s http library is reusing the connection for the HTTP benchmark, while the native gRPC benchmark is repeating the TCP connection setup on each iteration.

Hi @ansel1, welcome to the community forum.

I think you mistyped that … k6 is slower than something else, and you think it is because k6 isn’t reusing the gRPC connection? This is somewhat … not true on a different level, but you can reuse the connection if you just don’t call close() at all and connect only on the first iteration … or reconnect when/if you get disconnected, which will likely be trickier ;).

So, for example, if you take this sample code from the repository and change it to be:

import grpc from "k6/net/grpc";
import { check } from "k6";

const client = new grpc.Client();
// load the RouteGuide proto definition from the k6 samples (adjust the path to your layout)
client.load([], "grpc_server/route_guide.proto");

export default () => {
    if (__ITER == 0) { // connect only on the first iteration
        client.connect("127.0.0.1:10000", { plaintext: true })
    }

    const response = client.invoke("main.RouteGuide/GetFeature", {
        latitude: 410248224,
        longitude: -747127767
    })

    check(response, { "status is OK": (r) => r && r.status === grpc.StatusOK });
    console.log(JSON.stringify(response.message))

    // client.close() // deliberately not called, so the connection is reused across iterations
}

If you then also remove the sleeps in the server implementation that goes with it, it gets around 2-3x faster (it barely does anything, so :man_shrugging: ).

This will likely need a wrapper around the invoke call to check that the connection hasn’t dropped for some reason, but other than that it should work fine :wink:
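
For example, something along these lines (a minimal sketch building on the sample above, where a connected flag replaces the __ITER check; safeInvoke and its reconnect-on-error logic are my own illustration, not a k6 API, and whether invoke throws on a dropped connection may depend on the failure mode):

let connected = false;

function safeInvoke(method, request) {
    if (!connected) {
        client.connect("127.0.0.1:10000", { plaintext: true });
        connected = true;
    }
    try {
        return client.invoke(method, request);
    } catch (e) {
        // assume the connection dropped: reconnect once and retry
        client.connect("127.0.0.1:10000", { plaintext: true });
        return client.invoke(method, request);
    }
}

The default function would then call safeInvoke("main.RouteGuide/GetFeature", …) instead of client.invoke().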

Hope this helps; we will likely have a better way to do this in the future.

Awesome, thanks, I will give that a try. But I’m curious: what did you mean by “somewhat not true”?

To clarify my test setup: I have a single Go executable. It exposes a gRPC server on one port and a web server on another port. The handler for the web server is grpc-gateway, a tool that makes gRPC services accessible via a REST-ish interface: it translates REST-ish requests (JSON over HTTP) into gRPC calls and directs those calls to a gRPC client connected to the gRPC server port. So even though grpc-gateway runs in the same process as the gRPC server, it still calls the gRPC server over the network, not in-process.

So my logic is: if an external client calls the gRPC server directly, that should be a bit faster than an external client sending a REST request to the grpc-gateway, since grpc-gateway is pure overhead.

I have two k6 scripts: one calls gRPC directly, one calls the grpc-gateway. The grpc-gateway script was running faster (though only by a hair). That wasn’t the result I expected.

I assume that by “somewhat … not true on a different level” @mstoykov meant that the network connection for gRPC’s underlying HTTP/2-based transport might not be fully closed. I am not sure if that is the case or not, but I wouldn’t be surprised.

Did you try to not close the gRPC connection every iteration? That should put the gRPC code on at least an equal level with the HTTP requests k6 will make, since k6 will use keep-alive connections by default (though that can be disabled with the noConnectionReuse option).
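
For reference, the opposite experiment is also possible: forcing the HTTP script to pay the connection setup cost on every request. A minimal sketch, where the URL is a placeholder for your grpc-gateway endpoint:

import http from "k6/http";

export const options = {
    // force a new connection for every request, mirroring a gRPC script
    // that connects and closes on each iteration
    noConnectionReuse: true,
};

export default () => {
    http.get("http://127.0.0.1:8080/v1/feature"); // placeholder URL
};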

The only other cause of the discrepancy you describe that I can think of is the k6 gRPC marshaling and unmarshaling of messages. It’s unlikely to affect the results that much, but the dynamic nature of the k6 gRPC implementation is going to be much less efficient than dedicated marshaling code generated by protoc.

I’ve been experimenting, and this is what I found after going round in circles a few times.

The above example works: create the connection once, on the first iteration only. It is super fast, as stated.

My confusion has been around the “once per VU” init code. I initialise an array in the init code, and I only write a connection to this array if one has not been written before. Based on the “once per VU” rule there should be 3000 arrays with one record each. This does not seem to be the case: I output the VU IDs as they are written to my array, and in the first stage a VU with an ID of 102 asked for a connection. There were only supposed to be 100 VUs in that stage, but without even looking for missing IDs I can see straight away that there are no duplicates, and therefore no duplicate arrays.

Keep up, as it’s a brain teaser. I had the same logic for error handling, where I counted up to 5 errors and then bombed out. This appeared to be triggered once per user: I was seeing the counter at 5, then 5 repeated many times, before it went to 4 and then 3. This means this plain JS counter was being established many times.

Anyway, to cut a long story short: a single variable holding a connection throws an error saying too many connections are being made. If it were only one per VU, as init should be, this would not be an issue. The array does work and is lightning fast, and I can only surmise this is because each VU has its own connection, so it is not waiting on a single shared connection to finish an invoke, and it is not creating a new connection on every iteration.

If anyone can explain the lifecycle issue to me, you will have made a friend. If this happens to be a bug, known or otherwise, I would argue this is a valid use case affected by it. No race condition will happen on write, because you are only writing once to an empty array object. SharedArray is not an option here, as it is (rightly) immutable.
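
A minimal script like this shows the behaviour I mean (note that __VU may read as 0 during k6’s initial parsing pass):

// init code: k6 runs this once for every VU it starts, so each VU
// gets its own copy of any variable declared here
console.log("init code running for VU " + __VU);

export default () => {
    // the default function then runs once per iteration, per VU
};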

EDIT: I think I know where I went wrong yesterday: I was trying to create the connection in the default function. So I moved it up to the init code and I get the error below, even though I was null-checking in the default function and thought that would avoid the need for this:

ERRO[0000] {"value":"connecting to a gRPC server in the init context is not supported"} source=console

I cannot see any way around it. I ramp up to 3000 users using my array mentioned above, so I’m confident this is the only solution available. When I output the VU IDs I get unique numbers. It does look like 2 got through from the first stage of 100, as I was seeing IDs 101 and 102.

FYI, I know about threshold abort; it’s the lifecycle that is of most interest to me.

Hi @Jamie,

I am not certain I understood your question, or whether you fixed it or not.

But writing in 2-year-old threads is not a good idea, which is probably why nobody has responded to you in 12 days.

Can you please open a new thread and try to write a small script that exemplifies your confusion and question?

Thanks!

Hey

It’s OK, I figured it out. Instead of checking for the first iteration, you can use an init-context variable to hold the connection state. Remove the close; this then gives you a connection per VU. In my trials it has made the spike test very efficient: a lot fewer connection errors and maximum efficiency, presumably because it is not waiting on one connection to finish what it is doing.
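
Roughly like this, a sketch of what I mean, reusing the RouteGuide example from earlier in the thread (the proto path is illustrative):

import grpc from "k6/net/grpc";
import { check } from "k6";

const client = new grpc.Client();
client.load([], "route_guide.proto"); // illustrative path

// init var: declared once per VU; connecting here is not supported,
// so the connection itself is made lazily in the default function
let connected = false;

export default () => {
    if (!connected) {
        client.connect("127.0.0.1:10000", { plaintext: true });
        connected = true;
    }

    const response = client.invoke("main.RouteGuide/GetFeature", {
        latitude: 410248224,
        longitude: -747127767
    });
    check(response, { "status is OK": (r) => r && r.status === grpc.StatusOK });

    // no client.close(): each VU keeps its connection for the whole test
};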

Thanks

Jamie