k6 to InfluxDB with missing http_req_duration data

We noticed that when using k6 with the InfluxDB output, the data displayed in Grafana is not accurate at all.
We have tested a number of times, and every test shows that the http_req_duration count in InfluxDB is between 1-5% lower than expected.

Further investigation suggests that k6 submits less data to InfluxDB.
We are using the latest version of k6 and InfluxDB version 1.8.
May we check whether this is a known issue/defect, or whether there is something we are doing incorrectly?

Thanks for the help.

We ran 10 VUs and 100 iterations, but only 98 http_req_duration records can be found in InfluxDB.

If two requests have the same duration, InfluxDB will only keep one record.


You might be hitting this issue: InfluxDB frequently asked questions | InfluxDB OSS 1.8 Documentation

Though, for that to happen, the http_req_duration measurements would have to have the exact same timestamp :confused: Just in case, to double-check, can you add vu and iter to the list of systemTags k6 uses, and remove them from the InfluxDB “tags as fields” option? This will usually cause issues for InfluxDB, since it makes the metrics have a much higher cardinality, which is a problem for it, but it should prevent http_req_duration metrics from different VUs from having the exact same tag set.

So, something like this:

export K6_SYSTEM_TAGS="proto,subproto,status,method,url,name,group,check,error,error_code,tls_version,scenario,service,expected_response,vu,iter"
export K6_INFLUXDB_TAGS_AS_FIELDS="url"

k6 run your-script.js

@ned
Thank you so much for your prompt reply. Appreciate your help.
Before your reply, after spending half a day, I managed to get it working by adding a custom tag to every POST/GET request; the custom tag value is generated using uuidv4().
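For readers following along, a minimal sketch of that workaround (the uuidv4 helper here is a plain-JS stand-in for the one importable from k6's jslib; the tag names are illustrative):

```javascript
// Sketch of the per-request UUID tag workaround. In a real k6 script you
// would import uuidv4 from https://jslib.k6.io/k6-utils/; this plain-JS
// stand-in just shows the idea.
function uuidv4() {
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, (c) => {
    const r = (Math.random() * 16) | 0;
    const v = c === 'x' ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}

// Attaching a fresh UUID tag makes every point's tag set unique, so
// InfluxDB never sees two points with the same timestamp and tags:
const params = { tags: { UUID: uuidv4(), testid: 'login' } };
console.log(params.tags.UUID);
```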

If that solved your problem, then yeah, it was because of the strange handling of duplicate points by InfluxDB… Unfortunately, while high cardinality for every metric might solve this issue, it will cause InfluxDB to fold very quickly under even the lightest of loads; even without it, k6 can usually saturate an InfluxDB server with metrics quite quickly when running moderately heavy load tests… :disappointed:

I found an old k6 issue that describes this problem: Iterations count in InfluxDB is lower than what k6 reports · Issue #636 · k6io/k6 · GitHub

And, looking at the InfluxDB v2 docs, it seems like we might be able to greatly reduce the issue if we send the timestamp to InfluxDB in nanoseconds, since it’s going to be much less likely for two metrics to have the exact same values: Handle duplicate data points when writing to InfluxDB | InfluxDB OSS 2.0 Documentation
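To illustrate that point, here are two line-protocol writes with invented timestamps and values: at nanosecond precision the points differ in the last digit and both survive, but truncated to a coarser precision they would share a timestamp and tag set, and InfluxDB would keep only the last one written.

```text
http_req_duration,method=GET,status=200,url=/login value=120.5 1620000000000000001
http_req_duration,method=GET,status=200,url=/login value=98.2 1620000000000000002
```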

@Lucas, can you confirm that the precision setting of your InfluxDB server is nanoseconds? You haven’t changed the default: Configure InfluxDB OSS | InfluxDB OSS 1.8 Documentation

@ned Yes, I didn’t change anything, as shown in the screen capture below.
I got InfluxDB up quickly without changing any settings except the data directory paths.

This is very strange; I wouldn’t expect us to have so many collisions when time is measured in nanoseconds, which it also is in Go :confused:

You would only really need to do this for the vu tag, and could leave iter defined as a field (or disabled, if you do not need your metrics tagged with the iteration number). In theory, unless you are doing some form of distributed test execution, there should never be more than one set of metrics for a given vu with the same timestamp, so that should be enough to avoid the issue of metrics with the same timestamp being dropped.

Leaving iter as a field should also mitigate the cardinality issue somewhat, since the number of series created would only be multiplied by the number of users you run, and not by both. Either way, every unique tag that is added increases cardinality, so using the “tags as fields” config is important.
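As a rough back-of-the-envelope illustration of that cardinality arithmetic (all numbers here are invented for the example):

```javascript
// Illustrative series-cardinality arithmetic: the number of series is the
// product of the distinct values of every tag. Numbers are made up.
const urls = 20, statuses = 3, vus = 10, iterations = 100;

const baseSeries = urls * statuses;               // tags: url, status
const withVuTag = baseSeries * vus;               // + vu as a tag
const withVuAndIterTags = withVuTag * iterations; // + iter as a tag too

console.log(baseSeries, withVuTag, withVuAndIterTags); // 60 600 60000
```

Keeping iter as a field avoids the final multiplication, which is why it helps.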

A UUID custom tag should definitely be sent as a field, as otherwise you end up creating a new series for every single request, which will very quickly cause problems. I stumbled on that myself initially, and ended up running out of series I could insert to my database, as the server (managed by another group in my org) was configured with a finite limit per database.

Good day @ned @dan_nm
Appreciate your help and advice.
I understand that sending a UUID custom tag will eventually cause issues for InfluxDB, as it creates a new series for every single request.
I tried the commands below, suggested by @ned, but I am still not getting the correct count for http_req_duration.

export K6_SYSTEM_TAGS="proto,subproto,status,method,url,name,group,check,error,error_code,tls_version,scenario,service,expected_response,vu,iter"
export K6_INFLUXDB_TAGS_AS_FIELDS="url"

k6 run your-script.js

Another approach I can think of is to use a custom metric and only send one record of the custom metric per URL over to InfluxDB, and then use the custom metric in Grafana to display the URL and latencies.
However, that also means that I need to keep track of the data and write some math functions to calculate min, median, max, mean, and p95 for each unique URL.
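Those math functions are straightforward to write; here is a minimal sketch using the nearest-rank method for percentiles (one of several common conventions, and not necessarily the one k6 uses internally):

```javascript
// Summarize an array of durations (in ms). Percentiles use the
// nearest-rank method; other tools may interpolate between samples instead.
function summarize(durations) {
  const sorted = [...durations].sort((a, b) => a - b);
  const pct = (p) =>
    sorted[Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1)];
  return {
    min: sorted[0],
    max: sorted[sorted.length - 1],
    mean: sorted.reduce((s, v) => s + v, 0) / sorted.length,
    median: pct(50),
    p95: pct(95),
  };
}

console.log(summarize([100, 200, 300, 400, 500]));
```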

As of now, it is confirmed that if I use a custom UUID tag generated for every URL request, the count will be accurate.

Interesting, passing vu as an actual tag (by removing it from the K6_INFLUXDB_TAGS_AS_FIELDS env var) should work. At least it always has for me.

Two things that come to mind:

  1. What terminal are you using to execute your scripts? The syntax for setting/exporting environment variables can vary, so perhaps it isn’t actually setting them correctly and still passing vu as a field (or not passing it at all, as I am pretty sure vu is not a default system tag).
  2. Have you tried against a fresh InfluxDB database? It’s possible the data series which have been created on an existing database could be interfering.

If you output your metrics to JSON as well as InfluxDB, you should be able to pretty easily identify which metrics are missing from InfluxDB, and see whether or not they have the same timestamps and sets of tags as other metrics being emitted. That was how I identified it when I faced this issue originally in my scripts. This would also help identify if the vu tag is actually getting set or not, in relation to my first question above.
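A sketch of that comparison, using made-up sample lines in the shape of k6's JSON output: group points by timestamp plus tag set, and anything that appears more than once is a point InfluxDB would silently merge.

```javascript
// Given k6 JSON-output lines (NDJSON), find http_req_duration points that
// share the same timestamp and tag set. The sample lines are invented.
const lines = [
  '{"type":"Point","metric":"http_req_duration","data":{"time":"2021-05-01T10:00:00.000000001Z","value":120,"tags":{"url":"/login","status":"200"}}}',
  '{"type":"Point","metric":"http_req_duration","data":{"time":"2021-05-01T10:00:00.000000001Z","value":98,"tags":{"url":"/login","status":"200"}}}',
];

const seen = new Map();
const duplicates = [];
for (const line of lines) {
  const point = JSON.parse(line);
  if (point.metric !== 'http_req_duration') continue;
  const key = point.data.time + JSON.stringify(point.data.tags);
  if (seen.has(key)) duplicates.push(point);
  seen.set(key, point);
}
console.log(`${duplicates.length} colliding point(s)`);
```

In a real run you would read the lines from the file produced by `--out json=...` instead of a hard-coded array.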

Of note, system tags can also be set within the script options:

export let options = {
  systemTags: ['proto', 'subproto', 'status', 'method', 'url', 'name', 'group', 'check', 'error', 'error_code', 'tls_version', 'scenario', 'service', 'expected_response', 'vu', 'iter']
};

Unfortunately, I think the JSON config option for tags as fields is still broken, as I reported in this thread last year. So, that part needs to be done as an environment variable.

@dan_nm, I think we may have recently fixed this bug with Rewrite influxdb collector as output by MStoykov · Pull Request #1953 · k6io/k6 · GitHub, but I can’t test right now… Can you double-check if the bug still persists with the latest k6 master docker image or a custom-built k6 binary from the master git branch?


Hi @dan_nm,

I’m using Windows to execute my script.

I have just done a retest using a fresh InfluxDB.
In my script, I have set the system tags as follows:

export let options = {
    vus: 10,
    iterations: 100,
    teardownTimeout: '60s', //Specify how long the teardown() function is allowed to run before it's terminated and the test fails.
    systemTags: ['proto', 'subproto', 'status', 'method', 'url', 'name', 'group', 'check', 'error', 'error_code', 'tls_version', 'scenario', 'service', 'expected_response', 'rpc_type', 'vu', 'iter']
};

Thereafter, I run my script using the command below:

k6 run --vus 10 --iterations 100 -e K6_INFLUXDB_TAGS_AS_FIELDS=url XXX.js --out influxdb=http://localhost:8086/XXX

I noticed that if I run the script once, the count for the URL is correct: 100.

However, if I run the script again for another 100 iterations after completing the first run, the count is no longer correct, as shown in the screen capture below.

I also noticed that if I remove the UUID: loginUuid tag shown below, the count is confirmed to be incorrect after running the script twice, one run after another.

var loginUuid = uuidv4();

let thirdPartyLoginPayload = JSON.stringify({
    DeviceType: 1,
    ThirdPartyId: THIRD_PARTY_CODE,
    MemberId: randomUser.MemberId
});

let thirdPartyLoginParams = {
    headers: {
        'Content-Type': 'application/json',
        'Token': THIRD_PARTY_LOGIN_ACCESS_TOKEN
    },
    tags: { UUID: loginUuid, testid: 'login' }
};

thirdPartyLoginResponse = http.post(THIRD_PARTY_LOGIN_URL,
    thirdPartyLoginPayload,
    thirdPartyLoginParams);

Hi @dan_nm,

This might actually be a Windows + Go problem, as explained in this comment; the gist is that Windows time is in general not very high resolution, or at least not as high resolution as Go expects.
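A quick way to see the effect of a coarse timer, using millisecond timestamps as a stand-in for the coarse Windows clock tick:

```javascript
// Sketch: at millisecond resolution (Date.now()), timestamps taken in a
// tight loop frequently repeat. The analogous effect on Windows, where the
// clock tick is far coarser than a nanosecond, is what makes same-tag-set
// samples collide in InfluxDB.
const stamps = [];
for (let i = 0; i < 10000; i++) stamps.push(Date.now());
const unique = new Set(stamps).size;
console.log(`${stamps.length} samples, ${unique} distinct timestamps`);
```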

I don’t think we can do much, really; the workarounds are:

  1. Introduce a tag to make certain you are not sending metrics with the same time and tag set (what you’ve done).
  2. Aggregate metrics in k6 so that this basically isn’t a problem; we plan on doing this, but it’s unlikely to happen soon (1-2 releases).
  3. Try Telegraf for aggregating metrics, although my experience with it is spotty. So far the best I have managed is the statsd input (which aggregates quite well) + the InfluxDB output (in my case InfluxDB v2, but that really isn’t necessary AFAIK). Here is an InfluxDB v1 -> InfluxDB v2 setup with some aggregation that isn’t great (as noted in the comment).
  4. I wonder whether InfluxDB v2 hasn’t fixed it and whether trying GitHub - li-zhixin/xk6-influxdbv2: A k6 extension output to influxdb. would help :man_shrugging:
  5. Don’t use Windows for running your load tests :wink: