Why do k6.http_req_duration.max and k6.http_req_duration.95percentile have the same value in Datadog?

Hello k6 team,
I’ve started integrating k6 with Datadog.
I ran some tests and pushed metrics to Datadog successfully.
Then I started comparing the received metrics and noticed that the values are identical for

k6.http_req_duration.max and k6.http_req_duration.95percentile

They shouldn’t be the same, based on the original output from k6 itself.

Could someone answer my question:

  1. Why do k6.http_req_duration.max and k6.http_req_duration.95percentile have the same value in Datadog?

I’ve attached two screenshots of the output.

k6 version: v0.41

Hi Max,

As you’ve seen from the discussion in “Http_req_duration wrong datadog”, this is related to how metrics are aggregated for Datadog. Specifically, see this explanation.

The problem is that Datadog receives metrics that the DogStatsD agent has already aggregated (this is where the 95percentile metric is computed), and then, to visualize them in a Datadog graph, you aggregate them again by choosing e.g. “avg by”. This is why they don’t match the aggregation that k6 shows in the end-of-test summary, which is the more reliable source.
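To make the double aggregation concrete, here is a minimal Python sketch with made-up numbers. It assumes the agent computes one p95 per flush window and that the graph then averages those per-window values (“avg by”). It uses a simple nearest-rank percentile, which may differ slightly from the interpolation k6 uses in its summary.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical http_req_duration samples (ms), grouped by flush window.
windows = [
    [100, 110, 120, 130, 140, 150, 160, 170, 180, 190],
    [100] * 9 + [2000],  # one slow outlier in this window
    [100] * 10,
]

# First aggregation: one p95 per window (done on the agent side).
window_p95 = [percentile(w, 95) for w in windows]
# Second aggregation: the graph averages those p95 values ("avg by").
avg_of_p95 = mean(window_p95)
# Roughly what an end-of-test summary reports: one p95 over all raw samples.
global_p95 = percentile([v for w in windows for v in w], 95)

print(window_p95)            # [190, 2000, 100]
print(round(avg_of_p95, 2))  # 763.33
print(global_p95)            # 190
```

Averaging the per-window p95 values gives about 763 ms, while the p95 over all 30 samples is 190 ms: a single outlier in one window dominates the re-aggregated result.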

I wasn’t able to reproduce your exact issue, though: in my test I get different values for both, although 95percentile is still inconsistent with the k6 summary.

Reading the Datadog documentation about percentiles, it seems it would be more flexible to send some metrics to DogStatsD as distribution values. These would apparently skip the aggregation on the client/agent side, and allow you to compute any percentile in the Datadog query.
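For reference, a DogStatsD datagram carries the metric type right after the value, so sending a trend as a distribution mostly means changing that type field. The sketch below assumes the documented name:value|type|#tags datagram layout and a local agent listening on UDP port 8125; the method:GET tag is only an example, and this is not how the k6 statsd output is implemented.

```python
import socket


def dogstatsd_line(name, value, metric_type, tags=None):
    """Format one DogStatsD datagram: <name>:<value>|<type>|#tag1,tag2."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        line += "|#" + ",".join(tags)
    return line


# "ms" (timing) is a standard StatsD type; the agent aggregates it per
# flush interval into max/avg/percentile series.
timing = dogstatsd_line("k6.http_req_duration", 123.4, "ms", ["method:GET"])

# "d" (distribution) is a Datadog extension; percentiles are computed
# on the Datadog side, so they can be chosen at query time.
distribution = dogstatsd_line("k6.http_req_duration", 123.4, "d", ["method:GET"])


def send(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send to a DogStatsD agent (address assumed)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))
```

For example, timing is "k6.http_req_duration:123.4|ms|#method:GET" and distribution is the same string with "|d" instead of "|ms".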

This will need more evaluation to see how well it’s supported by DogStatsD, as well as any other backends using StatsD, but it looks like it would be an improvement over the currently limited and confusing situation. I created the GitHub issue #2819, so feel free to subscribe to it for updates.


Hello @imiric
Thanks for your answer.

I tried to track down the issue by rewriting the k6 tests in different ways, and found that with my current setup (k6 + the latest version of the Datadog Agent), Datadog receives the same data for metrics with aggregation functions like:

.95percentile, .99percentile, .90percentile and .max all have identical values.

The received values are valid (the same as in the k6 summary), and I was able to compute accurate values for “maximum” in Datadog.

I expected the metrics to differ, because they differ in the k6 summary, but in Datadog they all have the same values.
It looks like the agent isn’t aggregating them and is sending them to Datadog as “raw” data.
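One possible explanation for identical values, offered only as an assumption to verify: if the agent computes max and the configured percentiles separately for each flush interval and each metric/tag combination, a window that contains a single sample yields the same number for every aggregate. The minimal Python sketch below (nearest-rank percentiles, hypothetical durations) shows that effect; it does not reproduce the agent’s actual code.

```python
import math
from statistics import median


def percentile(values, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def flush_aggregates(samples):
    """Aggregates an agent might emit for one window and one tag set (assumed)."""
    return {
        "max": max(samples),
        "median": median(samples),
        "95percentile": percentile(samples, 95),
        "99percentile": percentile(samples, 99),
    }


# Hypothetical: only one request landed in this window for this tag set.
single = flush_aggregates([342.0])

# Hypothetical: 100 requests (1..100 ms) landed in the same window.
many = flush_aggregates(list(range(1, 101)))

print(single)  # every aggregate is 342.0
print(many)    # max 100, median 50.5, 95percentile 95, 99percentile 99
```

With one sample per window, max, median and every percentile collapse to that sample; they only diverge once a window holds many samples.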

@imiric Could you please share your k6 + Datadog Agent integration setup?

I may be using different versions than you.

My setup is quite simple:
I just pull the latest version of the agent and set a few environment variables:

image: gcr.io/datadoghq/agent:latest
   DD_SITE: datadoghq.com
   DD_HISTOGRAM_PERCENTILES: "0.95 0.99 0.90"
   DD_HISTOGRAM_AGGREGATES: "max median avg count sum 0.95 0.99 0.90"
   DD_API_KEY: removed

Unfortunately, none of the current percentile metrics in Datadog are reliable, due to the multiple aggregations. You can use them as a rough reference, but don’t rely on them to judge the SUT’s performance.

The tricky part is that the solution proposed in the GitHub issue is more complex than it looks. After some investigation, it turns out the Distribution metric type is a custom extension supported by Datadog and New Relic, not a standard StatsD type. So we can’t simply change the statsd output globally to fix this issue; we would have to determine whether the backend actually supports the type, or add a “variant” option to distinguish between backends, as suggested in this comment.
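The “variant” idea could look roughly like the sketch below: plain StatsD backends keep the standard timing type, and only backends known to support distributions get the “d” type. The variant names and helper functions are hypothetical illustrations, not existing k6 options, and k6 itself is written in Go; this only shows the selection logic.

```python
# Hypothetical variant names; not real k6 configuration values.
TREND_TYPE_BY_VARIANT = {
    "statsd": "ms",    # standard StatsD timing, aggregated by the agent
    "datadog": "d",    # Datadog distribution extension
    "newrelic": "d",   # New Relic also supports a distribution type
}


def trend_metric_type(variant="statsd"):
    """Pick the wire type for trend metrics; unknown variants stay standard."""
    return TREND_TYPE_BY_VARIANT.get(variant, "ms")


def format_trend(name, value_ms, variant="statsd"):
    """Format a trend sample as a StatsD-style line for the chosen variant."""
    return f"{name}:{value_ms}|{trend_metric_type(variant)}"


print(format_trend("k6.http_req_duration", 250, "datadog"))
print(format_trend("k6.http_req_duration", 250))
```

Falling back to “ms” for unknown variants keeps the default behavior safe for backends that would reject the non-standard type.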

We’ll need to discuss the solution internally, and we’ll update that issue once we decide on the path forward. In the meantime, I’m afraid there’s no workaround for this. :frowning:

As for my Datadog setup, I followed the instructions in the documentation, so I’m using the same Docker image as you (my latest was 5a8fcbc10166, running agent v7.41.0). However, I didn’t set the DD_HISTOGRAM_* env vars that you did.
