Statsd_exporter and DogStatsD-metrics: Need help with understanding Trend metrics and values pushed via statsd_exporter-DogStatsD-style-metrics

Hi! We have this pipeline in our infrastructure:
k6.io → statsd_exporter → Prometheus → Grafana
And I’m trying to visualize these Trend metrics:

  • http_req_duration
  • http_req_connecting
  • http_req_tls_handshaking
  • http_req_sending
  • http_req_waiting
  • http_req_receiving

using some sort of graph (e.g. the Graph (old) panel type in Grafana).

I can see that statsd_exporter aggregates them like this:

# HELP k6_http_req_duration Metric autogenerated by statsd_exporter.
# TYPE k6_http_req_duration summary
k6_http_req_duration{expected_response="true",group="::01. MethodName",method="POST",name="http://localhost:8080/api/methodName",project_name="ProjectName",proto="HTTP/1.1",scenario="default",status="200",quantile="0.5"} NaN

k6_http_req_duration{expected_response="true",group="::01. MethodName",method="POST",name="http://localhost:8080/api/methodName",project_name="ProjectName",proto="HTTP/1.1",scenario="default",status="200",quantile="0.9"} NaN

k6_http_req_duration{expected_response="true",group="::01. MethodName",method="POST",name="http://localhost:8080/api/methodName",project_name="ProjectName",proto="HTTP/1.1",scenario="default",status="200",quantile="0.99"} NaN

k6_http_req_duration_sum{expected_response="true",group="::01. MethodName",method="POST",name="http://localhost:8080/api/methodName",project_name="ProjectName",proto="HTTP/1.1",scenario="default",status="200"} 39.236463000000384

k6_http_req_duration_count{expected_response="true",group="::01. MethodName",method="POST",name="http://localhost:8080/api/methodName",project_name="ProjectName",proto="HTTP/1.1",scenario="default",status="200"} 56138

(Note that the series with the “quantile” label contain no actual data, only NaN, although the tests were run quite recently.)

I can’t quite understand how the metrics above relate to the data printed to the console in the summary table at the end of the tests:

http_req_blocked....................: avg=3.4µs    min=1µs     med=3µs    max=1.59ms   p(90)=5µs     p(95)=7µs    
http_req_connecting.................: avg=1ns      min=0s      med=0s     max=212µs    p(90)=0s      p(95)=0s     
http_req_duration...................: avg=934.1µs  min=224µs   med=773µs  max=585.33ms p(90)=1.38ms  p(95)=1.8ms  
{ expected_response:true }........: avg=934.1µs  min=224µs   med=773µs  max=585.33ms p(90)=1.38ms  p(95)=1.8ms  
http_req_failed.....................: 0.00%    ✓ 0            ✗ 141536
http_req_receiving..................: avg=94.73µs  min=38µs    med=75µs   max=3.44ms   p(90)=149µs   p(95)=200µs  
http_req_sending....................: avg=14.39µs  min=5µs     med=12µs   max=1.19ms   p(90)=20µs    p(95)=26µs   
http_req_tls_handshaking............: avg=0s       min=0s      med=0s     max=0s       p(90)=0s      p(95)=0s     
http_req_waiting....................: avg=824.97µs min=166µs   med=682µs  max=585.2ms  p(90)=1.22ms  p(95)=1.57ms

I’m also not sure how to express these metrics correctly in PromQL.
I’ve tried these queries:

# http_req_duration...................: avg=934.1µs  min=224µs   med=773µs  max=585.33ms p(90)=1.38ms  p(95)=1.8ms
avg(rate(k6_http_req_duration{project_name="$project_name"}[$__rate_interval])) by (group)
min(rate(k6_http_req_duration{project_name="$project_name"}[$__rate_interval])) by (group)
# med:
quantile(0.5, rate(k6_http_req_duration{project_name="$project_name"}[$__rate_interval])) by (group)
max(rate(k6_http_req_duration{project_name="$project_name"}[$__rate_interval])) by (group)
# p(90)
quantile(0.90, rate(k6_http_req_duration{project_name="$project_name"}[$__rate_interval])) by (group)
# p(95)
quantile(0.95, rate(k6_http_req_duration{project_name="$project_name"}[$__rate_interval])) by (group)


# http_req_blocked....................: avg=3.4µs    min=1µs     med=3µs    max=1.59ms   p(90)=5µs     p(95)=7µs
avg(rate(k6_http_req_blocked{project_name="$project_name"}[$__rate_interval])) by (group)
min(rate(k6_http_req_blocked{project_name="$project_name"}[$__rate_interval])) by (group)
# med:
quantile(0.5, rate(k6_http_req_blocked{project_name="$project_name"}[$__rate_interval])) by (group)
max(rate(k6_http_req_blocked{project_name="$project_name"}[$__rate_interval])) by (group)
# p(90)
quantile(0.90, rate(k6_http_req_blocked{project_name="$project_name"}[$__rate_interval])) by (group)
# p(95)
quantile(0.95, rate(k6_http_req_blocked{project_name="$project_name"}[$__rate_interval])) by (group)


# http_req_connecting.................: avg=1ns      min=0s      med=0s     max=212µs    p(90)=0s      p(95)=0s
avg(rate(k6_http_req_connecting{project_name="$project_name"}[$__rate_interval])) by (group)
min(rate(k6_http_req_connecting{project_name="$project_name"}[$__rate_interval])) by (group)
# med:
quantile(0.5, rate(k6_http_req_connecting{project_name="$project_name"}[$__rate_interval])) by (group)
max(rate(k6_http_req_connecting{project_name="$project_name"}[$__rate_interval])) by (group)
# p(90)
quantile(0.90, rate(k6_http_req_connecting{project_name="$project_name"}[$__rate_interval])) by (group)
# p(95)
quantile(0.95, rate(k6_http_req_connecting{project_name="$project_name"}[$__rate_interval])) by (group)


# http_req_tls_handshaking............: avg=0s       min=0s      med=0s     max=0s       p(90)=0s      p(95)=0s
avg(rate(k6_http_req_tls_handshaking{project_name="$project_name"}[$__rate_interval])) by (group)
min(rate(k6_http_req_tls_handshaking{project_name="$project_name"}[$__rate_interval])) by (group)
# med:
quantile(0.5, rate(k6_http_req_tls_handshaking{project_name="$project_name"}[$__rate_interval])) by (group)
max(rate(k6_http_req_tls_handshaking{project_name="$project_name"}[$__rate_interval])) by (group)
# p(90)
quantile(0.90, rate(k6_http_req_tls_handshaking{project_name="$project_name"}[$__rate_interval])) by (group)
# p(95)
quantile(0.95, rate(k6_http_req_tls_handshaking{project_name="$project_name"}[$__rate_interval])) by (group)


# http_req_sending....................: avg=14.39µs  min=5µs     med=12µs   max=1.19ms   p(90)=20µs    p(95)=26µs
avg(rate(k6_http_req_sending{project_name="$project_name"}[$__rate_interval])) by (group)
min(rate(k6_http_req_sending{project_name="$project_name"}[$__rate_interval])) by (group)
# med:
quantile(0.5, rate(k6_http_req_sending{project_name="$project_name"}[$__rate_interval])) by (group)
max(rate(k6_http_req_sending{project_name="$project_name"}[$__rate_interval])) by (group)
# p(90)
quantile(0.90, rate(k6_http_req_sending{project_name="$project_name"}[$__rate_interval])) by (group)
# p(95)
quantile(0.95, rate(k6_http_req_sending{project_name="$project_name"}[$__rate_interval])) by (group)


# http_req_waiting....................: avg=824.97µs min=166µs   med=682µs  max=585.2ms  p(90)=1.22ms  p(95)=1.57ms
avg(rate(k6_http_req_waiting{project_name="$project_name"}[$__rate_interval])) by (group)
min(rate(k6_http_req_waiting{project_name="$project_name"}[$__rate_interval])) by (group)
# med:
quantile(0.5, rate(k6_http_req_waiting{project_name="$project_name"}[$__rate_interval])) by (group)
max(rate(k6_http_req_waiting{project_name="$project_name"}[$__rate_interval])) by (group)
# p(90)
quantile(0.90, rate(k6_http_req_waiting{project_name="$project_name"}[$__rate_interval])) by (group)
# p(95)
quantile(0.95, rate(k6_http_req_waiting{project_name="$project_name"}[$__rate_interval])) by (group)


# http_req_receiving..................: avg=94.73µs  min=38µs    med=75µs   max=3.44ms   p(90)=149µs   p(95)=200µs
avg(rate(k6_http_req_receiving{project_name="$project_name"}[$__rate_interval])) by (group)
min(rate(k6_http_req_receiving{project_name="$project_name"}[$__rate_interval])) by (group)
# med:
quantile(0.5, rate(k6_http_req_receiving{project_name="$project_name"}[$__rate_interval])) by (group)
max(rate(k6_http_req_receiving{project_name="$project_name"}[$__rate_interval])) by (group)
# p(90)
quantile(0.90, rate(k6_http_req_receiving{project_name="$project_name"}[$__rate_interval])) by (group)
# p(95)
quantile(0.95, rate(k6_http_req_receiving{project_name="$project_name"}[$__rate_interval])) by (group)

The by (group) clause is a must-have in our graphs, because we need to see statistics for each method/group/scenario. In our tests a group represents one API method: each describe function describes one method call and its corresponding checks.

But I’m not sure that these PromQL queries are actually correct…
It would be great if you could point me in the right direction and shed some light on this :slight_smile:

Hi @Crosby,

Can you give us some more info on how you run this setup? For example, what options do you use with k6, and have you enabled tags by setting the env variable K6_STATSD_ENABLE_TAGS=true?
I managed to follow this setup, with some changes for Fedora, and I don’t have quantile series but do have k6_http_req_duration_bucket with le labels, which helps with creating a graph.


But without more info on your setup I am afraid I can’t really help you much :frowning:

I run the load tests with this command:

export VUs=1; \
export DURATION=10m; \
export K6_STATSD_ADDR="host:port"; \
export K6_STATSD_ENABLE_TAGS=true; \
k6 run -e BASE_URL="http://host:port/endpoint" \
   --vus $VUs \
   --duration $DURATION \
   --http-debug="full" \
   --summary-export=loadtests/reports/summary_VUs-${VUs}_duration-${DURATION}.json \
   --out statsd \
   --out json=loadtests/reports/report_VUs-${VUs}_duration-${DURATION}.json \
   --out csv=loadtests/reports/report_VUs-${VUs}_duration-${DURATION}.csv \
   loadtests/minimalBaseScenario.js

I don’t see any such _bucket metrics on my side…

I was mostly asking about the statsd_exporter and possibly prometheus settings, but I guess I should’ve been more explicit :man_facepalming: , sorry.

From what I can understand:

defaults:
  observer_type: histogram

is the thing that produces the _bucket metrics, and you probably have this set to summary. From the docs it seems you might also want to redefine the buckets, since you may need finer granularity depending on what you are measuring. But yeah, this looks like a statsd_exporter + Prometheus thing, so maybe try their support channels, as they can likely help you a lot better :).
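
To illustrate how the exported series relate to the console summary, and why bucket boundaries matter, here is a minimal Python sketch. It is not statsd_exporter or Prometheus code. The first function shows that an average is just the _sum series divided by the _count series (using the values from the first post, and assuming the exporter reports timers in seconds). The second estimates a quantile from cumulative le buckets by linear interpolation, which is similar in spirit to what PromQL’s histogram_quantile() does. The function names and the example bucket bounds and counts are hypothetical.

```python
import math


def mean_from_sum_count(total: float, count: int) -> float:
    """Average observation: the _sum series divided by the _count series."""
    return total / count if count else math.nan


def estimate_quantile(q: float, buckets: list[tuple[float, int]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound, ending with the +Inf bucket. The value is interpolated
    linearly inside the bucket that contains the target rank, so the
    estimate can never be more precise than the bucket width.
    """
    if not buckets or not 0 <= q <= 1:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan

    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, cumulative in buckets:
        if cumulative >= rank:
            if math.isinf(bound):
                # Rank falls into the +Inf bucket: report the highest finite bound.
                return prev_bound
            in_bucket = cumulative - prev_count
            if in_bucket == 0:
                return prev_bound
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, cumulative
    return math.nan


# Values of k6_http_req_duration_sum and _count from the first post.
avg_seconds = mean_from_sum_count(39.236463000000384, 56138)

# Hypothetical cumulative buckets (upper bound in seconds, cumulative count).
example_buckets = [
    (0.0005, 10),
    (0.001, 60),
    (0.002, 90),
    (0.005, 99),
    (math.inf, 100),
]
median = estimate_quantile(0.5, example_buckets)
p95 = estimate_quantile(0.95, example_buckets)
```

With the numbers from the first post the average works out to roughly 0.0007 s (about 0.7 ms) for that one group/label combination, which is in the same range as the console summary. On the Prometheus side, the equivalent of the second function would be something along the lines of histogram_quantile(0.95, sum by (le, group) (rate(k6_http_req_duration_bucket[$__rate_interval]))), assuming the histogram observer type is enabled.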

I am also interested in why you are also outputting to json and csv. Every output adds some overhead, and having three seems a bit excessive. Is there any particular reason you are doing this, or are you trying to find some discrepancy :man_shrugging: ?

I was mostly asking about the statsd_exporter and possibly prometheus settings, but I guess I should’ve been more explicit :man_facepalming: , sorry.

It’s ok :slight_smile: Thank you for your help and attention/patience! :slight_smile:
I don’t know much about that part of the setup, but I will try to get that info from our DevOps team.

is the thing that produces the _bucket metrics, and you probably have this set to summary. From the docs it seems you might also want to redefine the buckets, since you may need finer granularity depending on what you are measuring

Thank you for pointing that out! I will investigate further in this direction as well.

But yeah, this looks like a statsd_exporter + Prometheus thing, so maybe try their support channels, as they can likely help you a lot better :slight_smile:

Ok, I’ll consider this possibility too, thanks! :slight_smile:

I am also interested in why you are also outputting to json and csv. Every output adds some overhead, and having three seems a bit excessive. Is there any particular reason you are doing this, or are you trying to find some discrepancy :man_shrugging:

I just decided it would be helpful to provide the full command for debugging purposes, so as not to miss anything…
As a QA I’ve seen many times how a small missing detail could even break a system :sweat_smile:

As for why I’m using those outputs: it’s mainly for debugging purposes, nothing else really :slight_smile: