      request #12544 Add http request duration instrumentation
    Infos
    #12544
    Manuel Vacelet (vaceletm)
    2018-12-04 10:47
    2018-11-22 15:36
    13248
    Details
    Add http request duration instrumentation

    We should be able to gather high-level metrics about the response time of requests.

    As it's far from trivial to do that with Prometheus (there is no data type for durations), it should be done with a histogram type (or a summary, but that's more complex, so let's start with a histogram). That means we should define a set of upper bounds (buckets), and Prometheus will then keep track of the number of requests that fall into each bucket.

    Given a histogram set with the following buckets (upper bounds, in seconds of request duration):

    • 0.05
    • 0.5
    • 1

    and 3 requests with the following durations:

    • 250ms
    • 750ms
    • 3s


    I will get the following results in Prometheus

    • 0.05: 0  => no request took 50ms or less
    • 0.5: 1   => 1 request took 500ms or less
    • 1: 2     => 2 requests took 1s or less (buckets are cumulative)
    • +Inf: 3  => all requests took less than +Inf
    • count: 3 => there were 3 requests
    • sum: 4   => all requests together took 4s

    +Inf, count and sum are automatically generated.
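
The counting above can be reproduced with a few lines of plain Python. This is only a sketch of the cumulative bucket logic, not the Tuleap implementation or a Prometheus client API; the function name `observe_all` is hypothetical and chosen for illustration.

```python
from bisect import bisect_left


def observe_all(buckets, durations):
    """Return (cumulative bucket counts, count, sum) for durations in seconds.

    Hypothetical helper mimicking how a Prometheus histogram counts
    observations: each bucket holds the number of observations that are
    less than or equal to its upper bound.
    """
    # +Inf is always appended as the last bucket.
    bounds = sorted(buckets) + [float("inf")]
    per_bucket = [0] * len(bounds)
    for duration in durations:
        # Index of the first bound that is >= duration.
        per_bucket[bisect_left(bounds, duration)] += 1

    # Turn the per-bucket counts into cumulative counts.
    cumulative = {}
    running = 0
    for bound, hits in zip(bounds, per_bucket):
        running += hits
        cumulative[bound] = running
    return cumulative, len(durations), sum(durations)


# Buckets and requests from the example: 250ms, 750ms and 3s.
cumulative, count, total = observe_all([0.05, 0.5, 1], [0.25, 0.75, 3])
for bound, hits in cumulative.items():
    print(f"{bound}: {hits}")
print(f"count: {count}")
print(f"sum: {total}")
```

The printed bucket values match the list above: 0, 1, 2 and 3 for the 0.05, 0.5, 1 and +Inf buckets, with a count of 3 and a sum of 4 seconds.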

    Example inspired by https://povilasv.me/prometheus-tracking-request-duration/

    Other useful references:

    Other
    Empty
    Empty
    • [x] enhancement
    • [ ] internal improvement
    Empty
    Stage
    Empty
    Closed
    2018-12-04
    Attachments
    Empty
    References

    Follow-ups

    Thomas Gerbet (tgerbet) 2018-11-22 18:07
    gerrit #13178 integrated in Tuleap 10.7.99.66.

    There is, however, a mistake that I did not catch before merging the contribution: we are only going to get meaningful results for REST calls or requests routed through FastRoute. For other requests, we count the hit at the beginning of the request since we cannot do it another way.

    These requests can easily be filtered out through the labels, but it's confusing to expose incorrect data and then force users to know what can be used and to do our cleanup for us.