1. 04 Feb, 2021 1 commit
  2. 26 Aug, 2020 1 commit
  3. 21 Aug, 2020 1 commit
  4. 21 Jul, 2020 2 commits
  5. 12 Dec, 2019 1 commit
  6. 15 Mar, 2019 1 commit
  7. 12 Mar, 2019 2 commits
  8. 19 Dec, 2018 1 commit
  9. 25 Sep, 2018 1 commit
  10. 04 Sep, 2018 1 commit
  11. 05 Apr, 2018 1 commit
    • Use common HTTPClientConfig for marathon_sd configuration (#4009) · 2aba238f
      Philippe Laflamme authored
      This adds support for basic authentication, which closes #3090.
      
      The support for specifying the client timeout was removed as discussed in https://github.com/prometheus/common/pull/123. Marathon was the only SD mechanism doing this, and the timeout is now configured through `Context`.
      
      DC/OS uses a custom `Authorization` header for authentication, so this adds two new configuration properties to reflect that.
      
      Existing configuration files that use the bearer token will no longer work. More work is required to make this backwards compatible.
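
      A minimal sketch of how such a job might look after this change. The `basic_auth` block comes from the common HTTPClientConfig; the names of the two new DC/OS token properties (`auth_token` / `auth_token_file` below) are an assumption, not taken from this commit message:

      ```yaml
      scrape_configs:
        - job_name: "marathon-apps"
          marathon_sd_configs:
            - servers:
                - https://marathon.example.com:8443
              # Common HTTP client options, e.g. basic authentication:
              basic_auth:
                username: prometheus
                password: secret-password
              # Hypothetical names for the new DC/OS token properties:
              # auth_token: <token>
              # auth_token_file: /etc/prometheus/dcos.token
      ```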
  12. 31 Mar, 2018 1 commit
  13. 23 Mar, 2018 1 commit
    • consul: improve consul service discovery (#3814) · 60dafd42
      Corentin Chary authored
      * consul: improve consul service discovery
      
      Related to #3711
      
      - Add the ability to filter by tag and node-meta in an efficient way (`/catalog/services`
        allows filtering by node-meta and returns a `map[string]string` of `service` -> `tags`).
        Tags and node-meta are also used in `/catalog/service` requests. See the configuration
        sketch after this list.
      - Do not require a call to the catalog if services are specified by name. This is important
        because on large clusters `/catalog/services` changes all the time.
      - Add `allow_stale` configuration option to do stale reads. Non-stale
        reads can be costly, even more so when you are doing them against a remote
        datacenter with 10k+ targets over WAN (which is common for federation).
      - Add `refresh_interval` to minimize the strain on the catalog and on the
        service endpoint. This is needed because of that kind of behavior from
        consul: https://github.com/hashicorp/consul/issues/3712 and because a catalog
        on a large cluster would basically change *all* the time. No need to discover
        targets in 1sec if we scrape them every minute.
      - Added plenty of unit tests.
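
      A minimal sketch of a job using the new server-side filtering options. The `allow_stale` and `refresh_interval` names come from this description and `tag` is used in the benchmark configs below; the exact spelling of the node-meta filter (`node_meta` below) is an assumption:

      ```yaml
      scrape_configs:
        - job_name: "consul-services"
          consul_sd_configs:
            - server: localhost:8500
              # Filter on the consul side instead of pulling every service
              # and dropping targets with relabel_configs:
              tag: prometheus-scrape
              node_meta:            # assumed field name
                rack: r1
              # Serve stale reads instead of always hitting the leader:
              allow_stale: true
              # Poll the catalog at most once per interval:
              refresh_interval: 30s
      ```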
      
      Benchmarks
      ----------
      
      ```yaml
      scrape_configs:
      
      - job_name: prometheus
        scrape_interval: 60s
        static_configs:
          - targets: ["127.0.0.1:9090"]
      
      - job_name: "observability-by-tag"
        scrape_interval: "60s"
        metrics_path: "/metrics"
        consul_sd_configs:
          - server: consul.service.par.consul.prod.crto.in:8500
            tag: marathon-user-observability  # Used in After
            refresh_interval: 30s             # Used in After+delay
        relabel_configs:
          - source_labels: [__meta_consul_tags]
            regex: ^(.*,)?marathon-user-observability(,.*)?$
            action: keep
      
      - job_name: "observability-by-name"
        scrape_interval: "60s"
        metrics_path: "/metrics"
        consul_sd_configs:
          - server: consul.service.par.consul.prod.crto.in:8500
            services:
              - observability-cerebro
              - observability-portal-web
      
      - job_name: "fake-fake-fake"
        scrape_interval: "15s"
        metrics_path: "/metrics"
        consul_sd_configs:
          - server: consul.service.par.consul.prod.crto.in:8500
            services:
              - fake-fake-fake
      ```
      
      Note: tested with ~1200 services, ~5000 nodes.
      
      | Resource | Empty | Before | After | After + delay |
      | -------- |:-----:|:------:|:-----:|:-------------:|
      |/service-discovery size|5K|85MiB|27k|27k|
      |`go_memstats_heap_objects`|100k|1M|120k|110k|
      |`go_memstats_heap_alloc_bytes`|24MB|150MB|28MB|27MB|
      |`rate(go_memstats_alloc_bytes_total[5m])`|0.2MB/s|28MB/s|2MB/s|0.3MB/s|
      |`rate(process_cpu_seconds_total[5m])`|0.1%|15%|2%|0.01%|
      |`process_open_fds`|16|*1236*|22|22|
      |`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="services"}[5m])`|~0|1|1|*0.03*|
      |`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="service"}[5m])`|0.1|*80*|0.5|0.5|
      |`prometheus_target_sync_length_seconds{quantile="0.9",scrape_job="observability-by-tag"}`|N/A|200ms|0.2ms|0.2ms|
      |Network bandwidth|~10kbps|~2.8Mbps|~1.6Mbps|~10kbps|
      
      Filtering by tag using relabel_configs uses **100kiB and 23kiB/s per service per job** and quite a lot of CPU. It also sends an additional *1Mbps* of traffic to consul.
      Being a little bit smarter about this reduces the overhead quite a lot.
      Limiting the number of `/catalog/services` queries per second almost removes the overhead of service discovery.
      
      * consul: tweak `refresh_interval` behavior
      
      `refresh_interval` now does what is advertised in the documentation:
      there won't be more than one update per `refresh_interval`. It now
      defaults to 30s (which was also the current waitTime in the consul query).
      
      This also makes sure we don't wait another 30s if we already waited 29s
      in the blocking call, by subtracting the number of elapsed seconds.
      
      Hopefully this will do what people expect it to do and will be safer
      for existing consul infrastructures.
  14. 13 Nov, 2017 1 commit
    • Add remote read filter option · 7098c564
      Tobias Schmidt authored
      For special remote read endpoints that only have data for specific
      queries, it is desirable to limit the number of queries sent to the
      configured remote read endpoint, to reduce latency and performance
      overhead.
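
      A minimal sketch of how such a filter might look in the configuration, assuming it is expressed as required label matchers on the `remote_read` entry (the `required_matchers` name and the URL below are assumptions, not taken from this commit message):

      ```yaml
      remote_read:
        - url: http://special-endpoint.example.com/api/v1/read
          # Only forward queries whose selectors include these
          # label matchers (option name assumed):
          required_matchers:
            source: special
      ```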
  15. 18 Oct, 2017 1 commit
  16. 09 Jul, 2017 1 commit
    • Fixing tests for Windows · 902fafb8
      Fuente, Pablo Andres authored
      Fixing the config/config_test, the discovery/file/file_test and the
      promql/promql_test tests for Windows. For most of the tests, the fix involved
      correct handling of path separators. In the case of the promql tests, the
      issue was related to the removal of the temporary directories used by the
      storage: the RemoveAll() call returns an error when it tries to remove a
      directory which is not empty, which seems to happen because some process is
      still running after the storage is closed. To fix it I added some retries to
      the removal of the temporary directories.
      Also adds the tags file from Universal Ctags to .gitignore.
  17. 01 Jun, 2017 1 commit
  18. 29 May, 2017 1 commit
  19. 27 Apr, 2017 1 commit
  20. 17 Mar, 2017 1 commit
  21. 20 Feb, 2017 1 commit
  22. 17 Jan, 2017 1 commit
  23. 16 Dec, 2016 1 commit
    • Add sample_limit to scrape config. · 30448286
      Brian Brazil authored
      This imposes a hard limit on the number of samples ingested from the
      target. This is counted after metric relabelling, to allow dropping of
      problematic metrics.
      
      This is intended as a very blunt tool to prevent overload due to
      misbehaving targets that suddenly jump in sample count (e.g. adding
      a label containing email addresses).
      
      Add a metric to track how often this happens.
      
      Fixes #2137
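
      A minimal sketch of the new option in a scrape config (target address and limit value are illustrative):

      ```yaml
      scrape_configs:
        - job_name: "node"
          # Hard cap on samples accepted from a single scrape,
          # counted after metric relabelling:
          sample_limit: 10000
          static_configs:
            - targets: ["localhost:9100"]
      ```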
  24. 14 Dec, 2016 1 commit
    • Add labeldrop and labelkeep actions. (#2279) · 4d9134e6
      Tristan Colgate-McFarlane authored
      Introduce two new relabel actions: labeldrop and labelkeep.
      These can be used to filter the set of labels using a regex match, as sketched below:
      
      - labeldrop: drops all labels that match the regex
      - labelkeep: drops all labels that do not match the regex
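
      A minimal sketch of how these actions might be used (label names and regexes are illustrative):

      ```yaml
      scrape_configs:
        - job_name: "example"
          static_configs:
            - targets: ["localhost:9100"]
          metric_relabel_configs:
            # Drop every label whose name starts with "tmp_":
            - action: labeldrop
              regex: tmp_.*
            # Or keep only an allow-listed set of labels:
            # - action: labelkeep
            #   regex: (instance|job|env)
      ```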
  25. 23 Nov, 2016 1 commit
  26. 03 Nov, 2016 2 commits
  27. 19 Oct, 2016 1 commit
  28. 18 Oct, 2016 1 commit
  29. 17 Oct, 2016 2 commits
  30. 07 Oct, 2016 1 commit
  31. 05 Oct, 2016 1 commit
  32. 05 Jul, 2016 1 commit
  33. 08 Jun, 2016 1 commit
  34. 30 May, 2016 1 commit
  35. 07 Apr, 2016 1 commit
  36. 14 Feb, 2016 1 commit