
Cloudsmith: add v2 bandwidth analytics and repository metrics #2917

Merged
davidfeng-datadog merged 34 commits into DataDog:master from
BartoszBlizniak:ceng-726-create-metric-for-consumption-by-repository-in-datadog
Mar 25, 2026

Conversation

Contributor

@BartoszBlizniak BartoszBlizniak commented Mar 3, 2026

What does this PR do?

It introduces a new bandwidth analytics pipeline using the Cloudsmith v2 analytics time-series API, replacing the previous entitlements-based approach, and adds repository-level observability improvements.

This includes:

  1. Org-wide realtime bandwidth monitoring
    Controlled via enable_realtime_bandwidth (default true). Fetches total bytes downloaded and request count for the entire organization with no filters. Emits:

    • cloudsmith.bandwidth.bytes_downloaded
    • cloudsmith.bandwidth.request_count
  2. Profile-based bandwidth monitoring
    Users can define named bandwidth_profiles in their config, each with its own aggregate (bytes_downloaded_sum or request_count) and granular filters such as:

    • repository
    • package_format
    • user
    • entitlement_token
    • ip_address
    • http_status
    • country

    Each profile emits cloudsmith.analytics.bytes_downloaded_sum or cloudsmith.analytics.request_count tagged with profile:<name> plus all configured filter tags.

  3. Repository-level observability
    Adds repository data collection from Cloudsmith /repos/{owner}/ with pagination handling and emits:

    • cloudsmith.repository.storage_bytes
    • cloudsmith.repository.package_count
    • cloudsmith.repository.download_count
  4. Dashboard updates
    Adds:

    • Org Bandwidth Overview with query-value and timeseries widgets for org-wide metrics
    • Profile-based widgets for drill-down by profile
    • Repository Overview with selection query values and toplists for storage, package count, and download usage
  5. Quota fix
    Corrects slight margin errors in quota endpoint conversions.

  6. Tests and docs updates
    Adds unit test coverage for:

    • bandwidth profile behavior
    • repository pagination/parsing
    • repository metric submission

    Also updates:

    • configuration spec
    • example config
    • README
    • metadata
    • changelog
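The repository collection in item 3 pages through /repos/{owner}/ until every repository has been fetched. A minimal sketch of such a pagination loop, decoupled from the HTTP client so it can be exercised in isolation (the stop-on-short-page heuristic and the page_size parameter are assumptions for illustration, not necessarily the integration's actual logic):

```python
def paginate(fetch_page, page_size=100):
    """Collect items from a paged endpoint until a short page is returned.

    fetch_page(page, page_size) is any callable returning a list of items
    for that page -- e.g. a wrapper around GET /repos/{owner}/.
    """
    items, page = [], 1
    while True:
        batch = fetch_page(page, page_size)
        items.extend(batch)
        if len(batch) < page_size:  # a short (or empty) page ends the walk
            break
        page += 1
    return items
```

Keeping the loop separate from the request layer also makes the pagination/parsing unit tests mentioned in item 6 straightforward to write against canned pages.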

Motivation

Cloudsmith users need both broader bandwidth observability and more granular attribution of usage.

The previous bandwidth implementation relied on the entitlements endpoint, which had limited filtering capability and did not support the level of granularity customers need. The Cloudsmith v2 analytics time-series API provides per-interval bucketed data with rich filtering dimensions such as repository, package format, user, entitlement token, IP address, and HTTP status, enabling customers to:

  • Monitor bandwidth consumption at the org level with zero configuration
  • Create targeted profiles to track specific repos, package formats, or user segments
  • Set alerts on bandwidth spikes or unusual download patterns for specific scopes
  • Attribute bandwidth costs to specific teams, tokens, or repositories

In addition, Cloudsmith users needed per-repository visibility into storage utilization and operational counters. The repository endpoint integration adds direct repository-level visibility for storage, package count, and download count, and makes those metrics explorable in Datadog dashboards using repository:<slug> tags.

Review checklist

  • PR has a meaningful title or PR has the no-changelog label attached
  • Feature or bugfix has tests
  • Git history is clean
  • If PR impacts documentation, docs team has been notified or an issue has been opened on the documentation repo
  • If this PR includes a log pipeline, please add a description describing the remappers and processors.

Additional Notes

Metric type: gauge
The integration submits the latest completed bucket value as a gauge. Each API data point represents bytes downloaded or request count in one interval bucket, which is treated as a point-in-time measurement for that window. The dedup logic ensures only one value is submitted per bucket, and explicit zeros are emitted when no new data is available so dashboards do not interpolate stale values.
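The dedup-plus-explicit-zero behavior described above could look roughly like this (class and attribute names are illustrative, not the integration's actual code):

```python
class BucketDedup:
    """Submit each completed bucket once; emit 0.0 when nothing new settled."""

    def __init__(self):
        self._last_ts = {}  # dedup_key -> epoch of last submitted bucket

    def value_to_submit(self, dedup_key, latest_ts, latest_value):
        # No newer settled bucket: explicit zero so dashboards do not
        # interpolate the previous value across empty intervals.
        if latest_ts is None or latest_ts <= self._last_ts.get(dedup_key, 0):
            return 0.0
        self._last_ts[dedup_key] = latest_ts
        return latest_value
```

(Note that later in the review thread this explicit-zero behavior is reconsidered in favor of skipping submission entirely.)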

New metrics

| Metric | Type | Description |
| --- | --- | --- |
| cloudsmith.bandwidth.bytes_downloaded | gauge | Org-wide total bytes downloaded per interval (no filters) |
| cloudsmith.bandwidth.request_count | gauge | Org-wide total request count per interval (no filters) |
| cloudsmith.analytics.bytes_downloaded_sum | gauge | Per-profile bytes downloaded per interval (with filter tags) |
| cloudsmith.analytics.request_count | gauge | Per-profile request count per interval (with filter tags) |
| cloudsmith.repository.storage_bytes | gauge | Current storage usage for a repository in bytes |
| cloudsmith.repository.package_count | gauge | Number of packages in the repository |
| cloudsmith.repository.download_count | gauge | Total package downloads for the repository |

Configuration example

enable_realtime_bandwidth: true
bandwidth_interval: five_minutes
bandwidth_profiles:
  - name: prod-python
    aggregate: bytes_downloaded_sum
    repository:
      - production
    package_format:
      - python
    entitlement_token:
      - e2e
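As a rough illustration of how a profile's filters could map onto submission tags (the profile:<name> tag plus one tag per configured filter value, as described above; the helper and exact tag naming are assumptions):

```python
# Filter dimensions supported by bandwidth_profiles, per the PR description.
FILTER_KEYS = (
    "repository", "package_format", "user", "entitlement_token",
    "ip_address", "http_status", "country",
)

def profile_tags(profile):
    """Build Datadog tags for one bandwidth profile dict (mirrors the YAML)."""
    tags = ["profile:{}".format(profile["name"])]
    for key in FILTER_KEYS:
        for value in profile.get(key, []):
            tags.append("{}:{}".format(key, value))
    return tags
```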

@BartoszBlizniak BartoszBlizniak marked this pull request as ready for review March 3, 2026 16:24
@BartoszBlizniak BartoszBlizniak requested review from a team as code owners March 3, 2026 16:24
@rtrieu rtrieu assigned rtrieu and unassigned rtrieu Mar 3, 2026
Contributor

@git-thuerk-done git-thuerk-done left a comment


Hi @BartoszBlizniak, just a couple of small corrections for the changelog (typos and British spelling corrections). Let me know when this is ready for re-review.

Comment thread cloudsmith/assets/configuration/spec.yaml Outdated
Comment thread cloudsmith/datadog_checks/cloudsmith/data/conf.yaml.example Outdated
Comment thread cloudsmith/CHANGELOG.md Outdated
Comment thread cloudsmith/CHANGELOG.md Outdated
Comment thread cloudsmith/CHANGELOG.md Outdated
BartoszBlizniak and others added 8 commits March 4, 2026 09:43
Co-authored-by: Alicia Thuerk <alicia.thuerk@datadoghq.com>
Co-authored-by: Alicia Thuerk <alicia.thuerk@datadoghq.com>
Co-authored-by: Alicia Thuerk <alicia.thuerk@datadoghq.com>
Co-authored-by: Alicia Thuerk <alicia.thuerk@datadoghq.com>
Co-authored-by: Alicia Thuerk <alicia.thuerk@datadoghq.com>
…-datadog' of github.com:BartoszBlizniak/integrations-extras into ceng-726-create-metric-for-consumption-by-repository-in-datadog
Comment thread cloudsmith/datadog_checks/cloudsmith/check.py Outdated
# Dedup: only submit if this timestamp is newer than the last one we submitted.
last_submitted = self._profile_last_ts.get(dedup_key, 0)
if latest_ts_epoch <= last_submitted:
    self.log.debug("No new data for %s; submitting zero.", label)
Contributor


question/request
I find it odd/misleading to submit 0 when there is no new data.
Why would this be better than skipping the submission?

Contributor Author


see below comment/response


if not self._should_poll_analytics(dedup_key):
    # Keep metric continuity while avoiding unnecessary API calls.
    self.gauge(metric_name, 0.0, tags=self.tags)
Contributor


question/request
I find it odd/misleading to submit 0 in between polls.
Why would this be better than skipping the submission?

Contributor Author


During my testing I observed that with a gauge, if no new data arrives within, say, a 5-minute timeframe, the graph keeps reporting the last known value for the next 4-5 checks, which threw off the actual reporting for that interval. The metrics API also takes some time to propagate, so I allow some extra buffer time before submission to ensure the reported value closely matches what we see in the UI usage logs. I wasn't sure which metric type fits best here, but this was the only approach I found that kept the reporting accurate.

For background, we report back:

{
  "filters": {
    "start_time": "2026-02-27T14:07:50Z",
    "repository": [
      {
        "name": "testing-private",
        "repository_type": "PRIVATE",
"slug": "testing-private"
      }
    ],
    "aggregate": "BYTES_DOWNLOADED_SUM",
    "interval": "MINUTE"
  },
  "results": [
    {
      "dimensions": {
        "aggregate": "BYTES_DOWNLOADED_SUM",
        "unit": "bytes"
      },
      "timestamps": [
        "2026-02-27T14:16:00Z",
        "2026-02-27T14:19:00Z",
        "2026-02-27T14:54:00Z",
        "2026-02-27T14:56:00Z"
      ],
      "values": [570989513, 3090206893, 1027, 3090206893]
    }
  ]
}

The values are the total number of bytes downloaded over a given time interval. We only want to submit the data once it has fully settled, and report each bucket in Datadog exactly once. Without submitting a 0, the last known value would be repeatedly inserted for the next few checks.

For reference, this is how it looks in our UI dashboard:
[screenshot]

Do you have any recommendations on how to best tackle this?

Contributor


I see.
The interpolation/filling issue with missing data can be controlled from the Datadog UI.
You can apply a .fill(zero) or .fill(null) modifier to suppress interpolation. By default, gauges are displayed with a linear interpolation for a maximum of 5 minutes.

I think it would be better to not submit the 0 metrics in between polls. That way the user can configure how they would like the missing metrics to be displayed (zeros or null).
You can modify the dashboard widgets to utilize the .fill modifier.
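For example, a timeseries widget query using the fill modifier might look like this (metric and tag names taken from this PR; the query itself is illustrative):

```
avg:cloudsmith.analytics.bytes_downloaded_sum{profile:prod-python}.fill(null)
```

With .fill(null), gaps between polls render as breaks rather than interpolated values; .fill(zero) would render them as explicit zeros.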

Contributor Author


Thanks @dkirov-dd - I'll pick this up on Monday again

Contributor Author


Thank you once again @dkirov-dd for the feedback. I think you’re right. After looking at it more closely, the 0 we submit here does not actually mean “zero downloads in a completed interval” in most cases. It usually means “no new settled bucket yet” because the analytics API is delayed, the latest bucket is still incomplete, or we intentionally skipped polling/deduped the same timestamp.

The interpolation issue is still real, but that seems better handled at the Datadog query/widget layer with .fill(null) or .fill(zero) depending on what the user wants to see.

So I’m leaning toward keeping the settle/dedup logic, but skipping metric submission when there is no new completed bucket rather than emitting 0.
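That proposed change, keeping the settle/dedup bookkeeping but skipping submission instead of emitting 0, could be sketched as follows (names are illustrative, not the integration's actual code):

```python
def settled_value(last_ts, dedup_key, latest_ts, latest_value):
    """Return the value to submit, or None to skip this check run.

    last_ts is a dict mapping dedup_key -> epoch of the last submitted
    bucket. Returning None (instead of 0.0) leaves gap handling to the
    dashboard's .fill() modifier.
    """
    if latest_ts is None or latest_ts <= last_ts.get(dedup_key, 0):
        return None  # no new completed bucket: skip submission
    last_ts[dedup_key] = latest_ts
    return latest_value
```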

Contributor


Hey @BartoszBlizniak,
That sounds good to me! 👌

# Submit realtime bandwidth gauge if present
if self.enable_realtime_bandwidth and realtime_metrics.get("bandwidth_bytes_interval") is not None:
    self.gauge(
        "cloudsmith.bandwidth_bytes_interval",
Contributor


request
The bandwidth_bytes_interval metric is marked as deprecated in the metadata.csv but is no longer being collected at all.
Could you add back the logic for its collection?

Contributor Author


Adding

Comment thread cloudsmith/metadata.csv Outdated
Comment on lines +33 to +34
cloudsmith.analytics.bytes_downloaded_sum,gauge,,byte,,"Total bytes downloaded in the configured analytics interval",0,cloudsmith,analytics_bytes_downloaded_sum,,profile
cloudsmith.analytics.request_count,gauge,,item,,"Total download request count in the configured analytics interval",0,cloudsmith,analytics_request_count,,profile
Contributor


Suggested change
cloudsmith.analytics.bytes_downloaded_sum,gauge,,byte,,"Total bytes downloaded in the configured analytics interval",0,cloudsmith,analytics_bytes_downloaded_sum,,profile
cloudsmith.analytics.request_count,gauge,,item,,"Total download request count in the configured analytics interval",0,cloudsmith,analytics_request_count,,profile
cloudsmith.analytics.bytes_downloaded_sum,gauge,,byte,,"Total bytes downloaded in the configured bandwidth interval",0,cloudsmith,analytics_bytes_downloaded_sum,,profile
cloudsmith.analytics.request_count,gauge,,item,,"Total download request count in the configured bandwidth interval",0,cloudsmith,analytics_request_count,,profile

Not sure if this is correct or not.
There is no analytics interval in the integration config.

Contributor Author


You’re right, there isn’t a separate analytics_interval config in the integration. Updated the metadata text to say “configured collection interval” instead.

Comment thread cloudsmith/metadata.csv
Comment on lines -4 to -6
cloudsmith.token_count,gauge,,item,,"The number of tokens in an organization",0,cloudsmith,token_count,,
cloudsmith.token_bandwidth_total,gauge,,byte,,"The total bandwidth used by tokens",0,cloudsmith,token_bandwidth_total,,
cloudsmith.token_download_total,gauge,,item,,"The total downloads used by tokens",0,cloudsmith,token_download_total,,
Contributor

@dkirov-dd dkirov-dd Mar 17, 2026


request
These metrics (cloudsmith.token_count, cloudsmith.token_bandwidth_total and cloudsmith.token_download_total) should continue being submitted even though we are deprecating them.
An update of the Cloudsmith integration would otherwise break dashboards for existing users.

P.S. don't forget to add them back in the tests too

Contributor Author

@BartoszBlizniak BartoszBlizniak Mar 18, 2026


My bad here, added those back. Just to confirm, we will be looking to deprecate these endpoints very soon, given we are setting warnings on this release, is it okay to fully remove those metrics in the following release? We will prepare notices on our end and send out notice to our customers about the change as well.

Contributor


Thanks for adding the metrics back!

To answer your question, we prefer only deleting metrics when absolutely necessary.
We usually keep old metrics available under 'legacy' configuration options that users can enable if they are running an older version of the technology they want to monitor.

That being said, I'm not entirely up to date on Cloudsmith's offerings.
As far as I understand Cloudsmith is cloud only, i.e. when you deprecate the endpoints exposing the above metrics, this change will eventually affect all Cloudsmith users.
In that case I agree that the metrics could be removed in the future.

I would be against it if there were self-hosted versions of Cloudsmith which would allow users to be running different versions.

Contributor Author


Perfect - and correct, we are fully cloud-native (no on-prem at all), so this will end up affecting all our users. I don't have concrete timelines yet, but it's on the horizon as we're gradually migrating our legacy webapp to a new stack that uses the new endpoints.

Perhaps towards the end of the year, I will be looking at re-creating the integration from the ground up as my team has officially taken over this project, or at a bare minimum, modularize it a bit to make contributions and reviews easier - for you folks and ourselves.

Appreciate the feedback and help 🙏

Contributor

@davidfeng-datadog davidfeng-datadog left a comment


LGTM

@davidfeng-datadog davidfeng-datadog added this pull request to the merge queue Mar 25, 2026
Merged via the queue into DataDog:master with commit 6455bcc Mar 25, 2026
35 checks passed


6 participants