Cloudsmith: add v2 bandwidth analytics and repository metrics #2917
Conversation
git-thuerk-done
left a comment
Hi @BartoszBlizniak, just a couple of small corrections for the changelog (typos and British spelling corrections). Let me know when this is ready for re-review.
Co-authored-by: Alicia Thuerk <alicia.thuerk@datadoghq.com>
    # Dedup: only submit if this timestamp is newer than the last one we submitted.
    last_submitted = self._profile_last_ts.get(dedup_key, 0)
    if latest_ts_epoch <= last_submitted:
        self.log.debug("No new data for %s; submitting zero.", label)
question/request
I find it odd/misleading to submit 0 when there is no new data.
Why would this be better than skipping the submission?
see below comment/response
    if not self._should_poll_analytics(dedup_key):
        # Keep metric continuity while avoiding unnecessary API calls.
        self.gauge(metric_name, 0.0, tags=self.tags)
question/request
I find it odd/misleading to submit 0 in between polls.
Why would this be better than skipping the submission?
During my testing, I've observed that when a gauge is used and no new data arrives within, say, a 5-minute timeframe, the candles in the graph keep reporting the last known value for the next 4-5 checks, which was throwing off the actual reporting for that interval. The metrics API also takes some time to propagate, so I'm allowing some extra buffer time before submission to ensure that the reported value closely matches what we see in the UI usage logs. I was a bit unsure which metric type would fit best here, but this is the only approach I found where the reporting was accurate.
For background, we report back:
{
"filters": {
"start_time": "2026-02-27T14:07:50Z",
"repository": [
{
"name": "testing-private",
"repository_type": "PRIVATE",
"slug": "testing-private",
}
],
"aggregate": "BYTES_DOWNLOADED_SUM",
"interval": "MINUTE"
},
"results": [
{
"dimensions": {
"aggregate": "BYTES_DOWNLOADED_SUM",
"unit": "bytes"
},
"timestamps": [
"2026-02-27T14:16:00Z",
"2026-02-27T14:19:00Z",
"2026-02-27T14:54:00Z",
"2026-02-27T14:56:00Z"
],
"values": [570989513, 3090206893, 1027, 3090206893]
}
]
}
The values are the total number of bytes downloaded over a given time interval. We only want to submit the data once it's fully settled, and only report values in Datadog that have actually been submitted. Without submitting a 0, the last known value would be repeatedly inserted for the next few checks.
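The settle-then-submit behavior described above could be sketched roughly like this (a hedged sketch, not the integration's actual code: `SETTLE_BUFFER_S` and the helper name are illustrative; only the `_profile_last_ts`-style dedup state comes from the snippet under review):

```python
import time

# Illustrative settle buffer: only submit buckets old enough that the
# analytics API has finished propagating their values.
SETTLE_BUFFER_S = 300

def pick_settled_bucket(timestamps_epoch, values, last_submitted_ts, now=None):
    """Return the newest (ts, value) pair that is settled and not yet submitted.

    timestamps_epoch: bucket timestamps as epoch seconds, ascending.
    values: bucket values aligned with timestamps_epoch.
    last_submitted_ts: epoch of the last bucket already submitted (dedup).
    """
    now = time.time() if now is None else now
    cutoff = now - SETTLE_BUFFER_S
    candidate = None
    for ts, value in zip(timestamps_epoch, values):
        if ts <= cutoff and ts > last_submitted_ts:
            candidate = (ts, value)  # newest settled, unsubmitted bucket wins
    return candidate
```

Buckets newer than the buffer are left for a later check run, which matches the "extra buffer time before submission" idea above.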
For reference, this is how it looks in our UI dashboard:

Do you have any recommendations on how to best tackle this?
I see.
The interpolation/filling issue with missing data can be controlled from the Datadog UI.
You can apply a .fill(zero) or .fill(null) modifier to suppress interpolation. By default, gauges are displayed with a linear interpolation for a maximum of 5 minutes.
I think it would be better to not submit the 0 metrics in between polls. That way the user can configure how they would like the missing metrics to be displayed (zeros or null).
You can modify the dashboard widgets to utilize the .fill modifier.
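For example, a widget query using the modifier might look like this (illustrative: the metric and tag names come from this PR, the query syntax is standard Datadog):

```
avg:cloudsmith.analytics.bytes_downloaded_sum{*} by {repository}.fill(null)
```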
Thanks @dkirov-dd - I'll pick this up on Monday again
Thank you once again @dkirov-dd for the feedback. I think you’re right. After looking at it more closely, the 0 we submit here does not actually mean “zero downloads in a completed interval” in most cases. It usually means “no new settled bucket yet” because the analytics API is delayed, the latest bucket is still incomplete, or we intentionally skipped polling/deduped the same timestamp.
The interpolation issue is still real, but that seems better handled at the Datadog query/widget layer with .fill(null) or .fill(zero) depending on what the user wants to see.
So I’m leaning toward keeping the settle/dedup logic, but skipping metric submission when there is no new completed bucket rather than emitting 0.
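A minimal sketch of that skip-instead-of-zero approach (hypothetical helper: `submit` stands in for the real `self.gauge` call, and the dedup state mirrors the `_profile_last_ts` mapping from the diff):

```python
def maybe_submit(dedup_key, latest_ts_epoch, value, last_ts_by_key, submit):
    """Submit only when a new completed bucket exists; otherwise skip entirely.

    Skipping (rather than emitting 0) leaves gap handling to the dashboard,
    where users can choose .fill(zero) or .fill(null) themselves.
    """
    last_submitted = last_ts_by_key.get(dedup_key, 0)
    if latest_ts_epoch <= last_submitted:
        return False  # no new settled bucket: skip, do not emit 0
    last_ts_by_key[dedup_key] = latest_ts_epoch
    submit(value)
    return True
```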
Hey @BartoszBlizniak,
That sounds good to me! 👌
    # Submit realtime bandwidth gauge if present
    if self.enable_realtime_bandwidth and realtime_metrics.get("bandwidth_bytes_interval") is not None:
        self.gauge(
            "cloudsmith.bandwidth_bytes_interval",
request
The bandwidth_bytes_interval metric is marked as deprecated in the metadata.csv but is no longer being collected at all.
Could you add back the logic for its collection?
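Restoring the deprecated metric's collection might look roughly like this (a hedged sketch: `enable_realtime_bandwidth` and the metric name come from the diff above, the helper itself is hypothetical):

```python
def submit_realtime_bandwidth(check, realtime_metrics):
    """Keep submitting the deprecated interval metric so dashboards don't break."""
    value = realtime_metrics.get("bandwidth_bytes_interval")
    if check.enable_realtime_bandwidth and value is not None:
        # Marked deprecated in metadata.csv, but still collected for
        # backward compatibility until users have migrated.
        check.gauge("cloudsmith.bandwidth_bytes_interval", value, tags=check.tags)
```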
    cloudsmith.analytics.bytes_downloaded_sum,gauge,,byte,,"Total bytes downloaded in the configured analytics interval",0,cloudsmith,analytics_bytes_downloaded_sum,,profile
    cloudsmith.analytics.request_count,gauge,,item,,"Total download request count in the configured analytics interval",0,cloudsmith,analytics_request_count,,profile

Suggested change:

    cloudsmith.analytics.bytes_downloaded_sum,gauge,,byte,,"Total bytes downloaded in the configured bandwidth interval",0,cloudsmith,analytics_bytes_downloaded_sum,,profile
    cloudsmith.analytics.request_count,gauge,,item,,"Total download request count in the configured bandwidth interval",0,cloudsmith,analytics_request_count,,profile
Not sure if this is correct or not.
There is no analytics interval in the integration config.
You’re right, there isn’t a separate analytics_interval config in the integration. Updated the metadata text to say “configured collection interval” instead.
    cloudsmith.token_count,gauge,,item,,"The number of tokens in an organization",0,cloudsmith,token_count,,
    cloudsmith.token_bandwidth_total,gauge,,byte,,"The total bandwidth used by tokens",0,cloudsmith,token_bandwidth_total,,
    cloudsmith.token_download_total,gauge,,item,,"The total downloads used by tokens",0,cloudsmith,token_download_total,,
request
These metrics (cloudsmith.token_count, cloudsmith.token_bandwidth_total and cloudsmith.token_download_total) should continue being submitted even though we are deprecating them.
An update of the Cloudsmith integration would otherwise break dashboards for existing users.
P.S. don't forget to add them back in the tests too
My bad here, added those back. Just to confirm: we will be looking to deprecate these endpoints very soon. Given we are adding warnings in this release, is it okay to fully remove those metrics in the following release? We will prepare notices on our end and send them out to our customers about the change as well.
Thanks for adding the metrics back!
To answer your question, we prefer only deleting metrics when absolutely necessary.
We usually keep old metrics available under 'legacy' configuration options that users can enable if they are running an older version of the technology they want to monitor.
That being said, I'm not entirely up to date on Cloudsmith's offerings.
As far as I understand Cloudsmith is cloud only, i.e. when you deprecate the endpoints exposing the above metrics, this change will eventually affect all Cloudsmith users.
In that case I agree that the metrics could be removed in the future.
I would be against it if there were self-hosted versions of Cloudsmith which would allow users to be running different versions.
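The "legacy" configuration-option pattern described above might look roughly like this (the option name `collect_legacy_token_metrics` is hypothetical; the metric names are from the metadata.csv lines quoted earlier):

```python
def collect_token_metrics(instance_config, gauge, token_stats):
    """Submit the legacy token metrics unless the user has opted out."""
    # Default to True so an integration upgrade does not silently break
    # dashboards that still chart these metrics.
    if not instance_config.get("collect_legacy_token_metrics", True):
        return
    gauge("cloudsmith.token_count", token_stats["count"])
    gauge("cloudsmith.token_bandwidth_total", token_stats["bandwidth"])
    gauge("cloudsmith.token_download_total", token_stats["downloads"])
```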
Perfect - and correct, we are fully cloud-native (no on-prem at all), so this will end up affecting all our users. I don't have concrete timelines yet, but it's on the horizon as we're slowly sunsetting our legacy webapp in favor of a new stack that uses the new endpoints.
Perhaps towards the end of the year I will look at re-creating the integration from the ground up, as my team has officially taken over this project, or at a bare minimum modularizing it a bit to make contributions and reviews easier - for you folks and ourselves.
Appreciate the feedback and help 🙏
What does this PR do?
It introduces a new bandwidth analytics pipeline using the Cloudsmith v2 analytics time-series API, replacing the previous entitlements-based approach, and adds repository-level observability improvements.
This includes:
Org-wide realtime bandwidth monitoring

Controlled via `enable_realtime_bandwidth` (default `true`). Fetches total bytes downloaded and request count for the entire organization with no filters. Emits:

- `cloudsmith.bandwidth.bytes_downloaded`
- `cloudsmith.bandwidth.request_count`

Profile-based bandwidth monitoring

Users can define named `bandwidth_profiles` in their config, each with its own `aggregate` (`bytes_downloaded_sum` or `request_count`) and granular filters such as:

- `repository`
- `package_format`
- `user`
- `entitlement_token`
- `ip_address`
- `http_status`
- `country`

Each profile emits `cloudsmith.analytics.bytes_downloaded_sum` or `cloudsmith.analytics.request_count`, tagged with `profile:<name>` plus all configured filter tags.

Repository-level observability

Adds repository data collection from the Cloudsmith `/repos/{owner}/` endpoint, with pagination handling, and emits:

- `cloudsmith.repository.storage_bytes`
- `cloudsmith.repository.package_count`
- `cloudsmith.repository.download_count`

Dashboard updates
Adds:
Quota fix
Corrects slight margin errors in quota endpoint conversions.
Tests and docs updates
Adds unit test coverage for:
Also updates:
Motivation
Cloudsmith users need both broader bandwidth observability and more granular attribution of usage.
The previous bandwidth implementation relied on the entitlements endpoint, which had limited filtering capability and did not support the level of granularity customers need. The Cloudsmith v2 analytics time-series API provides per-interval bucketed data with rich filtering dimensions such as repository, package format, user, entitlement token, IP address, and HTTP status, enabling customers to:
In addition, Cloudsmith users needed per-repository visibility into storage utilization and operational counters. The repository endpoint integration adds direct repository-level visibility for storage, package count, and download count, and makes those metrics explorable in Datadog dashboards using `repository:<slug>` tags.

Review checklist

- `no-changelog` label attached

Additional Notes
Metric type: `gauge`

The integration submits the latest completed bucket value as a gauge. Each API data point represents bytes downloaded or request count in one interval bucket, which is treated as a point-in-time measurement for that window. The dedup logic ensures only one value is submitted per bucket, and submission is skipped when no new completed bucket is available, so users can control how gaps are rendered with the `.fill()` modifier in their dashboards.

New metrics

- `cloudsmith.bandwidth.bytes_downloaded`
- `cloudsmith.bandwidth.request_count`
- `cloudsmith.analytics.bytes_downloaded_sum`
- `cloudsmith.analytics.request_count`
- `cloudsmith.repository.storage_bytes`
- `cloudsmith.repository.package_count`
- `cloudsmith.repository.download_count`

Configuration example
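A hedged sketch of what such a configuration might look like (`enable_realtime_bandwidth` and `bandwidth_profiles` are from this PR; the credential field names and the `name`/`aggregate`/`filters` key shapes are assumptions about the schema):

```yaml
instances:
  - cloudsmith_api_key: <YOUR_API_KEY>   # assumed credential field name
    organization: my-org                 # assumed org field name
    enable_realtime_bandwidth: true
    bandwidth_profiles:
      - name: private-repo-bytes
        aggregate: bytes_downloaded_sum
        filters:
          repository: testing-private
          package_format: python
      - name: throttled-requests
        aggregate: request_count
        filters:
          http_status: 429
```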