chore: add datadog-agent-config crate #56

Merged
duncanpharvey merged 113 commits into main from duncan-harvey/config-crate
Mar 11, 2026

Conversation

@duncanpharvey (Collaborator) commented Dec 16, 2025

What does this PR do?

  • Upstreams config crate from datadog-lambda-extension to serverless-components with commit history.
  • Adds Cargo.toml for crate
  • Updates 3rd party licenses

Motivation

Create a shared config crate that can be used for AWS, Azure, and GCP.

https://datadoghq.atlassian.net/browse/SVLS-5564

Additional Notes

Commands to upstream crate from datadog-lambda-extension with commit history.

cd datadog-lambda-extension
git subtree split -P bottlecap/src/config -b duncan-harvey/split-config
git checkout duncan-harvey/split-config
git push

cd serverless-components
git checkout -b duncan-harvey/config-crate
git subtree add -P crates/datadog-agent-config git@github.com:DataDog/datadog-lambda-extension.git duncan-harvey/split-config

Planned follow up PRs:

  • Add feature to datadog-agent-config crate for lambda specific functionality
  • Consolidate config for Serverless Compatibility Layer into datadog-agent-config crate, gating any Azure/GCP specific functionality behind a feature
    • Currently the config is spread across datadog-serverless-compat, dogstatsd and datadog-trace-agent crates.

Describe how to test/QA your changes

Automated tests. Not wired into Bottlecap and Serverless Compatibility Layer yet.

duncanista and others added 30 commits May 13, 2024 14:07
* remove `config.rs` file

* create `config/mod.rs`

* move to `config/flush_strategy.rs`

* move to `config/log_level.rs`

* update imports

* fmt
* add logs processing rules field

* add `regex` crate

* add `processing_rules.rs` config module

* use `processing_rule` module instead

* update logs `processor` to use compiled rules

* update unit test
* add plumbing for aws secret manager

* strip as much deps as possible

* fix test

* remove unused warning

* reorg runner for bottlecap

* fix overwriting of arch

* add full error to the panic

* avoid building the go agent all the time

* rename module

* speed up build

* add simple scripts to build and publish

* remove deleted call

* remove changes from common scripts

* resolve import conflicts

* wrong file pushed

* make sure permissions are right

* move secret parsing after log activation

* add some stat to build

* add manual req for secret (still broken)

* rebuild after conflict on cargo loc

* automate update and call

* change headers and fix signature

* fix typo and small refactor

* remove useless thread spawn

* small refactors on deploy scripts

* use access key always for signatures

* the secret has to be used to sign

* fix: missing newline in request

* use only manual decrypt

* add timed steps

* add scripts to force restarts

* fix launch script

* refactor decrypt

* cargo format and clippy

* fix clippy error

add formatting/clippy functions

---------

Co-authored-by: AJ Stuyvenberg <astuyve@gmail.com>
* add kms handling

* fix return value

* fix test

* fix kms

* remove committed test file

* rename

* format

* fmt after fix

* fix conflicts

* await async stuff

* formatting

* bubble up error converting to sdt

* use box dyn for generic errors

* reformat

* address other comments

* remove old build file added with conflict
* add kms handling

* fix return value

* fix test

* fix kms

* remove committed test file

* rename

* format

* fmt after fix

* fix conflicts

* await async stuff

* formatting

* bubble up error converting to sdt

* use box dyn for generic errors

* reformat

* address other comments

* remove old build file added with conflict

* do not pass around the whole config for just the secret

* fix scope and just bubble up errors

* reformat

* renaming

* without api key, just call next loop

* fix types and format

* fix folder path

* fix cd and returns

* resolve conflicts

* formatter
* print failover reason as json string

* fmt

* update key to be more verbose
* wip: tracing

* feat: tracing WIP

* feat: rename mini agent to trace agent

* feat: fmt

* feat: Fix formatting after rename

* fix: remove extra tokio task

* feat: allow tracing

* feat: working v5 traces

* feat: Update to use my branch of libdatadog so we have v5 support

* feat: Update w/ libdatadog to pass trace encoding version

* feat: update w/ merged libdatadog changes

* feat: Refactor trace agent, reduce code duplication, enum for trace version. Pass trace provider. Manual stats flushing. Custom create endpoint until we clean up that code in libdatadog.

* feat: Unify config, remove trace config. Tests pass

* feat: fmt

* feat: fmt

* clippy fixes

* parse time

* feat: clippy again

* feat: revert dockerfile

* feat: no-default-features

* feat: Remove utils, take only what we need

* feat: fmt moves the import

* feat: replace info with debug. Replace log with tracing lib

* feat: more debug

* feat: Remove call to trace utils
…for the runtime proxy (#296)

* feat: Allow appsec but in a disabled-only state until we add support for the runtime proxy

* feat: Log failover reason

* fix: serverless_appsec_enabled. Also log the reason
* feat: Require DD_EXTENSION_VERSION: next

* feat: add tests, fix metric tests

* feat: revert metrics test byte changes

* feat: fmt

* feat: remove ref
* feat: honor enhanced metrics bool

* feat: add test

* feat: refactor to log instead of return result

* fix: clippy
* fallback on `datadog.yaml` presence

* add comment
* remove `tracing-log`

instead, use the `tracing-subscriber` `tracing-log` feature

* capitalize debugs

* remove unnecessary file

* update log formatter prefix

* update log filter

* fmt
* feat: race flush

* refactor: periodic only when configured

* fmt

* when flushing strategy is default, set periodic flush tick to `1s`

* on `End`, never flush until the end of the invocation

* remove `tokio_unstable` feature for building

* remove debug comment

* remove `invocation_times` mod

* update `flush_control.rs`

* use `flush_control` in main

* allow `end,<ms>` strategy

allows to flush periodically over a given amount of seconds and at the end

* update `debug` comment for flushing

* simplify logic for flush strategy parsing

* remove log that could spam debug

* refactor code and add unit test

---------

Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>
Co-authored-by: alexgallotta <5581237+alexgallotta@users.noreply.github.com>
* test: add invalid string and multi line distro test with empty newline

* test: move unit test to appropriate package

* fix: do not error log for empty and new line strings

---------

Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>
* feat: Allow trace disabled plugins

* feat: trace debug
* feat: Allowlist additional env vars

* fix: fmt

* feat: and repo url
* fix: allow objects to be ignored

* feat: specs
* set explicit deny list

also allow `datadog.yaml` usage

* add unit test for parsing rule from yaml

* remove `object_ignore.rs`

* remove import

* remove logging failover reason when user is not opt-in
* failover fast

* typo

* failover on `/opt/datadog_wrapper` set
* feat: serde's rename_all isn't working, use a custom deserializer to lowercase loglevels

* feat: default is warn

* feat: Allow repetition to clear up imports

* feat: rebase
* feat: support DD_HTTP_PROXY and DD_HTTPS_PROXY

* fix: remove import

* fix: fmt

* feat: Revert fqdn changes to enable testing

* feat: Use let instead of repeated instantiation

* feat: Rip out proxy stuff we dont need but make sure we dont proxy the telemetry or runtime APIs with system proxies

* feat: remove debug

* fix: no debugs for hyper/h2

* fix: revert cargo changes

* feat: Pin libdatadog deps to v13.1

* fix: rebase with dogstatsd 13.1

* fix: use main for dsdrs

* fix: remove unwrap

* fix: fmt

* fix: licenses

* increase size boo

* fix: size ugh

* fix: install_default() in tests
* feat: Honor priority order of DD_PROXY_HTTPS over HTTPS_PROXY

* feat: fmt

* fix: Prefer Ok over some + ok

* Feat: Use tags for proxy support in libdatadog

* fix: no proxy for tests

* fix: license

* all this for a comma
This reverts commit 9560657582f2f22c8e68af5d0bb9d7d2b0765650.
litianningdatadog and others added 9 commits December 4, 2025 19:27
https://datadoghq.atlassian.net/browse/SVLS-8095

## Overview
Tag parsing previously used split(':'), which broke values containing colons, such as URLs (git.repository_url:https://...). Changed to use splitn(2, ':') to split only on the first colon, preserving the rest as the value.

Changes:
 - Add parse_key_value_tag() helper to centralize parsing logic
 - Refactor deserialize_key_value_pairs to use helper
 - Refactor deserialize_key_value_pair_array_to_hashmap to use helper
 - Add comprehensive test coverage for URL values and edge cases
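
The first-colon split described above can be sketched as follows. This is a minimal illustration, not the helper from the PR: the signature, the trimming, and the rejection of empty keys or values are assumptions, and only the use of `splitn(2, ':')` comes from the description.

```rust
/// Hypothetical sketch of a key/value tag parser.
/// Splits on the first ':' only, so a value such as a URL keeps its own colons.
/// Assumption: tags with a missing or empty key/value are rejected with `None`.
fn parse_key_value_tag(tag: &str) -> Option<(String, String)> {
    // `splitn(2, ':')` yields at most two pieces: the key and the untouched remainder.
    let mut parts = tag.splitn(2, ':');
    let key = parts.next()?.trim();
    let value = parts.next()?.trim();
    if key.is_empty() || value.is_empty() {
        return None;
    }
    Some((key.to_string(), value.to_string()))
}
```

With this sketch, `git.repository_url:https://github.com/DataDog/datadog-lambda-extension` parses to the key `git.repository_url` and the full URL as the value, whereas a plain `split(':')` would cut the URL at `https`.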

## Testing 
Unit tests; e2e tests are expected to pass.

Co-authored-by: tianning.li <tianning.li@datadoghq.com>
## Problem
A customer reported that their Lambda is behind a proxy, and the
Rust-based extension can't send traces to Datadog via the proxy, while
the previous Go-based extension worked.

## This PR
Supports the env var `DD_TLS_CERT_FILE`: the path to a file of
concatenated CA certificates in PEM format.
Example: `DD_TLS_CERT_FILE=/opt/ca-cert.pem`. When the extension
flushes traces/stats to Datadog, the HTTP client it creates can load and
use this cert and connect to the proxy properly.

## Testing
### Steps
1. Create a Lambda in a VPC with an NGINX proxy.
2. Add a layer to the Lambda, which includes the CA certificate
`ca-cert.pem`
3. Set env vars:
    - `DD_TLS_CERT_FILE=/opt/ca-cert.pem`
    - `DD_PROXY_HTTPS=http://10.0.0.30:3128`, where `10.0.0.30` is the
      private IP of the proxy EC2 instance
    - `DD_LOG_LEVEL=debug`
4. Update routing rules of security groups so the Lambda can reach
`http://10.0.0.30:3128`
5. Invoke the Lambda
### Result
**Before**
Trace flush failed with error logs:
> DD_EXTENSION | ERROR | Max retries exceeded, returning request error
error=Network error: client error (Connect) attempts=1
DD_EXTENSION | ERROR | TRACES | Request failed: No requests sent

**After**
Trace flush is successful:
> DD_EXTENSION | DEBUG | TRACES | Flushing 1 traces
DD_EXTENSION | DEBUG | TRACES | Added root certificate from
/opt/ca-cert.pem
DD_EXTENSION | DEBUG | TRACES | Proxy connector created with proxy:
Some("http://10.0.0.30:3128")
DD_EXTENSION | DEBUG | Sending with retry
url=https://trace.agent.datadoghq.com/api/v0.2/traces payload_size=1120
max_retries=1
DD_EXTENSION | DEBUG | Received response status=202 Accepted attempt=1
DD_EXTENSION | DEBUG | Request succeeded status=202 Accepted attempts=1
DD_EXTENSION | DEBUG | TRACES | Flushing took 1609 ms

## Notes
This fix only covers trace flusher and stats flusher, which use
`ServerlessTraceFlusher::get_http_client()` to create the HTTP client.
It doesn't cover logs flusher and proxy flusher, which use a different
function (http.rs:get_client()) to create the HTTP client. However, logs
flushing was successful in my tests, even if no certificate was added.
We can come back to logs/proxy flusher if someone reports an error.
## Overview
The crate `datadog-trace-obfuscation` has been renamed as
`libdd-trace-obfuscation`. This PR updates this dependency.

## Testing
## Problem
Span dedup service sometimes fails to return the result and thus logs
the error:
> DD_EXTENSION | ERROR | Failed to send check_and_add response: true

I see this error in our Self Monitoring and in a customer's account.
I also believe it causes the extension to fail to receive traces from the
tracer, causing missing traces. This is because the caller of span dedup
is `process_traces()`, the function that handles the
tracer's HTTP request to send traces. If this function fails to get the span
dedup result and gets stuck, the HTTP request will time out.

## This PR
While I don't yet know what causes the error, this PR adds a patch to
mitigate the impact:
1. Change log level from `error` to `warn`
2. Add a timeout of 5 seconds to the span dedup check, so that if the
caller doesn't get an answer soon, it defaults to treating the trace as
not a duplicate, which is the most common case.
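
The timeout-with-default behavior in item 2 can be sketched as below. This is an illustration only: the extension's own code is async, while this sketch uses a blocking `std::sync::mpsc` channel as a stand-in, and the function name is hypothetical.

```rust
use std::sync::mpsc::{self, Receiver};
use std::time::Duration;

/// Hypothetical stand-in for waiting on the span dedup answer.
/// Returns the service's answer if it arrives in time; on timeout or a
/// disconnected channel it falls back to `false` ("not a duplicate").
fn is_duplicate_or_default(rx: &Receiver<bool>, timeout: Duration) -> bool {
    match rx.recv_timeout(timeout) {
        Ok(is_duplicate) => is_duplicate,
        // Both a timeout and a closed channel take the common-case default.
        Err(_) => false,
    }
}

/// Example caller using the 5-second budget from the description.
fn check_with_five_second_budget() -> bool {
    let (tx, rx) = mpsc::channel::<bool>();
    // In this sketch the "service" answers immediately.
    tx.send(true).expect("receiver is still alive");
    is_duplicate_or_default(&rx, Duration::from_secs(5))
}
```

Defaulting to "not a duplicate" means a stuck dedup check may occasionally let a duplicate trace through, but the tracer's HTTP request no longer blocks until it times out.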

## Testing
Merge this PR, then check the logs in self monitoring, as it's hard to run
high-volume tests in self monitoring from a non-main branch.
… (#1021)

## Overview

Ensures `DD_LOGS_CONFIG_LOGS_DD_URL` is correctly prefixed with
`https://`
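
A minimal sketch of this kind of normalization, assuming the configured value is either a bare host (optionally with a port) or already a full URL. The helper name is hypothetical, and leaving values that already carry a scheme untouched is an assumption; the PR does not say how such values are handled.

```rust
/// Hypothetical helper: make sure a logs intake URL carries the `https://` scheme.
/// Assumption: values that already include any scheme ("://") are returned unchanged.
fn ensure_https_prefix(url: &str) -> String {
    let trimmed = url.trim();
    if trimmed.contains("://") {
        trimmed.to_string()
    } else {
        format!("https://{trimmed}")
    }
}
```

Under these assumptions, `logs.example.com:443` becomes `https://logs.example.com:443`, and a value that already starts with `https://` passes through as-is.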

## Testing 

Manually tested that logs get sent to alternate logs intake
## Overview

Continuation of #1018 removing unnecessary mut lock on callers for
dogstatsd
## What?

Upgrade rust to latest stable 1.93.1

## Why?

`time` vulnerability fix is only available on rust >= 1.88.0
## Overview

Add DD_SKIP_SSL_VALIDATION support, parsed from both env and YAML,
matching the datadog-agent's behavior, and applied to all outgoing HTTP
clients (reqwest via danger_accept_invalid_certs, hyper via a custom
ServerCertVerifier).

## Motivation

Customers in environments with corporate proxies or custom CA setups
need the ability to disable TLS certificate validation, matching the
existing datadog-agent config option. The Go agent applies
tls.Config{InsecureSkipVerify: true} to all HTTP transports via a
central CreateHTTPTransport() — we mirror this by wiring the config
through to both client builders.

And [SLES-2710](https://datadoghq.atlassian.net/browse/SLES-2710)

## Changes

Config (config/mod.rs, config/env.rs, config/yaml.rs):
- Add skip_ssl_validation: bool to Config, EnvConfig, and YamlConfig
  with default false

reqwest client (http.rs):
- .danger_accept_invalid_certs(config.skip_ssl_validation) on the shared
  client builder

hyper client (traces/http_client.rs):
- Custom NoVerifier implementing
  rustls::client::danger::ServerCertVerifier that accepts all certificates
- Uses CryptoProvider::get_default() (not hardcoded aws_lc_rs) for
  FIPS-safe signature scheme reporting
- New skip_ssl_validation parameter on create_client()

## Testing 

Unit tests and self monitoring

[SLES-2710]:
https://datadoghq.atlassian.net/browse/SLES-2710?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
…34489279a7ba4aa6820'

git-subtree-dir: crates/datadog-agent-config
git-subtree-mainline: 28f796b
git-subtree-split: b0e7f5a
@duncanpharvey duncanpharvey force-pushed the duncan-harvey/config-crate branch from 2daee5b to 7e1eed6 Compare March 10, 2026 19:46
@duncanpharvey duncanpharvey force-pushed the duncan-harvey/config-crate branch from f5f5d6f to ff987e1 Compare March 10, 2026 20:50
@duncanpharvey duncanpharvey changed the title from "chore: create crate for serverless config" to "chore: create datadog-agent-config crate" Mar 11, 2026
@duncanpharvey duncanpharvey changed the title from "chore: create datadog-agent-config crate" to "chore: add datadog-agent-config crate" Mar 11, 2026
@duncanpharvey duncanpharvey marked this pull request as ready for review March 11, 2026 14:42
@duncanpharvey duncanpharvey requested review from a team as code owners March 11, 2026 14:42
@duncanpharvey duncanpharvey requested review from kathiehuang and litianningdatadog and removed request for a team March 11, 2026 14:42
Contributor

You might not need to migrate this one

Contributor

This is very Lambda-specific and doesn't need to be here, as it's mostly used for lifecycle purposes in the Extension.

Collaborator Author

Removed aws.rs in 1bff528.

Contributor

Also, you might not need to migrate this one, as it's very Extension specific

Collaborator Author

serverless_flush_strategy is already part of the Config struct so I think using a feature to gate this behavior would be a cleaner solution compared to removing it from the Config struct.

We can always do a refactor to remove it from the Config struct and add it to an AWS specific config struct in the future.

Contributor

You might want to import this one from dd-trace-rs

Collaborator Author

Discussed in person, will handle in a separate PR

* upgrade rust edition to 2024 for workspace

* apply formatting
@duncanpharvey duncanpharvey merged commit 13ab912 into main Mar 11, 2026
26 checks passed
@duncanpharvey duncanpharvey deleted the duncan-harvey/config-crate branch March 11, 2026 18:02