
chore: upgrade workspace rust edition to 2024#96

Merged
duncanpharvey merged 2 commits into duncan-harvey/config-crate from
duncan-harvey/rust-edition-2024
Mar 11, 2026

Conversation


@duncanpharvey duncanpharvey commented Mar 10, 2026

What does this PR do?

Upgrades the workspace Rust edition to 2024.

Motivation

The config crate is being upstreamed in #56. It uses Rust edition 2024, so this PR upgrades the entire workspace to edition 2024.

https://datadoghq.atlassian.net/browse/SVLS-5564

Additional Notes

  • Set environment variables safely with temp-env crate
  • Lots of formatting changes
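On the `temp-env` note above: in edition 2024, `std::env::set_var` and `remove_var` are `unsafe` (they are not thread-safe), which is why tests should set variables through a guard that restores the previous state. A minimal std-only sketch of the pattern the `temp-env` crate provides (the variable name is illustrative):

```rust
use std::env;
use std::ffi::OsString;

/// Minimal sketch of what `temp_env::with_var` does: set a variable for
/// the duration of a closure, then restore the previous state.
fn with_var<F: FnOnce()>(key: &str, value: &str, f: F) {
    let previous: Option<OsString> = env::var_os(key);
    // SAFETY: callers must ensure no other thread touches the environment
    // concurrently (e.g. single-threaded tests).
    unsafe { env::set_var(key, value) };
    f();
    match previous {
        Some(v) => unsafe { env::set_var(key, v) },
        None => unsafe { env::remove_var(key) },
    }
}

fn main() {
    let before = env::var_os("DD_LOG_LEVEL");
    with_var("DD_LOG_LEVEL", "debug", || {
        assert_eq!(env::var("DD_LOG_LEVEL").unwrap(), "debug");
    });
    // The previous state (set or unset) is restored afterwards.
    assert_eq!(env::var_os("DD_LOG_LEVEL"), before);
}
```

The real crate also restores the variable on panic; this sketch omits that for brevity.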

Describe how to test/QA your changes

Automated tests; also built and deployed to Azure Functions via the Serverless Compatibility Layer.

@duncanpharvey duncanpharvey marked this pull request as ready for review March 11, 2026 14:42
@duncanpharvey duncanpharvey requested review from a team as code owners March 11, 2026 14:42
@duncanpharvey duncanpharvey requested review from apiarian-datadog and lym953 and removed request for a team March 11, 2026 14:42
@duncanpharvey duncanpharvey merged commit b013122 into duncan-harvey/config-crate Mar 11, 2026
26 checks passed
@duncanpharvey duncanpharvey deleted the duncan-harvey/rust-edition-2024 branch March 11, 2026 17:55
duncanpharvey added a commit that referenced this pull request Mar 11, 2026
* chore(bottlecap): make config a folder module (#242)

* remove `config.rs` file

* create `config/mod.rs`

* move to `config/flush_strategy.rs`

* move to `config/log_level.rs`

* update imports

* fmt

* feat(bottlecap): add logs processing rules (#243)

* add logs processing rules field

* add `regex` crate

* add `processing_rules.rs` config module

* use `processing_rule` module instead

* update logs `processor` to use compiled rules

* update unit test

* Svls 4825 support encrypted keys manual (#258)

* add plumbing for aws secret manager

* strip as much deps as possible

* fix test

* remove unused warning

* reorg runner for bottlecap

* fix overwriting of arch

* add full error to the panic

* avoid building the go agent all the time

* rename module

* speed up build

* add simple scripts to build and publish

* remove deleted call

* remove changes from common scripts

* resolve import conflicts

* wrong file pushed

* make sure permissions are right

* move secret parsing after log activation

* add some stat to build

* add manual req for secret (still broken)

* rebuild after conflict on cargo loc

* automate update and call

* change headers and fix signature

* fix typo and small refactor

* remove useless thread spawn

* small refactors on deploy scripts

* use access key always for signatures

* the secret has to be used to sign

* fix: missing newline in request

* use only manual decrypt

* add timed steps

* add scripts to force restarts

* fix launch script

* refactor decrypt

* cargo format and clippy

* fix clippy error

* add formatting/clippy functions

---------

Co-authored-by: AJ Stuyvenberg <astuyve@gmail.com>

* add kms handling (#261)

* add kms handling

* fix return value

* fix test

* fix kms

* remove committed test file

* rename

* format

* fmt after fix

* fix conflicts

* await async stuff

* formatting

* bubble up error converting to std

* use box dyn for generic errors

* reformat

* address other comments

* remove old build file added with conflict

* Svls 4978 handle secrets error (#271)

* add kms handling

* fix return value

* fix test

* fix kms

* remove committed test file

* rename

* format

* fmt after fix

* fix conflicts

* await async stuff

* formatting

* bubble up error converting to std

* use box dyn for generic errors

* reformat

* address other comments

* remove old build file added with conflict

* do not pass around the whole config for just the secret

* fix scope and just bubble up errors

* reformat

* renaming

* without api key, just call next loop

* fix types and format

* fix folder path

* fix cd and returns

* resolve conflicts

* formatter

* chore(bottlecap): log failover reason (#292)

* print failover reason as json string

* fmt

* update key to be more verbose

* Add APM tracing support (#294)

* wip: tracing

* feat: tracing WIP

* feat: rename mini agent to trace agent

* feat: fmt

* feat: Fix formatting after rename

* fix: remove extra tokio task

* feat: allow tracing

* feat: working v5 traces

* feat: Update to use my branch of libdatadog so we have v5 support

* feat: Update w/ libdatadog to pass trace encoding version

* feat: update w/ merged libdatadog changes

* feat: Refactor trace agent, reduce code duplication, enum for trace version. Pass trace provider. Manual stats flushing. Custom create endpoint until we clean up that code in libdatadog.

* feat: Unify config, remove trace config. Tests pass

* feat: fmt

* feat: fmt

* clippy fixes

* parse time

* feat: clippy again

* feat: revert dockerfile

* feat: no-default-features

* feat: Remove utils, take only what we need

* feat: fmt moves the import

* feat: replace info with debug. Replace log with tracing lib

* feat: more debug

* feat: Remove call to trace utils

* feat: Allow appsec but in a disabled-only state until we add support for the runtime proxy (#296)

* feat: Allow appsec but in a disabled-only state until we add support for the runtime proxy

* feat: Log failover reason

* fix: serverless_appsec_enabled. Also log the reason

* feat: Require DD_EXTENSION_VERSION: next (#302)

* feat: Require DD_EXTENSION_VERSION: next

* feat: add tests, fix metric tests

* feat: revert metrics test byte changes

* feat: fmt

* feat: remove ref

* feat: honor enhanced metrics bool (#307)

* feat: honor enhanced metrics bool

* feat: add test

* feat: refactor to log instead of return result

* fix: clippy

* feat: warn by default (#316)

* chore(bottlecap): fallback on `datadog.yaml` usage (#326)

* fallback on `datadog.yaml` presence

* add comment

* fix(bottlecap): filter debug logs from external crates (#329)

* remove `tracing-log`

instead, use the `tracing-subscriber` `tracing-log` feature

* capitalize debugs

* remove unnecessary file

* update log formatter prefix

* update log filter

* fmt

* chore(bottlecap): switch flushing strategy to race (#318)

* feat: race flush

* refactor: periodic only when configured

* fmt

* when flushing strategy is default, set periodic flush tick to `1s`

* on `End`, never flush until the end of the invocation

* remove `tokio_unstable` feature for building

* remove debug comment

* remove `invocation_times` mod

* update `flush_control.rs`

* use `flush_control` in main

* allow `end,<ms>` strategy

allows flushing periodically at a given interval and again at the end of the invocation

* update `debug` comment for flushing

* simplify logic for flush strategy parsing

* remove log that could spam debug

* refactor code and add unit test

---------

Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>
Co-authored-by: alexgallotta <5581237+alexgallotta@users.noreply.github.com>

* remove log that might confuse customers (#333)

* Fix dogstatsd multiline (#335)

* test: add invalid string and multi line distro test with empty newline

* test: move unit test to appropriate package

* fix: do not error log for empty and new line strings

---------

Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>

* add env vars to be ignored (#337)

* feat: Open up more env vars which we don't rely on (#344)

* feat: Allow trace disabled plugins (#348)

* feat: Allow trace disabled plugins

* feat: trace debug

* feat: Allowlist additional env vars (#354)

* feat: Allowlist additional env vars

* fix: fmt

* feat: and repo url

* aj/allow apm replace tags array (#358)

* fix: allow objects to be ignored

* feat: specs

* fix(bottlecap): set explicit deny list and allow yaml usage (#363)

* set explicit deny list

also allow `datadog.yaml` usage

* add unit test for parsing rule from yaml

* remove `object_ignore.rs`

* remove import

* remove logging failover reason when user is not opt-in

* chore(bottlecap): fast failover (#371)

* failover fast

* typo

* failover on `/opt/datadog_wrapper` set

* aj/fix log level casing (#372)

* feat: serde's rename_all isn't working, use a custom deserializer to lowercase loglevels

* feat: default is warn

* feat: Allow repetition to clear up imports

* feat: rebase

* feat: failover on dd proxy (#391)

* feat: support HTTPS_PROXY (#381)

* feat: support DD_HTTP_PROXY and DD_HTTPS_PROXY

* fix: remove import

* fix: fmt

* feat: Revert fqdn changes to enable testing

* feat: Use let instead of repeated instantiation

* feat: Rip out proxy stuff we dont need but make sure we dont proxy the telemetry or runtime APIs with system proxies

* feat: remove debug

* fix: no debugs for hyper/h2

* fix: revert cargo changes

* feat: Pin libdatadog deps to v13.1

* fix: rebase with dogstatsd 13.1

* fix: use main for dsdrs

* fix: remove unwrap

* fix: fmt

* fix: licenses

* increase size boo

* fix: size ugh

* fix: install_default() in tests

* aj/honor both proxies in order (#410)

* feat: Honor priority order of DD_PROXY_HTTPS over HTTPS_PROXY

* feat: fmt

* fix: Prefer Ok over some + ok

* Feat: Use tags for proxy support in libdatadog

* fix: no proxy for tests

* fix: license

* all this for a comma

* accept `datadog_wrapper`

* Revert "accept `datadog_wrapper`"

This reverts commit 9560657582f2f22c8e68af5d0bb9d7d2b0765650.

* accept `datadog_wrapper` (#373)

* feat(bottlecap): create Inferred Spans baseline + infer API Gateway HTTP spans (#405)

* add `Trigger` trait for inferred spans

* add `ApiGatewayHttpEvent` trigger

* add `SpanInferrer`

* make `invocation::processor` to use `SpanInferrer`

* send `aws_config` to `invocation::processor`

* use incoming payload for `invocation::processor` for span inferring

* add `api_gateway_http_event.json` for testing

* add `api_gateway_proxy_event.json` for testing

* fix: Convert tag hashmap to sorted vector of tags

* fix: fmt

---------

Co-authored-by: AJ Stuyvenberg <astuyve@gmail.com>

* feat(bottlecap): Add Composite Trace Propagator (#413)

* add `trace_propagation_style.rs`

* add Trace Propagation to `config.rs`

also updated unit tests; since we have custom behavior, we should check only the fields we care about in the tests

* add `links` to `SpanContext`

* add composite propagator

also known as our internal HTTP propagator, though the HTTP name doesn't really fit: it is just a composite propagator used based on our configuration

* update `TextMapPropagator`s to comply with interface

also updated the naming

* fmt

* add unit testing for `config.rs`

* add `PartialEq` to `SpanContext`

* correct logic from `text_map_propagator.rs`

logic was wrong in some parts, this was discovered through unit tests

* add unit tests for `DatadogCompositePropagator`

also corrected some logic

* feat(bottlecap): add capture lambda payload (#454)

* add `tag_span_from_value`

* add `capture_lambda_payload` config

* add unit testing for `tag_span_from_value`

* update listener `end_invocation_handler`

parsing should not be handled here

* add capture lambda payload feature

also parse body properly, and handle `statusCode`

* feat(bottlecap): add Cold Start Span + Tags (#450)

* add some helper functions to `invocation::lifecycle` mod

* create cold start span on processor

* move `generate_span_id` to parent module

* send `platform_init_start` data to processor

* send `PlatformInitStart` to main bus

* update cold start `parent_id`

* fix start time of cold start span

* enhanced metrics now have a `dynamic_value_tags` for tags which we have to calculate at points in time

* `AwsConfig` now has a `sandbox_init_time` value

* add `is_empty` to `ContextBuffer`

* calculate init tags on invoke

also add a method to reset processor invocation state

* restart init tags on set

* set tags properly for proactive init

* fix unit test

* remove debug line

* make sure `cold_start` tag is only set in one place

* feat(bottlecap): support service mapping and `peer.service` tag (#455)

* add some helper functions to `invocation::lifecycle` mod

* create cold start span on processor

* move `generate_span_id` to parent module

* send `platform_init_start` data to processor

* send `PlatformInitStart` to main bus

* update cold start `parent_id`

* fix start time of cold start span

* enhanced metrics now have a `dynamic_value_tags` for tags which we have to calculate at points in time

* `AwsConfig` now has a `sandbox_init_time` value

* add `is_empty` to `ContextBuffer`

* calculate init tags on invoke

also add a method to reset processor invocation state

* restart init tags on set

* set tags properly for proactive init

* fix unit test

* remove debug line

* make sure `cold_start` tag is only set in one place

* add service mapping config serializer

* add `service_mapping.rs`

* add `ServiceNameResolver` interface

for service mapping

* implement interface in every trigger

* send `service_mapping` lookup table to span enricher

* create `SpanInferrer` with `service_mapping` config

* fmt

* rename failover to fallback (#465)

* fix(bottlecap): fallback when otel set (#470)

* fallback on otel

* add unit test

* feat(bottlecap): fallback on opted out only (#473)

* fallback on opted out only

* log on opted out

* fix(bottlecap): fallback on yaml otel config (#474)

* fallback on opted out only

* fallback on yaml otel config

* switch `legacy` to `compatibility`

* feat: honor serverless_logs (#475)

* feat: honor serverless_logs

* fmt

---------

Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>

* feat: Flush timeouts (#480)

* fix version parsing for number (#492)

* fix: fallback on intake urls (#495)

* fallback on `dd_url`, and the APM and logs intake URLs

* fix env var for apm url

* grammar

* set dogstatsd timeout (#497)

* set dogstatsd timeout

* add todo for other edge case

* add comment on jitter. Likely not required for lambda

* fmt

* update license

* update sha for dogstatsd

---------

Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>

* fix: set right domain and arn by region on secrets manager (#511)

* check whether the region is in China and use the appropriate domain

* correct arn for lambda in chinese regions

* fix: typo in china arn

* fix: reuse function to detect right aws partition and support gov too

* nest and rearrange imports

* fix imports again

* fix: Honor noproxy and skip proxying if ddsite is in the noproxy list (#520)

* fix: Honor noproxy and skip proxying if ddsite is in the noproxy list

* feat: specs

* feat: Oneline check, add comment

* Support proxy yaml config (#523)

* fix: Honor noproxy and skip proxying if ddsite is in the noproxy list

* feat: specs

* feat: yaml proxy had a different format

* feat: Oneline check, add comment

* feat: Support nonstandard proxy config

* feat: specs

* fix: bad merge whoops

* feat: Support snapstart's vended credentials (#532)

* feat: Support snapstart's vended credentials

* feat: Add snapstart events

* fix: specs

* feat: Make config mutable, as it is consumed entirely by the secrets module.

* fix: needless borrow

* feat: add zstd and compress (#558)

* feat: add zstd and compress

* hack: skip clippy for a sec

* feat: Honor logs config settings.

* fix: dont set zstd header unless we compress

* fmt

* clippy

* fmt

* fix: ints

* licenses

* remove debug code

* wtf clippy and fmt, pick one

---------

Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>

* Svls 6036 respect timeouts (#537)

* log shipping times

* set flush timeout for traces

* remove retries

* fix conflicts

* address comments

* Fallback on gov regions (#550)

* Aj/support pci and custom endpoints (#585)

* feat: logs_config_logs_dd_url

* feat: apm pci endpoints

* feat: metrics

* feat: support metrics using dogstatsd methods

* fix: use the right var

* tests: use server url override

* feat: refactor into flusher method

* feat: clippy

* Aj/yaml apm replace tags (#602)

* feat: yaml APM replace tags rule parsing

* feat: Custom deserializer for replace tags. yaml -> JSON so we can rely on the same method because ReplaceRule is totally private

* remove aj

* feat: merge w/ libdatadog main

* feat: Parse http obfuscation config from yaml

* feat: licenses

* feat: parse env and service as strings or ints (#608)

* feat: parse env and service as strings or ints

* feat: add service test

* fmt

* Add DSM and Profiling endpoints (#622)

- **feat: Support DSM proxy endpoint**
- **feat: profiling support**
- **feat: add additional tags**

* chore(config): parse config only twice  (#651)

# What?

Removes `FallbackConfig` and `FallbackYamlConfig` in favor of the
existing configurations.

# How?

1. Using only the known places where we are going to fallback from the
available configs.
2. Moved environment variables and yaml config to its own file for
readability.

# Notes

- Added fallbacks for OTLP (in preparation for that PR, allowed some
fields to not fallback).

* fix: Parse DD_APM_REPLACE_TAGS env var (#656)

Fixes an issue where we didn't parse `DD_APM_REPLACE_TAGS` because the
yaml block includes an additional `config` word after APM, which is not
present in the env var.

As usual, env vars override config file settings

* feat: Optionally disable proc enhanced metrics (#663)

Fixes #648

For customers using very very fast/small lambda functions (usually just
rust), there can be a small 1-2ms increase in runtime duration when
collecing metrics like open file descriptors or tmp file usage.

We still enable these by default, but customers can now optionally
disable them

* fix(config): serialize booleans from anything (#657)

# What?

Deserializes any of the values `0|1|true|TRUE|False|false` into the
corresponding boolean.

# How?

Using the `serde-aux` crate to leverage its existing unit tests and maintenance ownership.

# Motivation

Some products at Datadog allow these values, coalescing them –
[SVLS-6687](https://datadoghq.atlassian.net/browse/SVLS-6687)

[SVLS-6687]:
https://datadoghq.atlassian.net/browse/SVLS-6687?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
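A standalone sketch of the coercion this PR describes. The real implementation goes through the `serde-aux` crate inside serde deserialization; this helper only illustrates the value mapping:

```rust
/// Coerce the accepted spellings `0|1|true|TRUE|False|false` (any case
/// for the words) into a boolean; anything else is rejected, not guessed.
fn coerce_bool(raw: &str) -> Option<bool> {
    match raw.trim() {
        "1" => Some(true),
        "0" => Some(false),
        s if s.eq_ignore_ascii_case("true") => Some(true),
        s if s.eq_ignore_ascii_case("false") => Some(false),
        _ => None,
    }
}

fn main() {
    assert_eq!(coerce_bool("TRUE"), Some(true));
    assert_eq!(coerce_bool("False"), Some(false));
    assert_eq!(coerce_bool("0"), Some(false));
    assert_eq!(coerce_bool("yes"), None); // not in the accepted set
}
```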

* chore(config): create `aws` module (#659)

# What?

Refactors methods related to AWS config into its own module

# Motivation

Just cleaning and removing stuff from main
– [SVLS-6686](https://datadoghq.atlassian.net/browse/SVLS-6686)

[SVLS-6686]:
https://datadoghq.atlassian.net/browse/SVLS-6686?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

* feat: [SVLS-6242] bottlecap fips builds (#644)

Building bottlecap with fips mode.

This is entirely focused on removing `ring` (and other
non-FIPS-compliant dependencies from our `fips`-featured builds.)

* fix(config): remove `apm_ignore_resources` check in OTEL (#676)

# What?

Removes usage of `DD_APM_IGNORE_RESOURCES` in the OTEL span transform.

# Why?

1. The implementation was incorrect and shouldn't check for resources to
ignore in the transformation step.
2. It was not properly used in the `apm_config` for YAML files.

# Notes:

- Follow up PR to implement `APM_IGNORE_RESOURCES` properly in the Trace
Agent.

# More

Learn about ignoring resources:
https://docs.datadoghq.com/tracing/guide/ignoring_apm_resources/?tab=datadogyaml#ignoring-based-on-resources

`DD_APM_IGNORE_RESOURCES` is specified as:

```
A list of regular expressions can be provided to exclude certain traces based on their resource name.
All entries must be surrounded by double quotes and separated by commas.
```

A correct usage would be:

```env
DD_APM_IGNORE_RESOURCES="(GET|POST) /healthcheck,API::NotesController#index"
```

or in yaml
```yaml
apm_config:
  ignore_resources: ["(GET|POST) /healthcheck","API::NotesController#index"]
```
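As a rough illustration, the env-var form above can be split into individual patterns like this (a std-only sketch; the real agent compiles each entry as a regular expression and drops matching traces, and quoting/escaping edge cases are ignored here):

```rust
/// Split a comma-separated `DD_APM_IGNORE_RESOURCES` value into
/// individual patterns, stripping surrounding quotes and whitespace.
fn parse_ignore_resources(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(|p| p.trim().trim_matches('"').to_string())
        .filter(|p| !p.is_empty())
        .collect()
}

fn main() {
    let patterns =
        parse_ignore_resources("(GET|POST) /healthcheck,API::NotesController#index");
    assert_eq!(patterns.len(), 2);
    assert_eq!(patterns[0], "(GET|POST) /healthcheck");
    assert_eq!(patterns[1], "API::NotesController#index");
}
```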

* feat(proxy): abstract lambda runtime api proxy (#669)

# What?

Abstracts the concept of the `proxy` from the Lambda Web Adapter
implementation.
This will unlock the usage of ASM.

# How?

Using the `axum` crate, we create a new proxy server with specific routes
from the Lambda Runtime API that we are interested in proxying.

# Motivation

ASM and [SVLS-6760](https://datadoghq.atlassian.net/browse/SVLS-6760)



[SVLS-6760]:
https://datadoghq.atlassian.net/browse/SVLS-6760?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

* fix(config): fix otlp trace agent to start when right configuration is set (#680)

# What?

Ensures that OTLP agent is only enabled when the
`otlp_config_receiver_protocols_http_endpoint` is set, and when
`otlp_config_traces_enabled` is `true`

# Motivation

#678 

# Notes

OTEL agent should only spin up when receiver protocols endpoint is set,
so this was a miss on my side.
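The gating condition described above can be restated as a tiny predicate (names abbreviated from the config keys in the description; a sketch, not the extension's code):

```rust
/// Start the OTLP agent only when an HTTP receiver endpoint is configured
/// AND OTLP traces are enabled.
fn should_start_otlp_agent(receiver_http_endpoint: Option<&str>, traces_enabled: bool) -> bool {
    receiver_http_endpoint.is_some() && traces_enabled
}

fn main() {
    assert!(should_start_otlp_agent(Some("0.0.0.0:4318"), true));
    assert!(!should_start_otlp_agent(None, true)); // no endpoint: stay off
    assert!(!should_start_otlp_agent(Some("0.0.0.0:4318"), false)); // traces disabled
}
```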

* feat: continuous flushing strategy for high throughput functions (#684)

This is a heavy refactor and new feature.
- Introduces FlushDecision and separates it from FlushStrategy
- Cleans up FlushControl logic and methods

It also adds the ability to flush telemetry across multiple serial
lambda invocations. This is done using the `continuous` strategy.

This is a huge win for busy functions as seen in our test fleet, where
the p99/max drops precipitously, which also causes the average to
plummet. This also helps reduce the number of cold starts encountered
during scaleup events, which further reduces latency along with costs:

![image](https://github.com/user-attachments/assets/14851e22-327d-43b0-8246-5780cfbf6ef7)

Technical implementation:
We spawn the task and collect the flush handles, then in the two
periodic strategies we check if there were any errors or unresolved
futures in the next flush cycle. If so, we switch to the `periodic`
strategy to ensure flushing completes successfully.

We don't adapt to the continuous strategy unless the last 20 invocations
each occurred within the `config.flush_timeout` value, which has been
increased by default. This is a naive implementation. A better one would
be to calculate the first derivative of the invocation periodicity. If
the rate is increasing, we can adapt to the continuous strategy. If the
rate slows, we should fall back to the periodic strategy.
<img width="807" alt="image"
src="https://github.com/user-attachments/assets/d3c25419-f1da-4774-975f-0e254047b9b7"
/>

The existing implementation is cautious in that we could definitely
adapt sooner but don't.


Todo: add a feature flag for continuous flushing?
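The adaptation gate described above can be sketched as a predicate over recent inter-invocation gaps (the names, the window size of 20, and the exact comparison are illustrative, not the extension's code):

```rust
/// Adopt the continuous strategy only once the last `window` invocation
/// gaps all fit within the flush timeout, i.e. the function is busy.
fn can_adopt_continuous(intervals_secs: &[f64], flush_timeout_secs: f64, window: usize) -> bool {
    intervals_secs.len() >= window
        && intervals_secs[intervals_secs.len() - window..]
            .iter()
            .all(|&gap| gap <= flush_timeout_secs)
}

fn main() {
    let busy = [0.5_f64; 20];
    let mut sparse = busy.to_vec();
    sparse[10] = 30.0; // one slow gap (e.g. post-runtime idle time) blocks adoption
    assert!(can_adopt_continuous(&busy, 10.0, 20));
    assert!(!can_adopt_continuous(&sparse, 10.0, 20));
    assert!(!can_adopt_continuous(&busy[..5], 10.0, 20)); // not enough history yet
}
```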

* fix: bump flush_timeout default (#697)

A little goofy because we use this to determine when/how to move over to
continuous flushing, but the gist is that our invocation context tracks
the start time of each invocation. Because it's all local to a single
sandbox, this means that the time diff between invocations includes post
runtime duration, so it's very common to have 20 invocations greater
than 10s if there are even a couple of periodic/end flushes in there.

This is customizable with `DD_FLUSH_TIMEOUT`, so people who want a very
short timeout can set one.

* feat: Allow users to specify continuous strategy (#701)

https://datadoghq.atlassian.net/browse/SVLS-6994

* feat: Use http2 unless overridden or using a proxy (#706)

We rolled out HTTP/2 support for logs in v81, which seems to have broken
logs for some users relying on proxies which may not support http2.

This change introduces a new configuration option called `use_http1`.

1. If `DD_HTTP_PROTOCOL` is explicitly set to http1, we'll use it
2. If `DD_HTTP_PROTOCOL` is not set and the user is using a proxy, we'll
fall back to http1; setting `DD_HTTP_PROTOCOL` to anything other than
`http1` overrides that fallback.

fixes #705
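The two rules above can be sketched as a small selection function (the config key name is taken from the description; the enum and function are illustrative):

```rust
#[derive(Debug, PartialEq)]
enum HttpProtocol {
    Http1,
    Http2,
}

fn choose_protocol(dd_http_protocol: Option<&str>, proxy_configured: bool) -> HttpProtocol {
    match dd_http_protocol {
        // Rule 1: an explicit `http1` always wins.
        Some(p) if p.eq_ignore_ascii_case("http1") => HttpProtocol::Http1,
        // An explicit non-`http1` value overrides the proxy fallback.
        Some(_) => HttpProtocol::Http2,
        // Rule 2: unset + proxy in use -> assume the proxy may not speak h2.
        None if proxy_configured => HttpProtocol::Http1,
        // Default since v81: HTTP/2.
        None => HttpProtocol::Http2,
    }
}

fn main() {
    assert_eq!(choose_protocol(Some("http1"), false), HttpProtocol::Http1);
    assert_eq!(choose_protocol(None, true), HttpProtocol::Http1);
    assert_eq!(choose_protocol(Some("http2"), true), HttpProtocol::Http2);
    assert_eq!(choose_protocol(None, false), HttpProtocol::Http2);
}
```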

* Dual shipping metrics support (#704)

Adds support for dual shipping metrics to endpoints configured using the
`additional_endpoints` YAML or `DD_ADDITIONAL_ENDPOINTS` env var config.

For each configured endpoint/API key combination, we now create a
separate `MetricsFlusher` to flush the same batch of metrics to multiple
endpoints in parallel. Also, updates the retry logic to retry flushing
for the specific flusher that encountered an error.

Tested dual shipping metrics to 2 additional orgs/endpoints including
eu1.

Depends on dogstatsd changes:
https://github.com/DataDog/serverless-components/pull/20
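A hedged, std-only sketch of the fan-out described above: the same serialized batch is sent to every configured endpoint in parallel, and only the endpoints that failed are reported back for retry. The real extension uses async flushers; the function and endpoint names here are illustrative:

```rust
use std::thread;

/// Flush `batch` to every endpoint in parallel; return the endpoints
/// whose flush failed so the caller can retry just those.
fn flush_all(
    endpoints: &[String],
    batch: &[u8],
    send: fn(&str, &[u8]) -> Result<(), String>,
) -> Vec<String> {
    let handles: Vec<_> = endpoints
        .iter()
        .map(|ep| {
            let ep = ep.clone();
            let payload = batch.to_vec(); // each flusher gets the same batch
            thread::spawn(move || {
                let result = send(&ep, &payload);
                (ep, result)
            })
        })
        .collect();
    handles
        .into_iter()
        .filter_map(|h| {
            let (ep, result) = h.join().expect("flusher thread panicked");
            result.err().map(|_| ep)
        })
        .collect()
}

fn main() {
    fn fake_send(ep: &str, _batch: &[u8]) -> Result<(), String> {
        if ep.contains("eu1") { Err("timeout".into()) } else { Ok(()) }
    }
    let endpoints = vec![
        "https://api.datadoghq.com".to_string(),
        "https://api.eu1.datadoghq.com".to_string(),
    ];
    let failed = flush_all(&endpoints, b"metrics", fake_send);
    assert_eq!(failed, vec!["https://api.eu1.datadoghq.com".to_string()]);
}
```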

* chore: Separate AwsCredentials from AwsConfig (#716)

# Problem
Right now `AwsConfig` has a lot of fields, including the ones related to
credential:
```rust
    pub aws_access_key_id: String,
    pub aws_secret_access_key: String,
    pub aws_session_token: String,
    pub aws_container_credentials_full_uri: String,
    pub aws_container_authorization_token: String,
```

The next PR https://github.com/DataDog/datadog-lambda-extension/pull/717
wants to lazily load the API key and credentials. To do that, for the
resolver function `resolve_secrets()`, I need to change the param
`aws_config` from `&AwsConfig` to `Arc<RwLock<AwsConfig>>`. Because
`aws_config` is passed to many places, that change would mean updating
lots of functions, which is a heavy lift.

# This PR
Separates these credential-related fields out from `AwsConfig` and
creates a new struct `AwsCredentials`

Thus, the next PR will only need to change the param `aws_credentials`
from `&AwsCredentials` to `Arc<RwLock<AwsCredentials>>`. Because
`aws_credentials` is not used in lots of places, the next PR becomes
easier.

https://datadoghq.atlassian.net/issues/SVLS-6996
https://datadoghq.atlassian.net/issues/SVLS-6998
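The split this PR makes can be sketched as follows (credential field names come from the PR body; the remaining `AwsConfig` field is a hypothetical placeholder). Only the credentials then need the `Arc<RwLock<...>>` wrapper for lazy loading:

```rust
use std::sync::{Arc, RwLock};

#[allow(dead_code)]
#[derive(Default)]
struct AwsCredentials {
    aws_access_key_id: String,
    aws_secret_access_key: String,
    aws_session_token: String,
    aws_container_credentials_full_uri: String,
    aws_container_authorization_token: String,
}

#[allow(dead_code)]
#[derive(Default)]
struct AwsConfig {
    region: String, // ...other non-credential fields stay here
}

fn main() {
    // A resolver can refresh credentials without touching `AwsConfig`,
    // so functions taking `&AwsConfig` stay unchanged.
    let creds = Arc::new(RwLock::new(AwsCredentials::default()));
    creds.write().unwrap().aws_session_token = "token".into();
    assert_eq!(creds.read().unwrap().aws_session_token, "token");
}
```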

* chore(config): separate config from sources (#709)

# What?

Separates the configuration from sources, allowing it to be used in more
use cases.

# How?

Creates new default configuration and separates the environment
variables and YAML sources from the default.

# Why?

Make it easier to track changes in every source, as the field names
might be different to what they are used at the configuration level.

# Notes

I expect to abstract this even further by providing it as a crate with
feature flags, so consumers can pull in only the sources and
product-specific fields they need.

---------

Co-authored-by: Aleksandr Pasechnik <aleksandr.pasechnik@datadoghq.com>
Co-authored-by: Florentin Labelle <florentin.labelle@outlook.fr>

* Dual Shipping Logs Support (#718)

Adds support for dual shipping logs to endpoints configured using the
`logs_config` YAML or `DD_LOGS_CONFIG_ADDITIONAL_ENDPOINTS` env var
config.

Implemented a `LogsFlusher` as a wrapper around all the `Flusher`
instances that manages flushing to every configured endpoint.

Moved retry logic to `LogsFlusher`, as the retry request contains the
endpoint details and does not have to be tied to a particular flusher.

---------

Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>

* chore: upgrade rust version for toolchain to 1.84.1 (#743)

# This PR
1. In `rust-toolchain.toml`, upgrade Rust version from `1.81.0` to
`1.84.1`.
2. Fix/mute clippy errors caused by the upgrade
- some errors require non-trivial code changes, so I muted them for now
and added a TODO to fix them in separate PRs.

# Motivation
`libdatadog` now uses `1.84.1`
https://github.com/DataDog/libdatadog/blame/main/Cargo.toml#L62

To test changes on `libdatadog`, I need to change the Rust version in
`datadog-lambda-extension` to 1.84.1 as well.

Making this a separate PR:
1. so it's easier to test multiple PRs that depend on changes on
`libdatadog` in parallel after I merge this PR to main.
2. because this PR also involves lots of code changes needed to make
clippy happy

* feat: dual shipping APM support (#735)

Adds support for dual shipping traces to endpoints configured using the
`apm_config` YAML or `DD_APM_CONFIG_ADDITIONAL_ENDPOINTS` env var
config.

#### Additional Notes:
- Bumped libdatadog (and serverless-components) to include
https://github.com/DataDog/libdatadog/pull/1139
- Adds configuration option to set compression level for trace payloads

* chore: Add doc and rename function for flushing strategy (#740)

# Motivation

It took me quite some effort to understand flushing strategies. I want
to make it easier to understand for me and future developers.

# This PR
Tries to make flushing strategy code more readable:
1. Add/move comments
2. Create an enum `ConcreteFlushStrategy`, which doesn't contain
`Default` because it is required to be resolved to a concrete strategy
3. Rename `should_adapt` to `evaluate_concrete_strategy()`

# To reviewers
There are still a few things I don't understand, which are marked with
`TODO`. Appreciate explanation!
Also correct me if any comment I added is wrong.

* chore: upgrade to edition 2024 and fix all linter warnings (#754)

Also updates CI to run `clippy` on `--all-targets` so that linter errors
aren't ignored on side targets such as tests.

* fix(apm): Enhance Synthetic Span Service Representation (#751)


### What does this PR do?

Rollout of span naming changes to align serverless product with tracer
to create streamlined Service Representation for Serverless

Key Changes:

- Change service name to match instance name for all managed services
(aws.lambda -> lambda name, etc) (breaking)
- Opt out via `DD_TRACE_AWS_SERVICE_REPRESENTATION_ENABLED`

- Add `span.kind:server` on synthetic spans made via span-inferrer, cold
start and lambda invocation spans

- Remove `_dd.base_service` tags on synthetic spans to avoid
unintentional service override

### Motivation


Improve Service Map for Serverless. This allows for synthetic spans to
have their own service on the map which connects with the inferred spans
from the tracer side.

* feat: port of Serverless AAP from Go to Rust (#755)

# What?

Ports the Serverless App & API Protection feature (AAP, also known as
Serverless AppSec) from the Go extension to Rust.

This is using https://github.com/DataDog/libddwaf-rust to provide
bindings to the in-app WAF.

This provides enhanced support for API Protection (notably, response
schema collection) compared to the Go version.

Tradeoff is that XML request and response security processing is not
currently supported in this version (it was in Go, but likely seldom
used).

This introduces a `bottlecap::appsec::processor::Processor` that is
integrated in the `bottlecap::proxy::Interceptor` (for request &
response acquisition) as well as in the
`bottlecap::trace_processor::TraceProcessor` (to decorate the
`aws.lambda` span with security data).

# Why?

We plan on decommissioning the Go version of the agent and a tracer-side
version of the Serverless AAP feature will not be available across all
supported language runtimes before several weeks/months.

Also [SVLS-5286](https://datadoghq.atlassian.net/browse/SVLS-5286)

# Notes


[SVLS-5286]:
https://datadoghq.atlassian.net/browse/SVLS-5286?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

---------

Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>

* feat: No longer launch Go-based agent for compatibility/OTLP/AAP config (#788)

https://datadoghq.atlassian.net/browse/SVLS-7398

- As part of the coming release, the bottlecap agent no longer launches the
Go-based agent when compatibility/AAP/OTLP features are active
- Emits the same metric when detecting any of the above configurations
- Updates corresponding unit tests

Manifests:
- [Test lambda
function](https://us-east-1.console.aws.amazon.com/lambda/home?region=us-east-1#/functions/ltn1-fullinstrument-bn-cold-python310-lambda?code=&subtab=envVars&tab=testing)
with
[logs](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/$252Faws$252Flambda$252Fltn1-fullinstrument-bn-cold-python310-lambda/log-events/2025$252F08$252F21$252F$255B$2524LATEST$255Df3788d359677452dad162488ff15456f$3FfilterPattern$3Dotel)
showing compatibility/AAP/OTLP are enabled
<img width="2260" height="454" alt="image"
src="https://github.com/user-attachments/assets/5dfd4954-5191-4390-83f5-a8eb3bffb9d3"
/>

-
[Logging](https://app.datadoghq.com/logs/livetail?query=functionname%3Altn1-fullinstrument-bn-cold-python310-lambda%20Metric&agg_m=count&agg_m_source=base&agg_t=count&cols=host%2Cservice&fromUser=true&messageDisplay=inline&refresh_mode=paused&storage=driveline&stream_sort=desc&viz=stream&from_ts=1755787655569&to_ts=1755787689060&live=false)
<img width="1058" height="911" alt="image"
src="https://github.com/user-attachments/assets/629f75d1-e115-4478-afac-ad16d9369fa7"
/>

-
[Metric](https://app.datadoghq.com/screen/integration/aws_lambda_enhanced_metrics?fromUser=false&fullscreen_end_ts=1755788220000&fullscreen_paused=true&fullscreen_refresh_mode=paused&fullscreen_section=overview&fullscreen_start_ts=1755787200000&fullscreen_widget=2&graph-explorer__tile_def=N4IgbglgXiBcIBcD2AHANhAzgkAaEAxgK7ZIC2A%2BhgHYDWmcA2gLr4BOApgI5EfYOxGoTphRJqmDhQBmSNmQCGOeJgIK0CtnhA8ObCHyagAJkoUVMSImwIc4IMhwT6CDfNQWP7utgE8AjNo%2BvvaYRGSwpggKxkgA5gB0kmxgemh8mAkcAB4IHBIQ4gnSChBoSKlswAAkCgDumBQKBARW1Ai41ZxxhdSd0kTUBAi9AL4ABABGvuPAA0Mj4h6OowkKja2DCAAUAJTaCnFx3UpyoeEgo6wgsvJEGgJCN3Jk9wrevH6BV-iWbMqgTbtOAAJgADPg5MY9BRpkZEL4UHZ4LdXhptBBqNDsnAISAoXp7NDVJdmKMfiBsL50nBgOSgA&refresh_mode=sliding&from_ts=1755783890661&to_ts=1755787490661&live=true)
<img width="1227" height="1196" alt="image"
src="https://github.com/user-attachments/assets/2922eb54-9853-4512-a902-dfa97916b643"
/>

* Revert "feat: No longer launch Go-based agent for compatibility/OTLP/AAP config (#788)"

This reverts commit 0f5984571eb842e5ce1cbadbec0f92d73befcd08.

* Ignoring Unwanted Resources in APM (#794)

## Task
https://datadoghq.atlassian.net/browse/SVLS-6846

## Overview
We want to allow users to set filter tags which drop traces whose root
spans match specified span tags. Specifically, users can set
`DD_APM_FILTER_TAGS_REQUIRE` or `DD_APM_FILTER_TAGS_REJECT`.

More info
[here](https://docs.datadoghq.com/tracing/guide/ignoring_apm_resources/?tab=datadogyaml#trace-agent-configuration-options).

## Testing
Deployed changes to Lambda. Invoked Lambda directly and through API
Gateway to check with different root spans. Set the tags to either be
REQUIRE or REJECT with value `name:aws.lambda`. Confirmed in logs and UI
that we were dropping spans.

* feat: Add hierarchical configurable compression levels (#800)

feat: Add hierarchical configurable compression levels

- Add global compression_level config parameter (0-9, default: 6) with
fallback hierarchy
- Support 2-level compression configuration: global level first, then
module-specific
- This makes configuration more convenient - set once globally or
override per module
- Apply compression configuration to metrics flushers and trace
processor
  - Add environment variable DD_COMPRESSION_LEVEL for global setting
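
The fallback hierarchy above can be sketched in a few lines; `effective_level` and the option-wrapped parameters are illustrative names, not the actual config structs:

```rust
// Hedged sketch of the 2-level compression fallback: module-specific
// setting wins, then the global DD_COMPRESSION_LEVEL, then the default.
fn effective_level(module_level: Option<u32>, global_level: Option<u32>) -> u32 {
    const DEFAULT_COMPRESSION_LEVEL: u32 = 6;
    module_level.or(global_level).unwrap_or(DEFAULT_COMPRESSION_LEVEL)
}

fn main() {
    assert_eq!(effective_level(None, Some(3)), 3); // global only
    assert_eq!(effective_level(Some(9), Some(3)), 9); // module overrides global
    assert_eq!(effective_level(None, None), 6); // documented default
}
```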

Test
- Configuration:
<img width="966" height="742" alt="image"
src="https://github.com/user-attachments/assets/b33c0fd3-2b02-4838-8660-fc9ea9493998"
/>
-
([log](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/$252Faws$252Flambda$252Fltn1-fullinstrument-bn-cold-python310-lambda/log-events/2025$252F08$252F25$252F$255B$2524LATEST$255D9c19719435bc48839f6f005d2b58b552))
Configuration:
<img width="965" height="568" alt="image"
src="https://github.com/user-attachments/assets/dfef594a-549f-4773-879d-549234f03fb7"
/>

* cherry pick: No longer launch Go-based agent for compatibility/OTLP/AAP config (#817)

Cherry pick of previously reverted #788 

https://datadoghq.atlassian.net/browse/SVLS-7398

- As part of the coming release, the bottlecap agent no longer launches
the Go-based agent when compatibility/AAP/OTLP features are active
- Emit the same metric when detecting any of the above configurations
- Update corresponding unit tests

Attention: there is a known issue with .NET:
https://github.com/aws/aws-lambda-dotnet/issues/2093

Manifests:
- [Test lambda
function](https://us-east-1.console.aws.amazon.com/lambda/home?region=us-east-1#/functions/ltn1-fullinstrument-bn-cold-python310-lambda?code=&subtab=envVars&tab=testing)
with

[logs](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/$252Faws$252Flambda$252Fltn1-fullinstrument-bn-cold-python310-lambda/log-events/2025$252F08$252F21$252F$255B$2524LATEST$255Df3788d359677452dad162488ff15456f$3FfilterPattern$3Dotel)
showing compatibility/AAP/OTLP are enabled
<img width="2260" height="454" alt="image"

src="https://github.com/user-attachments/assets/5dfd4954-5191-4390-83f5-a8eb3bffb9d3"
/>

-

[Logging](https://app.datadoghq.com/logs/livetail?query=functionname%3Altn1-fullinstrument-bn-cold-python310-lambda%20Metric&agg_m=count&agg_m_source=base&agg_t=count&cols=host%2Cservice&fromUser=true&messageDisplay=inline&refresh_mode=paused&storage=driveline&stream_sort=desc&viz=stream&from_ts=1755787655569&to_ts=1755787689060&live=false)
<img width="1058" height="911" alt="image"

src="https://github.com/user-attachments/assets/629f75d1-e115-4478-afac-ad16d9369fa7"
/>

-

[Metric](https://app.datadoghq.com/screen/integration/aws_lambda_enhanced_metrics?fromUser=false&fullscreen_end_ts=1755788220000&fullscreen_paused=true&fullscreen_refresh_mode=paused&fullscreen_section=overview&fullscreen_start_ts=1755787200000&fullscreen_widget=2&graph-explorer__tile_def=N4IgbglgXiBcIBcD2AHANhAzgkAaEAxgK7ZIC2A%2BhgHYDWmcA2gLr4BOApgI5EfYOxGoTphRJqmDhQBmSNmQCGOeJgIK0CtnhA8ObCHyagAJkoUVMSImwIc4IMhwT6CDfNQWP7utgE8AjNo%2BvvaYRGSwpggKxkgA5gB0kmxgemh8mAkcAB4IHBIQ4gnSChBoSKlswAAkCgDumBQKBARW1Ai41ZxxhdSd0kTUBAi9AL4ABABGvuPAA0Mj4h6OowkKja2DCAAUAJTaCnFx3UpyoeEgo6wgsvJEGgJCN3Jk9wrevH6BV-iWbMqgTbtOAAJgADPg5MY9BRpkZEL4UHZ4LdXhptBBqNDsnAISAoXp7NDVJdmKMfiBsL50nBgOSgA&refresh_mode=sliding&from_ts=1755783890661&to_ts=1755787490661&live=true)
<img width="1227" height="1196" alt="image"

src="https://github.com/user-attachments/assets/2922eb54-9853-4512-a902-dfa97916b643"
/>
====
Another manifest for .NET:
- [Lambda
function](https://us-east-1.console.aws.amazon.com/lambda/home?region=us-east-1#/functions/ltn1-fullinstrument-bn-cold-dotnet6-lambda?code=&subtab=envVars&tab=testing)
-
[Log](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/$252Faws$252Flambda$252Fltn1-fullinstrument-bn-cold-dotnet6-lambda/log-events/2025$252F08$252F29$252F$255B$2524LATEST$255D15ca867ee94049129ed461283ae46f01$3FfilterPattern$3Dfailover)
- Configuration
<img width="1490" height="902" alt="image"
src="https://github.com/user-attachments/assets/b070e5e1-8335-4494-877f-6475d9959af2"
/>
- Log shows the issue reasons
<img width="990" height="536" alt="image"
src="https://github.com/user-attachments/assets/5503de33-ea92-401c-a595-c165e39b0c6e"
/>
<img width="848" height="410" alt="image"
src="https://github.com/user-attachments/assets/54d1e87c-93e7-4084-8a9a-173cb7d0c4a7"
/>
<img width="938" height="458" alt="image"
src="https://github.com/user-attachments/assets/4f205ec2-d923-47d1-9005-762650798894"
/>

---------

Co-authored-by: Tianning Li <tianning.li@datadoghq.com>

* feat: [Trace Stats] Add feature flag DD_COMPUTE_TRACE_STATS (#841)

## This PR

Adds a feature flag `DD_COMPUTE_TRACE_STATS`.
- If true, trace stats will be computed from the extension side. In this
case, we set `_dd.compute_stats` to `0`, so trace stats won't be
computed on the backend.
- If false, trace stats will NOT be computed from the extension side. In
this case, we set `_dd.compute_stats` to `1`, so trace stats will be
computed on the backend.
- Defaults to false for now, so `_dd.compute_stats` still defaults to
`1`, i.e. default behavior is not changed.
- After we fully support computing trace stats on extension side, I will
change the default to true then delete the flag.
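
The flag-to-tag mapping described above is simply an inversion; a minimal sketch, with an illustrative function name:

```rust
// DD_COMPUTE_TRACE_STATS=true  -> _dd.compute_stats = 0 (backend skips stats)
// DD_COMPUTE_TRACE_STATS=false -> _dd.compute_stats = 1 (backend computes stats)
fn compute_stats_tag(compute_on_extension: bool) -> u8 {
    if compute_on_extension { 0 } else { 1 }
}

fn main() {
    assert_eq!(compute_stats_tag(true), 0);
    assert_eq!(compute_stats_tag(false), 1); // current default behavior
}
```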

Jira: https://datadoghq.atlassian.net/browse/SVLS-7593

* fix: use tokio time instead of std time because tokio time can be frozen (#846)

Tokio time allows us to pause or sleep without blocking the runtime. It
also allows time to be paused (mainly for tests). I think we may need
the sleep to force blocking code to yield

---------

Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>

* add support for observability pipeline (#826)

## Task

https://datadoghq.atlassian.net/jira/software/c/projects/SVLS/boards/5420?quickFilter=7573&selectedIssue=SVLS-7525

## Overview
* Add support for sending logs to an Observability Pipeline instead of
directly to Datadog.
* To enable, customers must set
`DD_ENABLE_OBSERVABILITY_PIPELINE_FORWARDING` to true, and
`DD_LOGS_CONFIG_LOGS_DD_URL` to their Observability Pipeline endpoint.
Will fast follow and update docs to reflect this.
* Initially, I was setting up the observability pipeline with
'Datadog Agent' as the source. This required us to format the log
message in a specific way. However, after chatting with the Observability
Pipeline team, they actually recommend we use 'Http Server' as the
source for our pipeline setup instead, since it just accepts any JSON.

## Testing
Created an [observability
pipeline](https://ddserverless.datadoghq.com/observability-pipelines/b15e4a64-880d-11f0-b622-da7ad0900002/view)
and deployed a lambda function with the changes. Triggered the lambda
function and confirmed we see it in our
[logs](https://ddserverless.datadoghq.com/logs?query=function_arn%3A%22arn%3Aaws%3Alambda%3Aus-east-1%3A425362996713%3Afunction%3Aobcdkstackv3-hellofunctionv3ec5a2fbe-l9qvtrowzb5q%22&agg_m=count&agg_m_source=base&agg_t=count&cols=host%2Cservice&messageDisplay=inline&refresh_mode=sliding&storage=hot&stream_sort=desc&viz=stream&from_ts=1758196420534&to_ts=1758369220534&live=true).
We know it is going through the observability pipeline because we can
see 'http_server' attached as the source type.

* feat: lower zstd default compression (#867)

A quick test run showed our max duration skews on smaller lambda sizes
with lots of data when the zstd compression level is set to 6. Looks like
we start to block the CPU at around this mark.

Gonna default it to 3, as tested below with 3 500k runs.
<img width="1293" height="319" alt="image"
src="https://github.com/user-attachments/assets/d1224676-f14f-4a55-8440-089bb9ff91d0"
/>

* revert(#817): reverts fallback config (#871)

# What?

This reverts commit 2396c4fe102677179c834c2dd65cb5b2ea79ca8f from #817 

# Why?

Need a release

# Notes

We'll cherry pick and bring it back at some point

* chore: [Trace Stats] Rename env var DD_COMPUTE_TRACE_STATS (#875)

# This PR
As @apiarian-datadog suggested in
https://github.com/DataDog/datadog-lambda-extension/pull/841#discussion_r2376111825,
rename the feature flag `DD_COMPUTE_TRACE_STATS` to
`DD_COMPUTE_TRACE_STATS_ON_EXTENSION` for clarity.

# Notes
Jira: https://datadoghq.atlassian.net/browse/SVLS-7593

* feat: remove failover to go (#882)

Removes the failover to Go. If we can't parse any of the config options
we log the failing value and move on with the default specified.

* fix: use datadog as default propagation style if supplied version is malformed (#891)

Fixes an issue where config parsing fails if the propagation style is invalid

* fix: use None if propagation style is invalid (#895)

After internal discussion, we determined that the tracing libraries use
None if the trace propagation style is invalid or malformed.

This brings us into alignment.

* feat: Support periodic reload for api key secret (#893)

# This PR
Supports the env var `DD_API_KEY_SECRET_RELOAD_INTERVAL`, in seconds. It
applies when Datadog API Key is set using `DD_API_KEY_SECRET_ARN`. For
example:
- if it's `120`, then the API key will be reloaded about every 120
seconds. Note that a reload can only be triggered when the API key is
used, usually when data is being flushed. If there is no invocation and
no data needs to be flushed, then the reload won't happen.
- If it's not set or set to `0`, then the API key will only be loaded
once, the first time it is used, and won't be reloaded.
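
The reload check described above can be sketched with std types; `ApiKeyCache` and `needs_reload` are illustrative names (the real code passes the interval to the `ApiKeyFactory`):

```rust
use std::time::{Duration, Instant};

// Hedged sketch: reload only when the key is used and the interval has
// elapsed; a zero interval disables reloading entirely.
struct ApiKeyCache {
    loaded_at: Option<Instant>,
    reload_interval: Duration, // Duration::ZERO disables reloading
}

impl ApiKeyCache {
    fn needs_reload(&self, now: Instant) -> bool {
        match self.loaded_at {
            None => true, // first use always loads
            Some(t) => {
                !self.reload_interval.is_zero()
                    && now.duration_since(t) >= self.reload_interval
            }
        }
    }
}

fn main() {
    let now = Instant::now();
    let fresh = ApiKeyCache { loaded_at: Some(now), reload_interval: Duration::from_secs(120) };
    assert!(!fresh.needs_reload(now)); // just loaded, no reload
    assert!(fresh.needs_reload(now + Duration::from_secs(121))); // interval elapsed
    let disabled = ApiKeyCache { loaded_at: Some(now), reload_interval: Duration::ZERO };
    assert!(!disabled.needs_reload(now + Duration::from_secs(999))); // 0 = never reload
}
```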

# Motivation
Some customers regularly rotate their api key in a secret. We need to
provide a way for them to update our cached key.
https://github.com/DataDog/datadog-lambda-extension/issues/834

# Testing
## Steps
1. Set the env var `DD_API_KEY_SECRET_RELOAD_INTERVAL` to `120`

2. Invoke the Lambda every minute

## Result
The reload interval is passed to the `ApiKeyFactory`
<img width="711" height="25" alt="image"
src="https://github.com/user-attachments/assets/6fcc5081-accb-4928-8fa7-094d36aa2fa1"
/>

Reload happens roughly every 120 seconds. It's sometimes longer than 120
seconds due to the reason explained above.
<img width="554" height="252" alt="image"
src="https://github.com/user-attachments/assets/3fa78249-ff98-47d2-a953-f090630bbeb1"
/>

# Notes to Users
When you use this env var, please also keep a grace period for the old
api key after you update the secret to the new key, and make the grace
period longer than the reload interval to give the extension sufficient
time to reload the secret.

# Internal Notes
Jira: https://datadoghq.atlassian.net/browse/SVLS-7572

* [SVLS-7885] update tag splitting to allow for ',' and ' ' (#916)

## Overview
We currently split `DD_TAGS` only by `,`. A customer asked whether we
can also split by spaces, since that is common for container images and
Lambda lets you deploy images.
(https://docs.datadoghq.com/getting_started/tagging/assigning_tags/?tab=noncontainerizedenvironments)
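
A minimal sketch of splitting on both delimiters; `split_tags` is an illustrative helper name, not the actual implementation:

```rust
// Hedged sketch: split DD_TAGS on commas and spaces, dropping empty
// segments so "team:a, env:prod" parses the same as "team:a,env:prod".
fn split_tags(raw: &str) -> Vec<&str> {
    raw.split(|c: char| c == ',' || c == ' ')
        .filter(|s| !s.is_empty())
        .collect()
}

fn main() {
    assert_eq!(split_tags("team:a,env:prod"), ["team:a", "env:prod"]);
    assert_eq!(split_tags("team:a env:prod"), ["team:a", "env:prod"]);
    assert_eq!(split_tags("team:a, env:prod"), ["team:a", "env:prod"]);
}
```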

* [SLES-2547] add metric namespace for DogStatsD (#920)

Follow up from https://github.com/DataDog/serverless-components/pull/48

What does this PR do?
Add support for DD_STATSD_METRIC_NAMESPACE.

Motivation
This was brought up by a customer; they noticed issues migrating to
bottlecap. Our docs show we should support this, but we currently don't
have it implemented:
https://docs.datadoghq.com/serverless/guide/agent_configuration/#dogstatsd-custom-metrics.

Additional Notes
Requires changes in agent/extension. Will follow up with those PRs.

Describe how to test/QA your changes
Deployed changes to the extension and tested with/without the custom
namespace env variable. Confirmed that metrics are getting the prefix
attached:
[metrics](https://ddserverless.datadoghq.com/metric/explorer?fromUser=false&graph_layout=stacked&start=1762783238873&end=1762784138873&paused=false#N4Ig7glgJg5gpgFxALlAGwIYE8D2BXJVEADxQEYAaELcqyKBAC1pEbghkcLIF8qo4AMwgA7CAgg4RKUAiwAHOChASAtnADOcAE4RNIKtrgBHPJoQaUAbVBGN8qVoD6gnNtUZCKiOq279VKY6epbINiAiGOrKQdpYZAYgUJ4YThr42gDGSsgg6gi6mZaBZnHKGABuMMiZeBoIOKoAdPJYTFJNcMRwtRIdmfgiCMAAVDwgfKCR0bmxWABMickIqel4WTl5iIXFIHPlVcgAVjiMIk3TmvIY2U219Y0tbYwdXT0EkucDeEOj4zwAXSornceEwoXCINUYIwMVK8QmFFAUJhcJ0CwmQJA9SwaByoGueIQCE2UBwMCcmXBGggmUSaFEcCcckUynSDKg9MZTnoTGUIjcHjQiKSEHsmCwzIUmwZIiUgJ4fGx8gZCAAwlJhDAUCIwWgeEA)

* refactor: Move metric namespace validation to dogstatsd util (#921)

https://datadoghq.atlassian.net/browse/SLES-2547

- Updates dependency to use centralized parse_metric_namespace function.
- Removes duplicate code in favor of the shared implementation.


Test:
- Deploy the extension and config w/
[DD_STATSD_METRIC_NAMESPACE](https://us-east-1.console.aws.amazon.com/lambda/home?region=us-east-1#/functions/ltn-fullinstrument-bn-10bst-node22-lambda?subtab=envVars&tab=configure)
<img width="964" height="290" alt="image"
src="https://github.com/user-attachments/assets/94836a3a-9905-44b4-9565-185745e47981"
/>
- Invoke the function and expect to see the metric using this custom
prefix namespace
<img width="1170" height="516" alt="Screenshot 2025-11-11 at 4 59 57 PM"
src="https://github.com/user-attachments/assets/0bf4ac5e-ac1c-4cfe-817e-89b004717caf"
/>

[Metric
link](https://ddserverless.datadoghq.com/metric/explorer?fromUser=true&graph_layout=stacked&start=1762897808375&end=1762898083375&paused=true#N4Ig7glgJg5gpgFxALlAGwIYE8D2BXJVEADxQEYAaELcqyKBAC1pEbghkcLIF8qo4AMwgA7CAgg4RKUAiwAHOChASAtnADOcAE4RNIKtrgBHPJoQaUAbVBGN8qVoD6gnNtUZCKiOq279VKY6epbINiAiGOrKQdpYZAYgUJ4YThr42gDGSsgg6gi6mZaBZnHKGABuMMhsaGg4YG5oUAB0WmiCLapS4m6iMMAAVDwgPAC6VBpyaDmg8hgzCAg5STgwTpmYGhoQmYloonBOcorK6QdQ+4dO9EzKIm4eaKP8EPaYWMcKKwciSuM8Pggd7iADCUmEMBQIjwdR4QA)

* [SVLS-7704] add support for SSM Parameter API key (#924)

## Overview
* Add support for customers storing Datadog API Key in SSM Parameter
Store.

## Testing
* Deployed changes and confirmed this work with Parameter Store String
and SecureString.

* feat: Add support for DD_LOGS_ENABLED as alias for DD_SERVERLESS_LOGS_ENABLED (#928)

https://datadoghq.atlassian.net/browse/SVLS-7818

## Overview
Add DD_LOGS_ENABLED environment variable and YAML config option as an
alias for DD_SERVERLESS_LOGS_ENABLED. Both variables now use OR logic,
meaning logs are enabled if either variable is set to true.

Changes:
- Add logs_enabled field to EnvConfig and YamlConfig structs
- Implement OR logic in merge_config functions: logs are enabled if
either DD_LOGS_ENABLED or DD_SERVERLESS_LOGS_ENABLED is true
- Add comprehensive test coverage with 9 test cases covering all
combinations of the two variables
- Maintain backward compatibility with existing configurations
- Default value remains true when neither variable is set
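
The merge behavior above can be sketched as follows; this is a hedged stand-in, as the real merge_config functions operate on full config structs:

```rust
// Hedged sketch: default stays true when neither variable is set;
// otherwise logs are enabled if either variable is true.
fn logs_enabled(dd_logs: Option<bool>, serverless_logs: Option<bool>) -> bool {
    match (dd_logs, serverless_logs) {
        (None, None) => true, // backward-compatible default
        _ => dd_logs.unwrap_or(false) || serverless_logs.unwrap_or(false),
    }
}

fn main() {
    assert!(logs_enabled(None, None));               // neither set: default true
    assert!(logs_enabled(Some(true), Some(false)));  // either true enables logs
    assert!(logs_enabled(None, Some(true)));
    assert!(!logs_enabled(Some(false), Some(false))); // both false disables
}
```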


## Testing 
Set DD_LOGS_ENABLED and DD_SERVERLESS_LOGS_ENABLED to false and expect:
- [Log can be found in AWS
console](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/$252Faws$252Flambda$252Fltn-fullinstrument-bn-cold-node22-lambda/log-events/2025$252F11$252F13$252F$255B$2524LATEST$255D455478dcbc944055b5be933e2e099f6a$3FfilterPattern$3DREPORT+RequestId)
- [Log could NOT be found in DD
console](https://ddserverless.datadoghq.com/logs?query=source%3Alambda%20%40lambda.arn%3A%22arn%3Aaws%3Alambda%3Aus-east-1%3A425362996713%3Afunction%3Altn-fullinstrument-bn-cold-node22-lambda%22%20AND%20%22REPORT%20RequestId%22&agg_m=count&agg_m_source=base&agg_t=count&clustering_pattern_field_path=message&cols=host%2Cservice%2C%40lambda.request_id&fromUser=true&messageDisplay=inline&refresh_mode=paused&storage=hot&stream_sort=desc&viz=stream&from_ts=1763063694206&to_ts=1763065424700&live=false)

Otherwise the log should be available in DD console.

* chore: Upgrade libdatadog and construct http client for traces (#917)

Upgrade libdatadog. Including:
- Rename a few crates:
  - `ddcommon` -> `libdd-common`
  - `datadog-trace-protobuf` -> `libdd-trace-protobuf`
  - `datadog-trace-utils` -> `libdd-trace-utils`
  - `datadog-trace-normalization` -> `libdd-trace-normalization`
  - `datadog-trace-stats` -> `libdd-trace-stats`
- Use the new API to send traces, which takes in an http_client object
instead of proxy url string

GitHub issue:
https://github.com/DataDog/datadog-lambda-extension/issues/860
Jira: https://datadoghq.atlassian.net/browse/SLES-2499
Slack discussion:
https://dd.slack.com/archives/C01TCF143GB/p1762526199549409

* Merge Lambda Managed Instance feature branch (#947)

https://datadoghq.atlassian.net/browse/SVLS-8080

## Overview
Merge Lambda Managed Instance feature branch

## Testing 
Covered by individual commits

Co-authored-by: shreyamalpani <shreya.malpani@datadoghq.com>
Co-authored-by: duncanista <30836115+duncanista@users.noreply.github.com>
Co-authored-by: astuyve <aj.stuyvenberg@datadoghq.com>
Co-authored-by: jchrostek-dd <john.chrostek@datadoghq.com>
Co-authored-by: tianning.li <tianning.li@datadoghq.com>

* fix(config): support colons in tag values (URLs, etc.) (#953)

https://datadoghq.atlassian.net/browse/SVLS-8095

## Overview
Tag parsing previously used `split(':')`, which broke values containing colons, like URLs (git.repository_url:https://...). Changed to use `splitn(2, ':')` to split only on the first colon, preserving the rest as the value.

Changes:
 - Add parse_key_value_tag() helper to centralize parsing logic
 - Refactor deserialize_key_value_pairs to use helper
 - Refactor deserialize_key_value_pair_array_to_hashmap to use helper
 - Add comprehensive test coverage for URL values and edge cases
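
The `splitn(2, ':')` approach can be sketched like this; the real `parse_key_value_tag()` helper's exact signature may differ:

```rust
// Hedged sketch: split on the first ':' only, so URL values such as
// git.repository_url:https://... keep the colons in their value.
fn parse_key_value_tag(tag: &str) -> Option<(&str, &str)> {
    let mut parts = tag.splitn(2, ':');
    match (parts.next(), parts.next()) {
        (Some(key), Some(value)) if !key.is_empty() => Some((key, value)),
        _ => None, // no colon at all, or empty key
    }
}

fn main() {
    assert_eq!(
        parse_key_value_tag("git.repository_url:https://github.com/org/repo"),
        Some(("git.repository_url", "https://github.com/org/repo"))
    );
    assert_eq!(parse_key_value_tag("env:prod"), Some(("env", "prod")));
    assert_eq!(parse_key_value_tag("novalue"), None);
}
```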

## Testing 
unit test and expect e2e tests to pass

Co-authored-by: tianning.li <tianning.li@datadoghq.com>

* [SVLS-7934] feat: Support TLS certificate for trace/stats flusher (#961)

## Problem
A customer reported that their Lambda is behind a proxy, and the
Rust-based extension can't send traces to Datadog via the proxy, while
the previous go-based extension worked.

## This PR
Supports the env var `DD_TLS_CERT_FILE`: The path to a file of
concatenated CA certificates in PEM format.
Example: `DD_TLS_CERT_FILE=/opt/ca-cert.pem`, so that when the extension
flushes traces/stats to Datadog, the HTTP client it creates can load and
use this cert and connect to the proxy properly.

## Testing
### Steps
1. Create a Lambda in a VPC with an NGINX proxy.
2. Add a layer to the Lambda, which includes the CA certificate
`ca-cert.pem`
3. Set env vars:
    - `DD_TLS_CERT_FILE=/opt/ca-cert.pem`
- `DD_PROXY_HTTPS=http://10.0.0.30:3128`, where `10.0.0.30` is the
private IP of the proxy EC2 instance
    - `DD_LOG_LEVEL=debug`
4. Update routing rules of security groups so the Lambda can reach
`http://10.0.0.30:3128`
5. Invoke the Lambda
### Result
**Before**
Trace flush failed with error logs:
> DD_EXTENSION | ERROR | Max retries exceeded, returning request error
error=Network error: client error (Connect) attempts=1
DD_EXTENSION | ERROR | TRACES | Request failed: No requests sent

**After**
Trace flush is successful:
> DD_EXTENSION | DEBUG | TRACES | Flushing 1 traces
DD_EXTENSION | DEBUG | TRACES | Added root certificate from
/opt/ca-cert.pem
DD_EXTENSION | DEBUG | TRACES | Proxy connector created with proxy:
Some("http://10.0.0.30:3128")
DD_EXTENSION | DEBUG | Sending with retry
url=https://trace.agent.datadoghq.com/api/v0.2/traces payload_size=1120
max_retries=1
DD_EXTENSION | DEBUG | Received response status=202 Accepted attempt=1
DD_EXTENSION | DEBUG | Request succeeded status=202 Accepted attempts=1
DD_EXTENSION | DEBUG | TRACES | Flushing took 1609 ms

## Notes
This fix only covers trace flusher and stats flusher, which use
`ServerlessTraceFlusher::get_http_client()` to create the HTTP client.
It doesn't cover logs flusher and proxy flusher, which use a different
function (http.rs:get_client()) to create the HTTP client. However, logs
flushing was successful in my tests, even if no certificate was added.
We can come back to logs/proxy flusher if someone reports an error.

* chore: Upgrade libdatadog (#964)

## Overview
The crate `datadog-trace-obfuscation` has been renamed as
`libdd-trace-obfuscation`. This PR updates this dependency.

## Testing

* [SVLS-8211] feat: Add timeout for requests to span_dedup_service (#986)

## Problem
Span dedup service sometimes fails to return the result and thus logs
the error:
> DD_EXTENSION | ERROR | Failed to send check_and_add response: true

I see this error in our Self Monitoring and a customer's account.
Also I believe it causes extension to fail to receive traces from the
tracer, causing missing traces. This is because the caller of span dedup
is in `process_traces()`, which is the function that handles the
tracer's HTTP request to send traces. If this function fails to get span
dedup result and gets stuck, the HTTP request will time out.

## This PR
While I don't yet know what causes the error, this PR adds a patch to
mitigate the impact:
1. Change log level from `error` to `warn`
2. Add a timeout of 5 seconds to the span dedup check, so that if the
caller doesn't get an answer soon, it defaults to treating the trace as
not a duplicate, which is the most common case.
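
The timeout-with-default idea can be illustrated with std channels; the real code is async (tokio), so this blocking `recv_timeout` sketch is only a stand-in:

```rust
use std::sync::mpsc;
use std::time::Duration;

// Hedged sketch: ask the dedup service whether a trace is a duplicate,
// but if no answer arrives within the timeout, default to "not a
// duplicate", the most common case.
fn is_duplicate_with_timeout(rx: &mpsc::Receiver<bool>, timeout: Duration) -> bool {
    rx.recv_timeout(timeout).unwrap_or(false)
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send(true).unwrap();
    // Service answered in time: use its verdict.
    assert!(is_duplicate_with_timeout(&rx, Duration::from_millis(10)));
    // No further answer: fall back to not-a-duplicate instead of hanging.
    assert!(!is_duplicate_with_timeout(&rx, Duration::from_millis(10)));
}
```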

## Testing
Merge this PR, then check logs in self monitoring, as it's hard to run
high-volume tests in self monitoring from a non-main branch.

* [SVLS-8150] fix(config): ensure logs intake URL is correctly prefixed (#1021)

## Overview

Ensures `DD_LOGS_CONFIG_LOGS_DD_URL` is correctly prefixed with
`https://`
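
A minimal sketch of the prefixing, assuming it only prepends when no scheme is present (the actual implementation may handle `http://` differently):

```rust
// Hedged sketch: ensure the logs intake URL carries a scheme, prepending
// https:// only when none is already there.
fn ensure_https_prefix(url: &str) -> String {
    if url.starts_with("https://") || url.starts_with("http://") {
        url.to_string()
    } else {
        format!("https://{url}")
    }
}

fn main() {
    assert_eq!(ensure_https_prefix("intake.example.com"), "https://intake.example.com");
    assert_eq!(ensure_https_prefix("https://intake.example.com"), "https://intake.example.com");
}
```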

## Testing 

Manually tested that logs get sent to alternate logs intake

* chore(deps): upgrade dogstatsd (#1020)

## Overview

Continuation of #1018 removing unnecessary mut lock on callers for
dogstatsd

* chore(deps): upgrade rust to `v1.93.1` (#1034)

## What?

Upgrade rust to latest stable 1.93.1

## Why?

`time` vulnerability fix is only available on rust >= 1.88.0

* feat(http): allow skip ssl validation (#1064)

## Overview

Add DD_SKIP_SSL_VALIDATION support, parsed from both env and YAML,
matching the datadog-agent's behavior — applied to all outgoing HTTP
clients (reqwest via danger_accept_invalid_certs, hyper via a custom
ServerCertVerifier).

## Motivation

Customers in environments with corporate proxies or custom CA setups
need the ability to disable TLS certificate validation, matching the
existing datadog-agent config option. The Go agent applies
tls.Config{InsecureSkipVerify: true} to all HTTP transports via a
central CreateHTTPTransport() — we mirror this by wiring the config
through to both client builders.

And [SLES-2710](https://datadoghq.atlassian.net/browse/SLES-2710)

## Changes

  Config (config/mod.rs, config/env.rs, config/yaml.rs):
- Add skip_ssl_validation: bool to Config, EnvConfig, and YamlConfig
with default false

  reqwest client (http.rs):
- .danger_accept_invalid_certs(config.skip_ssl_validation) on the shared
client builder

  hyper client (traces/http_client.rs):
- Custom NoVerifier implementing
rustls::client::danger::ServerCertVerifier that accepts all certificates
- Uses CryptoProvider::get_default() (not hardcoded aws_lc_rs) for
FIPS-safe signature scheme reporting
  - New skip_ssl_validation parameter on create_client()

## Testing 

Unit tests and self monitoring

[SLES-2710]:
https://datadoghq.atlassian.net/browse/SLES-2710?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

* add Cargo.toml for datadog-agent-config

* update licenses

* remove aws.rs from datadog-agent-config

* chore: upgrade workspace rust edition to 2024 (#96)

* upgrade rust edition to 2024 for workspace

* apply formatting

---------

Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>
Co-authored-by: alexgallotta <5581237+alexgallotta@users.noreply.github.com>
Co-authored-by: AJ Stuyvenberg <astuyve@gmail.com>
Co-authored-by: Nicholas Hulston <nicholashulston@gmail.com>
Co-authored-by: Aleksandr Pasechnik <aleksandr.pasechnik@datadoghq.com>
Co-authored-by: shreyamalpani <shreya.malpani@datadoghq.com>
Co-authored-by: Yiming Luo <yiming.luo@datadoghq.com>
Co-authored-by: Florentin Labelle <florentin.labelle@outlook.fr>
Co-authored-by: Romain Marcadier <romain.muller@telecomnancy.net>
Co-authored-by: Zarir Hamza <zarir.hamza@datadoghq.com>
Co-authored-by: Romain Marcadier <romain.marcadier@datadoghq.com>
Co-authored-by: Tianning Li <tianning.li@datadoghq.com>
Co-authored-by: jchrostek-dd <john.chrostek@datadoghq.com>
Co-authored-by: astuyve <aj.stuyvenberg@datadoghq.com>