chore: add datadog-agent-config crate #56
* remove `config.rs` file * create `config/mod.rs` * move to `config/flush_strategy.rs` * move to `config/log_level.rs` * update imports * fmt
* add logs processing rules field * add `regex` crate * add `processing_rules.rs` config module * use `processing_rule` module instead * update logs `processor` to use compiled rules * update unit test
* add plumbing for aws secret manager * strip as much deps as possible * fix test * remove unused warning * reorg runner for bottlecap * fix overwriting of arch * add full error to the panic * avoid building the go agent all the time * rename module * speed up build * add simple scripts to build and publish * remove deleted call * remove changes from common scripts * resolve import conflicts * wrong file pushed * make sure permissions are right * move secret parsing after log activation * add some stat to build * add manual req for secret (still broken) * rebuild after conflict on cargo loc * automate update and call * change headers and fix signature * fix typo and small refactor * remove useless thread spawn * small refactors on deploy scripts * use access key always for signatures * the secret has to be used to sign * fix: missing newline in request * use only manual decrypt * add timed steps * add scripts to force restarts * fix launch script * refactor decrypt * cargo format and clippy * fix clippy error add formatting/clippy functions --------- Co-authored-by: AJ Stuyvenberg <astuyve@gmail.com>
* add kms handling * fix return value * fix test * fix kms * remove committed test file * rename * format * fmt after fix * fix conflicts * await async stuff * formatting * bubble up error converting to sdt * use box dyn for generic errors * reformat * address other comments * remove old build file added with conflict
* add kms handling * fix return value * fix test * fix kms * remove committed test file * rename * format * fmt after fix * fix conflicts * await async stuff * formatting * bubble up error converting to sdt * use box dyn for generic errors * reformat * address other comments * remove old build file added with conflict * do not pass around the whole config for just the secret * fix scope and just bubble up errors * reformat * renaming * without api key, just call next loop * fix types and format * fix folder path * fix cd and returns * resolve conflicts * formatter
* print failover reason as json string * fmt * update key to be more verbose
* wip: tracing * feat: tracing WIP * feat: rename mini agent to trace agent * feat: fmt * feat: Fix formatting after rename * fix: remove extra tokio task * feat: allow tracing * feat: working v5 traces * feat: Update to use my branch of libdatadog so we have v5 support * feat: Update w/ libdatadog to pass trace encoding version * feat: update w/ merged libdatadog changes * feat: Refactor trace agent, reduce code duplication, enum for trace version. Pass trace provider. Manual stats flushing. Custom create endpoint until we clean up that code in libdatadog. * feat: Unify config, remove trace config. Tests pass * feat: fmt * feat: fmt * clippy fixes * parse time * feat: clippy again * feat: revert dockerfile * feat: no-default-features * feat: Remove utils, take only what we need * feat: fmt moves the import * feat: replace info with debug. Replace log with tracing lib * feat: more debug * feat: Remove call to trace utils
…for the runtime proxy (#296) * feat: Allow appsec but in a disabled-only state until we add support for the runtime proxy * feat: Log failover reason * fix: serverless_appsec_enabled. Also log the reason
* feat: Require DD_EXTENSION_VERSION: next * feat: add tests, fix metric tests * feat: revert metrics test byte changes * feat: fmt * feat: remove ref
* feat: honor enhanced metrics bool * feat: add test * feat: refactor to log instead of return result * fix: clippy
* fallback on `datadog.yaml` presence * add comment
* remove `tracing-log` instead, use the `tracing-subscriber` `tracing-log` feature * capitalize debugs * remove unnecessary file * update log formatter prefix * update log filter * fmt
* feat: race flush * refactor: periodic only when configured * fmt * when flushing strategy is default, set periodic flush tick to `1s` * on `End`, never flush until the end of the invocation * remove `tokio_unstable` feature for building * remove debug comment * remove `invocation_times` mod * update `flush_control.rs` * use `flush_control` in main * allow `end,<ms>` strategy, which flushes periodically at the given interval and at the end of the invocation * update `debug` comment for flushing * simplify logic for flush strategy parsing * remove log that could spam debug * refactor code and add unit test --------- Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com> Co-authored-by: alexgallotta <5581237+alexgallotta@users.noreply.github.com>
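A minimal sketch of how the `end` and `end,<ms>` strategy strings described in this commit could be parsed. The enum, its variant names, and `parse_flush_strategy` are hypothetical and are not taken from the crate. Falling back to the default strategy on unparseable input is also an assumption.

```rust
/// Hypothetical flush strategy type; the variant names are illustrative only.
#[derive(Debug, PartialEq)]
enum FlushStrategy {
    /// No explicit strategy configured (the commit mentions a 1s periodic tick here).
    Default,
    /// Flush only at the end of the invocation.
    End,
    /// Flush at the end of the invocation and also periodically at this interval.
    EndPeriodically(u64),
}

/// Parses strings such as "end" or "end,1000".
/// Assumption: anything that cannot be parsed falls back to `Default`.
fn parse_flush_strategy(raw: &str) -> FlushStrategy {
    let raw = raw.trim().to_lowercase();
    // Split on the first comma only: the strategy name, then an optional interval.
    let mut parts = raw.splitn(2, ',');
    match (parts.next(), parts.next()) {
        (Some("end"), None) => FlushStrategy::End,
        (Some("end"), Some(interval)) => match interval.trim().parse::<u64>() {
            Ok(value) => FlushStrategy::EndPeriodically(value),
            Err(_) => FlushStrategy::Default,
        },
        _ => FlushStrategy::Default,
    }
}
```

For example, `parse_flush_strategy("end,1000")` yields `FlushStrategy::EndPeriodically(1000)` in this sketch, while an unknown string yields `FlushStrategy::Default`.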
* test: add invalid string and multi line distro test with empty newline * test: move unit test to appropriate package * fix: do not error log for empty and new line strings --------- Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>
* feat: Allow trace disabled plugins * feat: trace debug
* feat: Allowlist additional env vars * fix: fmt * feat: and repo url
* fix: allow objects to be ignored * feat: specs
* set explicit deny list also allow `datadog.yaml` usage * add unit test for parsing rule from yaml * remove `object_ignore.rs` * remove import * remove logging failover reason when user is not opt-in
* failover fast * typo * failover on `/opt/datadog_wrapper` set
* feat: serde's rename_all isn't working, use a custom deserializer to lowercase loglevels * feat: default is warn * feat: Allow repetition to clear up imports * feat: rebase
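The lowercasing step and the `warn` default mentioned in this commit can be illustrated without serde. This is only a sketch: `LogLevel` and `parse_log_level` are hypothetical names, a plain function stands in for the custom deserializer, and mapping unknown values to the default is an assumption.

```rust
/// Hypothetical log level type; `Warn` is the default, as the commit states.
#[derive(Debug, PartialEq, Default)]
enum LogLevel {
    Error,
    #[default]
    Warn,
    Info,
    Debug,
    Trace,
}

/// Lowercases the input before matching, so "DEBUG", "Debug" and "debug" are equivalent.
/// Assumption: unrecognized values fall back to the default level.
fn parse_log_level(raw: &str) -> LogLevel {
    match raw.trim().to_lowercase().as_str() {
        "error" => LogLevel::Error,
        "warn" => LogLevel::Warn,
        "info" => LogLevel::Info,
        "debug" => LogLevel::Debug,
        "trace" => LogLevel::Trace,
        _ => LogLevel::default(),
    }
}
```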
* feat: support DD_HTTP_PROXY and DD_HTTPS_PROXY * fix: remove import * fix: fmt * feat: Revert fqdn changes to enable testing * feat: Use let instead of repeated instantiation * feat: Rip out proxy stuff we dont need but make sure we dont proxy the telemetry or runtime APIs with system proxies * feat: remove debug * fix: no debugs for hyper/h2 * fix: revert cargo changes * feat: Pin libdatadog deps to v13.1 * fix: rebase with dogstatsd 13.1 * fix: use main for dsdrs * fix: remove unwrap * fix: fmt * fix: licenses * increase size boo * fix: size ugh * fix: install_default() in tests
* feat: Honor priority order of DD_PROXY_HTTPS over HTTPS_PROXY * feat: fmt * fix: Prefer Ok over some + ok * Feat: Use tags for proxy support in libdatadog * fix: no proxy for tests * fix: license * all this for a comma
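The priority order named in this commit, `DD_PROXY_HTTPS` over `HTTPS_PROXY`, can be sketched as a small resolution function. `resolve_https_proxy` is a hypothetical name. It takes the two values as parameters instead of reading the environment so that the example stays self-contained, and treating an empty string as unset is an assumption.

```rust
/// Returns the proxy URL to use, preferring the Datadog-specific setting.
/// Assumption: an empty value is treated the same as an unset one.
fn resolve_https_proxy(dd_proxy_https: Option<&str>, https_proxy: Option<&str>) -> Option<String> {
    dd_proxy_https
        .filter(|value| !value.is_empty())
        // Only consult the system-wide variable when the Datadog one is absent.
        .or(https_proxy.filter(|value| !value.is_empty()))
        .map(str::to_string)
}
```

A caller could pass `std::env::var("DD_PROXY_HTTPS").ok().as_deref()` and `std::env::var("HTTPS_PROXY").ok().as_deref()` as the two arguments.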
This reverts commit 9560657582f2f22c8e68af5d0bb9d7d2b0765650.
https://datadoghq.atlassian.net/browse/SVLS-8095
## Overview
Tag parsing previously used `split(':')`, which broke values containing colons, such as URLs (`git.repository_url:https://...`). Changed to use `splitn(2, ':')` to split only on the first colon, preserving the rest as the value.
Changes:
- Add `parse_key_value_tag()` helper to centralize parsing logic
- Refactor `deserialize_key_value_pairs` to use helper
- Refactor `deserialize_key_value_pair_array_to_hashmap` to use helper
- Add comprehensive test coverage for URL values and edge cases
## Testing
Unit tests; e2e tests are expected to pass.
Co-authored-by: tianning.li <tianning.li@datadoghq.com>
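The difference between `split(':')` and `splitn(2, ':')` is easiest to see on a URL-valued tag. The sketch below reuses the helper name `parse_key_value_tag` from the PR description, but the signature and the edge-case handling (trimming, rejecting empty keys or values) are assumptions and not the crate's actual implementation.

```rust
/// Splits a `key:value` tag on the first colon only, so values that
/// themselves contain colons (such as URLs) are preserved intact.
/// Assumption: tags with an empty key or an empty value are rejected.
fn parse_key_value_tag(tag: &str) -> Option<(String, String)> {
    let mut parts = tag.splitn(2, ':');
    let key = parts.next()?.trim();
    let value = parts.next()?.trim();
    if key.is_empty() || value.is_empty() {
        return None;
    }
    Some((key.to_string(), value.to_string()))
}

/// Counts how many pieces a plain `split(':')` produces, to show why
/// it is unsuitable for values that contain colons.
fn naive_piece_count(tag: &str) -> usize {
    tag.split(':').count()
}
```

With the tag `git.repository_url:https://example.com/repo`, a plain `split(':')` produces three pieces, whereas the helper returns the key `git.repository_url` and the full URL as the value.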
## Problem
A customer reported that their Lambda is behind a proxy, and the
Rust-based extension can't send traces to Datadog via the proxy, while
the previous go-based extension worked.
## This PR
Supports the env var `DD_TLS_CERT_FILE`: The path to a file of
concatenated CA certificates in PEM format.
Example: `DD_TLS_CERT_FILE=/opt/ca-cert.pem`, so that when the extension
flushes traces/stats to Datadog, the HTTP client created can load and
use this cert, and connect to the proxy properly.
## Testing
### Steps
1. Create a Lambda in a VPC with an NGINX proxy.
2. Add a layer to the Lambda, which includes the CA certificate
`ca-cert.pem`
3. Set env vars:
- `DD_TLS_CERT_FILE=/opt/ca-cert.pem`
- `DD_PROXY_HTTPS=http://10.0.0.30:3128`, where `10.0.0.30` is the
private IP of the proxy EC2 instance
- `DD_LOG_LEVEL=debug`
4. Update routing rules of security groups so the Lambda can reach
`http://10.0.0.30:3128`
5. Invoke the Lambda
### Result
**Before**
Trace flush failed with error logs:
> DD_EXTENSION | ERROR | Max retries exceeded, returning request error
error=Network error: client error (Connect) attempts=1
DD_EXTENSION | ERROR | TRACES | Request failed: No requests sent
**After**
Trace flush is successful:
> DD_EXTENSION | DEBUG | TRACES | Flushing 1 traces
DD_EXTENSION | DEBUG | TRACES | Added root certificate from
/opt/ca-cert.pem
DD_EXTENSION | DEBUG | TRACES | Proxy connector created with proxy:
Some("http://10.0.0.30:3128")
DD_EXTENSION | DEBUG | Sending with retry
url=https://trace.agent.datadoghq.com/api/v0.2/traces payload_size=1120
max_retries=1
DD_EXTENSION | DEBUG | Received response status=202 Accepted attempt=1
DD_EXTENSION | DEBUG | Request succeeded status=202 Accepted attempts=1
DD_EXTENSION | DEBUG | TRACES | Flushing took 1609 ms
## Notes
This fix only covers trace flusher and stats flusher, which use
`ServerlessTraceFlusher::get_http_client()` to create the HTTP client.
It doesn't cover logs flusher and proxy flusher, which use a different
function (http.rs:get_client()) to create the HTTP client. However, logs
flushing was successful in my tests, even if no certificate was added.
We can come back to logs/proxy flusher if someone reports an error.
## Overview
The crate `datadog-trace-obfuscation` has been renamed as `libdd-trace-obfuscation`. This PR updates this dependency.
## Testing
## Problem
Span dedup service sometimes fails to return the result and thus logs the error:
> DD_EXTENSION | ERROR | Failed to send check_and_add response: true
I see this error in our Self Monitoring and a customer's account. I also believe it causes the extension to fail to receive traces from the tracer, causing missing traces. This is because the caller of span dedup is in `process_traces()`, which is the function that handles the tracer's HTTP request to send traces. If this function fails to get the span dedup result and gets stuck, the HTTP request will time out.
## This PR
While I don't yet know what causes the error, this PR adds a patch to mitigate the impact:
1. Change log level from `error` to `warn`
2. Add a timeout of 5 seconds to the span dedup check, so that if the caller doesn't get an answer soon, it defaults to treating the trace as not a duplicate, which is the most common case.
## Testing
Merge this PR, then check the logs in self monitoring, as it's hard to run high-volume tests in self monitoring from a non-main branch.
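The mitigation described here, waiting a bounded time for the dedup answer and otherwise assuming "not a duplicate", can be sketched with standard-library channels. This is an illustration only: the extension is async and presumably uses its own channel and timeout primitives, and `is_duplicate_or_default` is a hypothetical name.

```rust
use std::sync::mpsc::Receiver;
use std::time::Duration;

/// Waits up to `timeout` for the dedup service's reply.
/// If no reply arrives in time (or the sender has gone away), the trace is
/// treated as not a duplicate, which the PR describes as the most common case.
fn is_duplicate_or_default(reply: &Receiver<bool>, timeout: Duration) -> bool {
    reply.recv_timeout(timeout).unwrap_or(false)
}
```

Under this sketch the PR's setting corresponds to calling the function with `Duration::from_secs(5)`, so a stuck dedup check delays the tracer's HTTP request by at most that long instead of letting it time out.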
… (#1021)
## Overview
Ensures `DD_LOGS_CONFIG_LOGS_DD_URL` is correctly prefixed with `https://`
## Testing
Manually tested that logs get sent to alternate logs intake
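A minimal sketch of the kind of normalization this PR describes for `DD_LOGS_CONFIG_LOGS_DD_URL`. `ensure_https_prefix` is a hypothetical name, and leaving a value that already carries an explicit `http://` scheme untouched is an assumption; the actual change may handle that case differently.

```rust
/// Prepends `https://` when the configured URL has no scheme.
/// Assumption: a value that already starts with `http://` or `https://`
/// is returned unchanged.
fn ensure_https_prefix(url: &str) -> String {
    let trimmed = url.trim();
    if trimmed.starts_with("https://") || trimmed.starts_with("http://") {
        trimmed.to_string()
    } else {
        format!("https://{trimmed}")
    }
}
```

In this sketch, a bare host and port such as `logs.example.com:443` becomes `https://logs.example.com:443`.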
## Overview
Continuation of #1018 removing unnecessary mut lock on callers for dogstatsd
## What?
Upgrade rust to latest stable 1.93.1
## Why?
`time` vulnerability fix is only available on rust >= 1.88.0
## Overview
Add DD_SKIP_SSL_VALIDATION support, parsed from both env and YAML,
matching the datadog-agent's behavior — applied to all outgoing HTTP
clients (reqwest via danger_accept_invalid_certs, hyper via custom
ServerCertVerifier).
## Motivation
Customers in environments with corporate proxies or custom CA setups
need the ability to disable TLS certificate validation, matching the
existing datadog-agent config option. The Go agent applies
tls.Config{InsecureSkipVerify: true} to all HTTP transports via a
central CreateHTTPTransport() — we mirror this by wiring the config
through to both client builders.
And [SLES-2710](https://datadoghq.atlassian.net/browse/SLES-2710)
## Changes
Config (config/mod.rs, config/env.rs, config/yaml.rs):
- Add skip_ssl_validation: bool to Config, EnvConfig, and YamlConfig
with default false
reqwest client (http.rs):
- .danger_accept_invalid_certs(config.skip_ssl_validation) on the shared
client builder
hyper client (traces/http_client.rs):
- Custom NoVerifier implementing
rustls::client::danger::ServerCertVerifier that accepts all certificates
- Uses CryptoProvider::get_default() (not hardcoded aws_lc_rs) for
FIPS-safe signature scheme reporting
- New skip_ssl_validation parameter on create_client()
## Testing
Unit tests and self monitoring
crates/datadog-agent-config/aws.rs
You might not need to migrate this one
This is very Lambda-specific and doesn't need to be here, as it's mostly used for Lifecycle purposes in the Extension
Also, you might not need to migrate this one, as it's very Extension specific
serverless_flush_strategy is already part of the Config struct so I think using a feature to gate this behavior would be a cleaner solution compared to removing it from the Config struct.
We can always do a refactor to remove it from the Config struct and add it to an AWS specific config struct in the future.
You might want to import this one from dd-trace-rs
Discussed in person, will handle in a separate PR
* upgrade rust edition to 2024 for workspace * apply formatting
What does this PR do?
- `config` crate from `datadog-lambda-extension` to `serverless-components` with commit history.
- `Cargo.toml` for crate
Motivation
Create a shared config crate that can be used for AWS, Azure, and GCP.
https://datadoghq.atlassian.net/browse/SVLS-5564
Additional Notes
Commands to upstream crate from `datadog-lambda-extension` with commit history.
Planned follow up PRs:
- `datadog-agent-config` crate for lambda specific functionality
- `datadog-agent-config` crate, gating any Azure/GCP specific functionality behind a feature
- `datadog-serverless-compat`, `dogstatsd` and `datadog-trace-agent` crates.
Describe how to test/QA your changes
Automated tests. Not wired into Bottlecap and Serverless Compatibility Layer yet.