Skip to content

feat(config): Separate NXDOMAIN DNS cache#5750

Closed
aminvakil wants to merge 11 commits intogetsentry:masterfrom
aminvakil:nxdomain_caching
Closed

feat(config): Separate NXDOMAIN DNS cache#5750
aminvakil wants to merge 11 commits intogetsentry:masterfrom
aminvakil:nxdomain_caching

Conversation

@aminvakil
Copy link

Follow up on #5700 .

Fixes getsentry/sentry#109002 .

getsentry/self-hosted#4213 needs to be changed to add dns_cache_nxdomain: false after this gets merged IMHO.

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. and is gonna need some rights from me in order to utilize my contributions in this here PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.

@aminvakil aminvakil requested a review from a team as a code owner March 19, 2026 16:08
@aminvakil aminvakil marked this pull request as draft March 19, 2026 16:09
@aminvakil
Copy link
Author

I have zero rust knowledge and all of this is by Claude Sonnet 4.6 .

Therefore I have converted it to draft, so I can build it locally and see if it really does what it says it does.

@aminvakil
Copy link
Author

cc @aldy505

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

@aminvakil
Copy link
Author

This is working perfectly as expected.

Here is my complete build / test procedure, please tell me if I was wrong in a step or I should've try something else.

I upgraded and fixed https://aur.archlinux.org/packages/sentry-relay and created https://aur.archlinux.org/packages/sentry-relay-git (using my branch to build) and you can see commands which I used to build sentry-relay in it.

  export RUSTFLAGS="$RUSTFLAGS --cfg tokio_unstable"
  cargo fetch --locked --target "$(rustc --print host-tuple)"
  cargo build --frozen --release

You shouldn't use make build or automatic scripts from repository to build package, I could use make build for this specific PR, but why not make it a package? BTW I use arch :)

Anyway I created configuration using this command:

sentry-relay config init
$ cat .relay/config.yml 
# Please see the relevant documentation.
# Performance tuning: https://docs.sentry.io/product/relay/operating-guidelines/
# All config options: https://docs.sentry.io/product/relay/options/
relay:
  mode: managed
  instance: default
  upstream: https://sentry.io/
  host: 127.0.0.1
  port: 3000
  internal_host: null
  internal_port: null

http:
  dns_cache: true
  dns_cache_nxdomain: false

Then I've setup a DNS server to see relay requests to sentry.io and log them in initial registration flow.

Then I tried enabling / disabling dns_cache and dns_cache_nxdomain and changing my DNS server to return NXDOMAIN for sentry.io and it worked perfectly as expected, it didn't query DNS server again by default, and it would not if dns_cache_nxdomain was set to true. It tried every 5 seconds when dns_cache_nxdomain set to false though.

Perfect test would be to have relay query two domains, returning NXDOMAIN for one, and IP for another one, so I can confirm if relay tries finding IP of the domain which is getting NXDOMAIN from DNS server.

Also I don't know if sentry.io is a custom domain in relay or what, but it seems it's using dns_cache and dns_cache_nxdomain directives correctly.

@aminvakil aminvakil marked this pull request as ready for review March 19, 2026 21:42

impl Resolve for HickoryResolver {
fn resolve(&self, name: Name) -> Resolving {
let resolver = self.0.clone(); // cheap, TokioResolver is Arc-backed
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The error handling for read_system_conf() uses .unwrap_or_default(), which falls back to the default resolver, contradicting the explicit goal of avoiding it.
Severity: MEDIUM

Suggested Fix

Instead of using .unwrap_or_default(), handle the potential error from read_system_conf() explicitly. Either log an error when it fails, propagate the error up the call stack, or use a fallback strategy that aligns with the intent to use system resolvers.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: relay-server/src/services/upstream.rs#L856

Potential issue: The code uses `read_system_conf().unwrap_or_default()`. If
`read_system_conf()` fails to read the system's DNS configuration, for instance in
certain containerized or isolated environments, the code will silently fall back to
`ResolverConfig::default()`. This behavior contradicts the stated intent in the commit
history, which was to explicitly avoid using the default resolver. This can lead to the
Relay silently using public DNS servers instead of the intended system resolvers,
causing DNS resolution issues that are difficult to debug.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I know, it's not good, but this is the default behvaiour of reqwest and it was doing the same thing, so I thought this should not get changed in this PR.

@dav1d
Copy link

dav1d commented Mar 19, 2026

That's a lot of config/code for environments that don't need a DNS cache in the first place. I understand that this may solve the problem from the view of self hosted having problems with the cache, but why even cache at all for environments like docker(-compose)? Assuming this actually solves the users problem (I think there wasn't a confirmation yet).

The DNS cache was put in place to solve a very specific problem in Kubernetes, which even that is no longer there for us (due to node local DNS caches).

None of the other Sentry pods give this amount of flexibility and seemingly it's also not necessary for them. This is also something, if we offered it, which needs to be ported to other services like Symbolicator.

@aminvakil
Copy link
Author

That's a lot of config/code for environments that don't need a DNS cache in the first place. I understand that this may solve the problem from the view of self hosted having problems with the cache, but why even cache at all for environments like docker(-compose)? Assuming this actually solves the users problem (I think there wasn't a confirmation yet).

The DNS cache was put in place to solve a very specific problem in Kubernetes, which even that is no longer there for us (due to node local DNS caches).

None of the other Sentry pods give this amount of flexibility and seemingly it's also not necessary for them. This is also something, if we offered it, which needs to be ported to other services like Symbolicator.

I agree with each and every sentence of your comment, I couldn't put it better myself :)

This is too much change for something we do not know if it actually fixes the problem.

When I started to work on it, I didn't know how much change is going to be needed and this is definitely too much change, but I continued completing it to see what others think.

That being said I'm closing this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Relay HealthCheck failed

2 participants