feat: Add replicator DNS override support for outbound requests. #5983
willholley wants to merge 7 commits into main
6d51f89 to 13796f6
nickva left a comment
This is very nice! I didn't get to play with it locally; I just did a quick look-over and left some comments first.
    Body = get_value(body, Params, []),

    % Apply DNS override using connect_to ibrowse option
    #url{host = Host, protocol = Protocol} = ibrowse_lib:parse_url(Url),
We'd be parsing the URL on every request. I wonder if it would work to cache the already-parsed host in the httpdb record? Or maybe we could cache the override ssl / connect_to settings in #httpdb{}...
I did wonder about the performance overhead but it seems negligible in the grand scheme of things? If you think it beneficial, I can explore it (may need guidance though!).
I looked into this and it seems like the complexity wasn't really worthwhile. The persistent term config cache gives a decent speedup.
    end.

    -spec get_overrides() -> [dns_override()].
    get_overrides() ->
It would be nice not to have to reparse everything on each resolve. We could use a persistent_term perhaps, but it would add more code here...
I pushed a commit which adds a cache using a persistent_term.
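For readers following along, a minimal sketch of what a persistent_term-backed cache can look like. It is not the code from the commit: the module name, the cache key and the `ParseFun` argument are hypothetical, and only `persistent_term:get/2`, `persistent_term:put/2` and `persistent_term:erase/1` are real OTP calls.

```erlang
-module(dns_override_cache_sketch).
-export([get_overrides/1, invalidate/0]).

%% Hypothetical cache key; the PR may use a different one.
-define(CACHE_KEY, {?MODULE, overrides}).

%% ParseFun is a zero-arity fun that reads and parses the config.
%% It only runs when nothing is cached yet.
get_overrides(ParseFun) when is_function(ParseFun, 0) ->
    case persistent_term:get(?CACHE_KEY, undefined) of
        undefined ->
            Parsed = ParseFun(),
            %% Wrap the value so a parsed result of `undefined` or []
            %% is still distinguishable from "not cached".
            persistent_term:put(?CACHE_KEY, {cached, Parsed}),
            Parsed;
        {cached, Parsed} ->
            Parsed
    end.

%% Drop the cached value, e.g. when the config setting changes.
invalidate() ->
    _ = persistent_term:erase(?CACHE_KEY),
    ok.
```

Updating or erasing a persistent term triggers a global garbage collection pass, so this pattern fits settings that change rarely, which seems to be the case for a config option.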
    resolve_host/1,
    parse_config/1,
    match_pattern/2,
    get_overrides/0
Is this exported mostly for testing? I don't think it's used otherwise; if so, we could skip exporting it.
da53e78 to 503825c
    Opts =
        case {Proto, OriginalHost} of
            {https, OrigHost} when is_list(OrigHost) ->
                case inet:is_ip_address(OrigHost) of
inet:is_ip_address/1 only accepts an already-parsed address tuple.
We could use inet:parse_address/1 instead:
> inet:parse_address(string:trim("[::0]", both, "[]")).
{ok,{0,0,0,0,0,0,0,0}}
> inet:parse_address("[::0]").
{error,einval}
> inet:parse_address("::0").
{ok,{0,0,0,0,0,0,0,0}}
> inet:parse_address("127").
{ok,{0,0,0,127}}
> inet:parse_address("1.2.3.4").
{ok,{1,2,3,4}}
> inet:parse_address("a.c.d.com").
{error,einval}
Also, ibrowse_lib:parse_url/1 returns a host type:
> ibrowse_lib:parse_url("http://127.0.0.1").
#url{abspath = "http://127.0.0.1",host = "127.0.0.1",
port = 80,username = undefined,password = undefined,
path = "/",protocol = http,host_type = ipv4_address}
> ibrowse_lib:parse_url("http://[::0]").
#url{abspath = "http://[::0]",host = "::0",port = 80,
username = undefined,password = undefined,path = "/",
protocol = http,host_type = ipv6_address}
> ibrowse_lib:parse_url("http://foo.bar.baz.com").
#url{abspath = "http://foo.bar.baz.com",
host = "foo.bar.baz.com",port = 80,username = undefined,
password = undefined,path = "/",protocol = http,
host_type = hostname}
>
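Putting the shell session above into a helper, a small sketch of host-literal detection built on inet:parse_address/1 with bracket stripping. The module and function names are hypothetical, not taken from the PR.

```erlang
-module(ip_literal_sketch).
-export([is_ip_literal/1]).

%% Returns true when Host (a string) is an IPv4 or IPv6 literal,
%% with or without surrounding brackets; false for hostnames.
is_ip_literal(Host) when is_list(Host) ->
    %% Strip "[" and "]" so that "[::0]" parses like "::0".
    Stripped = string:trim(Host, both, "[]"),
    case inet:parse_address(Stripped) of
        {ok, _AddressTuple} -> true;
        {error, einval} -> false
    end.
```

As the session shows, inet:parse_address/1 also accepts short forms such as "127". If that is undesirable, inet:parse_strict_address/1 is the stricter variant to consider.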
    case binary:split(Entry, <<":">>) of
        [Pattern0, Target0] ->
            Pattern = string:trim(Pattern0),
            Target = string:trim(Target0),
If an IPv6 address is passed in with brackets, we should check whether ibrowse knows how to connect to a bracketed address. It may have to be stripped of its brackets and/or parsed into an IPv6 address tuple.
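One way the bracket handling could look, as a sketch only. It assumes the target should end up as an address tuple for IP literals and as a plain string for hostnames; whether ibrowse wants a tuple or a string here is exactly the open question above. The module name, function names and error terms are hypothetical, and pattern validation is left out.

```erlang
-module(dns_entry_sketch).
-export([parse_entry/1]).

%% Parse one "pattern:target" entry (a binary), splitting on the first ":".
parse_entry(Entry) when is_binary(Entry) ->
    case binary:split(Entry, <<":">>) of
        [Pattern0, Target0] ->
            Pattern = string:trim(Pattern0),
            Target = string:trim(Target0),
            case parse_target(Target) of
                {ok, Parsed} -> {ok, {Pattern, Parsed}};
                error -> {error, {invalid_target, Target}}
            end;
        _ ->
            {error, {invalid_entry, Entry}}
    end.

%% Bracketed targets must be IPv6 literals: "[::1]" -> {0,0,0,0,0,0,0,1}.
parse_target(<<"[", Rest/binary>>) ->
    case binary:split(Rest, <<"]">>) of
        [Inner, <<>>] ->
            case inet:parse_ipv6strict_address(binary_to_list(Inner)) of
                {ok, Addr} -> {ok, Addr};
                {error, einval} -> error
            end;
        _ ->
            %% Missing "]" or trailing characters after it.
            error
    end;
%% Unbracketed targets: an IPv4 literal becomes a tuple, anything else is
%% kept as a hostname string. A remaining ":" means an unbracketed IPv6
%% address, which is rejected.
parse_target(Target) ->
    case binary:match(Target, <<":">>) of
        nomatch ->
            Str = binary_to_list(Target),
            case inet:parse_ipv4strict_address(Str) of
                {ok, Addr} -> {ok, Addr};
                {error, einval} -> {ok, Str}
            end;
        _ ->
            error
    end.
```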
            {<<"[", _/binary>>, _} ->
                invalid_entry_reason(Entry, "IPv6 addresses cannot be used as patterns");
            _ ->
                {true, {Pattern, Target}}
Would *example:127.0.0.1 or *:127.0.0.1 work?
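To make the question concrete, here is one possible wildcard semantics, sketched under the assumption that only a leading `*.` label is treated as a wildcard and everything else is compared literally. This is not the PR's match_pattern/2, whose behaviour and argument order are not shown here; the module and function names are hypothetical.

```erlang
-module(dns_pattern_sketch).
-export([wildcard_match/2]).

%% "*.example.com" matches any host that ends in ".example.com" and has
%% at least one character before that suffix. Comparison ignores case.
wildcard_match(Host, <<"*.", _/binary>> = Pattern) when is_binary(Host) ->
    %% Drop the "*" and keep the ".example.com" suffix.
    <<"*", Suffix/binary>> = Pattern,
    HostL = string:lowercase(Host),
    SuffixL = string:lowercase(Suffix),
    HLen = byte_size(HostL),
    SLen = byte_size(SuffixL),
    HLen > SLen andalso binary:part(HostL, HLen - SLen, SLen) =:= SuffixL;
%% Any other pattern, including "*example" or "*", is compared literally.
wildcard_match(Host, Pattern) when is_binary(Host), is_binary(Pattern) ->
    string:lowercase(Host) =:= string:lowercase(Pattern).
```

Under these assumed rules, `*example` and `*` would be accepted as patterns but would never match a real hostname, so rejecting them at parse time with a clear error might be friendlier.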
    Url = full_url(HttpDb, Params),
    Body = get_value(body, Params, []),

    % Apply DNS override using connect_to ibrowse option
Some of the "apply DNS override" logic seems similar to what we do in auth_session. I wonder if a helper function here would work, called from auth_session as well, or placed in some common utility library. If the two diverge enough, it may not work though.
This adds a feature to the CouchDB replicator to override the DNS target for specific host patterns (including wildcards) when making outbound requests. The use case is when requests need to be routed via a transparent SNI proxy, e.g. for network egress monitoring, and specifying overrides in /etc/hosts or similar isn't sufficient or possible (e.g. due to lack of wildcard support).

This adds a new configuration option to specify the overrides:

```
[replicator]
dns_overrides = host:target, host2:target
```

The replicator resolves the configured host patterns to the alternative connection targets while preserving the request URL host (applies to regular requests and session-auth requests).
- Use inet:is_ip_address to detect whether the original target is an IP address. If it is, do not add the SNI header, since this is only valid for hostnames.
- Remove unicode support.
- Add support for IPv6 targets. This is a little awkward because IPv6 addresses use the same `:` delimiter as our config, so require them to be bracketed.
- Clarify documentation / default.ini examples around wildcard support.
Use inet:parse_address/1 to detect valid IP addresses.
Extracts a helper function `couch_replicator_dns:apply_dns_override/2` which applies the `connect_to` ibrowse option and optional SNI header to replication requests in both `couch_replicator_auth_session` and `couch_replicator_httpc`.
Cached the parsed dns_overrides configuration. I did consider adding per-connection caching as well, but this seemed to be of limited benefit in microbenchmarks. Adding the persistent_term cache yields approx a 3x performance improvement in resolution time (amounts to 150ms over 10k resolutions on my machine).
Overview
This adds a feature to the CouchDB replicator to override the DNS target for specific host patterns (including wildcards) when making outbound requests. The use case is when requests need to be routed via a transparent SNI proxy, e.g. for network egress monitoring. Common approaches to enabling host overrides, e.g. modifying /etc/hosts, are not sufficient because they do not support wildcard routing. This feature avoids having to run a local DNS server such as CoreDNS to direct traffic through the proxy.
There is a new configuration option to specify the overrides:
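The commit message for this change gives the option in the following form, where `host`, `host2` and `target` are placeholders:

```
[replicator]
dns_overrides = host:target, host2:target
```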
The replicator resolves the configured host patterns to the alternative connection targets while preserving the request URL host (applies to regular requests and session-auth requests).
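A rough sketch of how the override might be applied to an ibrowse option list. It is not `couch_replicator_dns:apply_dns_override/2`: the module name, function name and arity are hypothetical, and the `{connect_to, Target}` shape is an assumption about the option in the CouchDB ibrowse fork. `ssl_options` is the usual ibrowse option for passing TLS settings and `server_name_indication` is a standard OTP ssl option.

```erlang
-module(dns_apply_sketch).
-export([apply_override/4]).

%% Proto:  http | https, taken from the request URL.
%% Host:   host from the request URL (string). The URL itself is untouched.
%% Target: override target chosen for Host.
%% Opts0:  ibrowse option list for the request.
apply_override(Proto, Host, Target, Opts0) when is_list(Host), is_list(Opts0) ->
    %% ASSUMPTION: the fork accepts {connect_to, Target} in this shape.
    Opts1 = lists:keystore(connect_to, 1, Opts0, {connect_to, Target}),
    case Proto =:= https andalso not is_ip_literal(Host) of
        true ->
            %% Keep presenting the original hostname in the TLS handshake
            %% so an SNI proxy can route on it.
            SslOpts0 = proplists:get_value(ssl_options, Opts1, []),
            SslOpts = lists:keystore(
                server_name_indication, 1, SslOpts0,
                {server_name_indication, Host}
            ),
            lists:keystore(ssl_options, 1, Opts1, {ssl_options, SslOpts});
        false ->
            %% Plain HTTP, or an IP literal host: SNI does not apply.
            Opts1
    end.

is_ip_literal(Host) ->
    case inet:parse_address(string:trim(Host, both, "[]")) of
        {ok, _} -> true;
        {error, einval} -> false
    end.
```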
Note this depends on the `connect_to` option in ibrowse, which is a custom feature in the CouchDB ibrowse fork.
Testing recommendations
Testing this with TLS is a bit involved as it relies on setting up an SNI proxy. I did it using `nginx` in docker with the attached configuration, proxying to a cloudant.com database. The proxy was running on a non-standard port (e.g. 8443) so that any replications connecting directly to cloudant.com would fail.
I then set `dns_overrides = *.cloudant.com:127.0.0.1` in default.ini and configured a replication from `myaccount.cloudant.com:8443/mydb`. The test succeeds if the proxy logged the connection and the replication completed.
The feature also works without TLS: you can use it to direct an arbitrary hostname to your local couchdb, for instance. E.g. if you use `dns_overrides = *.cloudant.com:127.0.0.1` and couchdb is running on `127.0.0.1:15984`, set up a replication with source or target as `http://foo.cloudant.com:15984/db1` and it will be redirected to `127.0.0.1:15984/db1`.
nginx.conf.zip
Related Issues or Pull Requests
Checklist
`rel/overlay/etc/default.ini`
`src/docs` folder