Overhaul saved packets’ serialization format#1517
Conversation
|
Unit tests fail because zonemaster/zonemaster-ldns#254, which is a hard dependency for this PR, is not merged yet. |
|
Also, I’m not quite sure if this change is to be regarded as breaking. Do we consider the cache as being an interface or an implementation detail of Zonemaster-Engine? |
The format is exportable through |
Yes, you are right. I’ve set the appropriate tags. |
Refactor the query() method in Zonemaster::Engine::Nameserver. The query() and _query() methods are far too big. The main problem is that they mix options processing, query packet building, fake delegations and DS responses and cache handling. We therefore start by splitting _query() up in three methods: * _query(), which still does still its thing, but calling out to a few more functions. * _query_cache_key(), which computes a cache key for a DNS query, which lets us replace that with a better solution later. * _make_query_packet(), which creates a Zonemaster::Engine::Packet based on the parameters provided to query(). One consequence of this split-up is that _query() now always sends packets with Zonemaster::LDNS->query_with_pkt(). Previously, _query() called Zonemaster::LDNS->query() for non-EDNS queries and query_with_pkt() for EDNS queries. Thanks to this change, it is now always possible to inspect the packet that Zonemaster::Engine would send on the wire. This eases testing of this module significantly. The t/nameserver.t is also partially rewritten to make use of _make_query_packet(). This lets us test the actual contents of the packet that would be sent out, instead of looking at the global state of Zonemaster::LDNS that happens through some side effect.
Add support in Zonemaster::Engine::Nameserver for writing out the cache into a new CBOR-based format, and add some temporary tooling to aid in migrating the t/*.data files from the current format to this new format. The new format ditches JSON in favor of CBOR. Using CBOR speeds up restoring the saved cache significantly, which is especially noticeable when running the unit test suite. Preliminary experimentation with this format shows that the t/*.data files also shrink, from a total of 73 MB to approximately 48 MB. And the new format is designed with versioning and extensibility in mind. This commit introduces version 1 of the file format. Packets are stored in a format that is entirely controlled by Zonemaster::Engine::Nameserver, which uses simple CBOR types. This helps avoid a situation where the cache file format is too tightly coupled to the Zonemaster::Engine implementation. Cache keys are simply the packet in wire format, instead of an MD5 sum of a string dependent on parameters. This will ensure that we only return a cached result for something that we actually sent before, byte for byte. For the cache key, the ID field is zeroed out and used for internal flags. This change also eliminates a design flaw with the previous cache system: setting the EDNS payload size with the edns_size option (not used in the main code, only in unit tests) did not yield the same cache key as when edns_details.size is used. Now, both methods should map to the same cache key because they would generate an identical packet. In order to aid with migrating the old to the new format without having to rerecord everything, we need to map the old cache key to the new cache key schemes; something we can only do while replaying unit tests. An environment variable, ZONEMASTER_KEY_MAP_FILE, controls whether Zonemaster::Engine::Nameserver logs accesses to the cache, to a file mapping old and new cache keys. The idea is to run the entire test suite to generate key map files for all t/*.data files, then run a bespoke migration script (also introduced in this commit), in order to convert formats.
Use the procedure documented in the first few lines of util/migrate-data-files.pl to migrate the existing data files to the new format. For some reason, the migration didn’t work as intended for two files: t/asn.data and t/Test-dnssec01.t. Fortunately, these two are easy to rerecord. Then, we rename a handful of methods in Zonemaster::Engine::Nameserver, so that running the unit tests now uses the new format.
We’ve migrated cache file formats in a previous commit, so we can remove all the temporary code that was there to support migration from old to new format, and thus complete the job.
Switching from JSON to CBOR for the cache format has made the need for a TO_JSON method in Zonemaster::Engine::DNSName and …::Packet unnecessary. We can now remove these methods.
This script will be much more useful after the format change, because we are switching to a binary format instead of a vaguely text-based one.
Rewrite the axfr() method in Zonemaster::Engine::Nameserver so that it
goes through the cache like query() would.
Implementing this change is a bit more difficult because the API for
AXFR requests in Zonemaster::LDNS is very different from that of
non-AXFR queries.
The basic idea is to use a synthetic AXFR query packet as a cache key,
and to create another synthetic response packet that stores the RRs that
were fetched through the AXFR query (as long as the provided callback
returns 1). That packet is then inserted into the cache. When loading
the cache, the axfr() uses that saved synthetic packet to simulate an
AXFR query.
Interestingly, the t/nameserver-axfr.t test file uses its own data file
format. It actually implements a cache of its own. This is not necessary
anymore and this commit amends the .t file to put it in line with the
others.
There is however one subtlety that I couldn’t fit into the design of the
synthetic answer packets in case the AXFR fails, because it requires a
new feature in Zonemaster::LDNS.
In order to test this commit:
1. Using zonemaster-cli, run the Nameserver03 test on your favorite
domain while saving the packets, e.g.:
$ zonemaster-cli --save mydata --test nameserver03 servfail.fr
2. Examine the cached packets using the data2dig script in the util
subdirectory in Zonemaster-Engine, and look for cached AXFR queries
in the output:
$ ./util/data2dig mydata | grep AXFR
AXFR responses, including errors, are cached as synthetic packets thanks to previous work. But we had no elegant way of storing the associated error message. It felt appropriate to use an extended DNS error code 13 (Cached error) for that purpose, so after implementing the missing feature in Zonemaster::LDNS, we can use it here.
ab21d9e to
e4eb107
Compare
There was a problem hiding this comment.
LGTM, nice work! Aside from one observation, I have just one comment regarding documentation.
Once zonemaster/zonemaster-ldns#254 is merged and conflicts are resolved, I'll approve.
| $res = eval { $self->dns->query_with_pkt( $pkt ) }; | ||
| } | ||
| else { | ||
| $res = eval { $self->dns->query( "$name", $type, $href->{class} ) }; | ||
| } | ||
| my $pkt = $self->_make_query_packet( $name, $type, $href ); | ||
| $res = eval { $self->dns->query_with_pkt( $pkt ) }; |
There was a problem hiding this comment.
One consequence of this split-up is that _query() now always sends
packets with Zonemaster::LDNS->query_with_pkt(). Previously, _query()
called Zonemaster::LDNS->query() for non-EDNS queries and
query_with_pkt() for EDNS queries.
I was wondering if that change is entirely benign. After looking at Zonemaster-LDNS and LDNS code, it appears to be fine. Also the default settings of LDNS objects (resolver and packet) match the ones specified in the "DNS Query And Response Defaults" specification.
There was a problem hiding this comment.
A few new internal methods have been added in this module, but none of them are documented (e.g. _make_query_packet, _query_cache_key, etc...).
Purpose
This PR addresses some long-standing issues with the packet cache in Zonemaster-Engine, its scope, and its serialization format.
The current serialization format in use had some design flaws. It was fairly slow to deserialize due to its use of a combination of plain text lines and JSON. Files were big because of numerous instances of
Zonemaster::Engine::PacketandZonemaster::LDNS::Packetstrings appearing in the JSON data, and the base64 encoding of binary data. The cache’s keys are the result of a one-way computation (in this case, MD5) over query attributes, which makes inspection of files, and migration of formats unnecessarily difficult, not to mention the risk of collisions.In addition to that, not all DNS queries were cached; AXFR queries, used for example in Nameserver03, did not go through it.
CBOR (RFC 8949) proves to be a satisfactory serialization format. It is concise, fast to deserialize, and accepts binary data without base64 encoding.
Cache keys are now the actual DNS packets sent over the wire (with the ID field repurposed for internal flags), making the design simpler and easing migration to other formats if needed.
As for AXFR requests and responses, these are cached as synthetic (i.e. fake) DNS packets. This implementation leverages Extended DNS Errors (EDEs) in the synthetic packets to record AXFR failures accurately.
I managed to migrate most
t/*.datafiles from the old to the new format, but this required a temporary kludge to log the cache keys being accessed, both in the old (hashed) and the new (wireformat) schemes. After running the unit tests once, it was possible to recompute the cache keys in the new scheme, load the data from the old format, and save it in the new format. Three of those data files required manual rerecording, but these were among the easiest data files to regenerate. The kludges in question are all removed in a follow-up commit.The only remaining issue left unaddressed in #938 is the caching of ASN queries when done through the RIPE method (i.e. Whois on port 43/tcp). Currently, these are still not recorded in the cache. When querying ASN numbers through the Cymru method however, the response is cached because this method uses DNS. Addressing this specific issue requires a more major overhaul of Zonemaster-Engine’s caching system, but this PR lays the groundwork for fixing it.
On my development setup, comparing the combined sizes of all
t/*.datafiles and the time spent runningmake test(across 10 runs, using hyperfine), the figures are as follows:Which is a 36% improvement in test suite run time and 34% improvement in data file size.
Context
Fixes #938, with a small caveat.
Changes
Zonemaster::Engine::Nameserverclass;Each change is discussed in more thorough detail in the corresponding commit message.
How to test this PR
Unit tests should still pass.