fix: skip schema lookup when value is not a str#401
Open
piste-jp wants to merge 1 commit into
Open
Conversation
Since 8467490 (capnproto#351, "added binary support in dictionaries via base64 encoding"), `_DynamicStructBuilder.from_dict` does a `self.schema.fields.get(key)` lookup for every key in the input dict. The lookup was added so that `str` values destined for a `Data` field can be base64-decoded, but the lookup itself runs unconditionally, even when the value is `bool`, `int`, `dict`, `list`, `bytes`, etc.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
_DynamicStructBuilder.from_dictperforms aself.schema.fields.get(key)lookup for every key in the input dict, regardless of value type. This lookup was introduced in #351 to support base64-decoding ofstrvalues destined forDatafields, but it runs unconditionally.Moving the lookup inside the
isinstance(val, str)branch makes it skipped for the common case (bool,int,dict,list,bytes, etc.) while preserving the decode behavior bit-for-bit forstrvalues.Why this matters
In real-time systems that construct many messages per second via
SomeStruct.new_message(**kwargs), this lookup lands on the hot path:For our use case, even a kwargs dict as small as
{'valid': False, 'logMonoTime': <int>}pays for two schema lookups pernew_message()call. Neither value is astr, so the decode branch never fires — but the lookup runs anyway.Measurement
Measured with
py-spyon aarch64, 30s sampling at 100Hz, on two processes that callnew_messageheavily:new_messageself timeTotal CPU time for Process B over the 30s window:
Restoring the v2.1.0 hot-path behavior brought the regressed publish frequency (100Hz → 98.4Hz) back to the expected rate, and dropped per-process CPU usage to the level seen before #351.
Visual comparison (py-spy "Left Heavy" view, process B)
v2.2.0 (current):

new_message(the wide purple block) dominates the publisher'smain loop, taking roughly half of the total CPU time.
v2.2.0 + this patch:

new_messageis no longer visible as a hot frame. The publisherloop is back to spending its time on actual work (
writer,send,send_frame,compress_sync).Compatibility
No behavior change for
strvalues: thefield.proto.slot.type.which() == "data"check andbase64.b64decode(val)call are unchanged, just nested one level deeper. The existingtest_blob_to_dict_base64.pystill passes.Diff
def from_dict(self, dict d): for key, val in d.iteritems(): if key != 'which': - field = self.schema.fields.get(key) if isinstance(val, str): + field = self.schema.fields.get(key) dtype = field.proto.slot.type.which() if dtype == "data": # decode bytes from utf-8 base64 encoding val = base64.b64decode(val) try: self._set(key, val)