-
Notifications
You must be signed in to change notification settings - Fork 76
Add unset field semantics for structs ("strict structs") #607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Simon Beyzerov <simon.beyzerov@point72.com>
…ance bug Signed-off-by: Adam Glustein <adam.glustein@point72.com>
Signed-off-by: Adam Glustein <adam.glustein@point72.com>
Signed-off-by: Adam Glustein <adam.glustein@point72.com>
|
Previously, when all fields on structs were optional, we hard-coded these two flags for pydantic validation
Can we only apply this logic when |
I tested out this approach and it doesn't validate correctly in the case of a required field with a default value. class MyStrictStruct(csp.Struct, allow_unset=False):
req_default_str: str = "default" |
Signed-off-by: Adam Glustein <adam.glustein@point72.com>
| template<typename T> | ||
| bool InputAdapter::consumeTick( const T & value ) | ||
| { | ||
| if constexpr( CspType::Type::fromCType<T>::type == CspType::TypeTraits::STRUCT ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a central place we can do this check on push rather than consume? Will be much harder to track down on consumption
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not see such a place - we could add it to each adapter type (push, pull, pushpull and managers) separately in their pushTick impls, but I think it's cleaner to have the validation logic in one spot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but my point stands that it would be a lot harder to tell where the error is coming from
Also unfortunate to vall validate on all input adapters, even python based ones which were already validated due to PyStruct creation in python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As to the second point here, we could store a "dirty" flag on the struct to avoid validating it twice if it has a) been validated already and b) not had any modifications since the last validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And actually, since you can't delete fields on a strict struct, you only ever need to validate it once so point b) is irrelevant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, so we could store a bit to indicate its already validated, but im not sure its necessary at this point since we almost always be validating once up front
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really we can skip the validation check in InputAdapter::consume for any PyStruct.
In a somewhat hacky way we already have this info as we could check struct -> dialectPtr() != nullptr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are concerned about the double validation I would go with adding a validated bit, but maybe in a separate PR
I think this just means that if there is a default value, we leave it as "optional" (as the user doesn't have to provide it), but if there isn't, then it's not. I guess I was wrong about pydantic inferring it. However, the behavior for non-defaulted fields will be different for the two types of Struct. |
…ield is set to None in the struct's bitmask Signed-off-by: Adam Glustein <adam.glustein@point72.com>
b86157d to
c343b5a
Compare
Signed-off-by: Adam Glustein <adam.glustein@point72.com>
Signed-off-by: Adam Glustein <adam.glustein@point72.com>
| PushBatch batch( m_engine -> rootEngine() ); | ||
| m_inputAdapter -> processMessage( c, t, &batch ); | ||
| } | ||
| catch( csp::Exception & err ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this try/catch here now if we are throwing in consume?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is doing anything, right. Reverted the change here
cpp/csp/engine/InputAdapter.h
Outdated
| if constexpr( CspType::Type::fromCType<T>::type == CspType::TypeTraits::STRUCT ) | ||
| { | ||
| if( unlikely( !( value -> validate() ) ) ) | ||
| CSP_THROW( ValueError, "Struct " << value -> meta() -> name() << " is not valid; required fields " << value -> formatAllUnsetStrictFields() << " were not set on init" ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're likely leaving this in consume, can we add something i meant to add.
Please add a virtual const char * name() const; method on InputAdapter, can default it to return "InputAdapter"
we should really implement name where we can, and use name() in the exception here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for purposes of this PR we can just define name() as "InputAdapter" and separately we can implement it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done and added to the error message
cpp/csp/engine/Struct.cpp
Outdated
| m_partialSize = m_size - baseSize; | ||
| m_maskLoc = m_size - m_maskSize; | ||
|
|
||
| uint8_t numRemainingBits = ( m_fields.size() - m_firstPartialField + optionalFieldCount ) % 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like m_firstPartialField is always 0 at this point.
In any case I dont think its needed here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed and removed
cpp/csp/engine/Struct.cpp
Outdated
| for( auto & f : m_fields ) | ||
|
|
||
| // Set optional fields first so that their 2-bits never cross a byte boundary | ||
| m_optionalFieldsSetBits = new uint8_t[ m_maskSize ](); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::unique_ptr or just use std::vector
also consider merging into one allocation for both masks to avoid potential fragmentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to use vectors. I could store it as one vector containing both the set bitmasks and the none bitmasks, but I think the added complexity with that approach is not really worth it, considering this is a member of the meta.
cpp/csp/engine/Struct.cpp
Outdated
| } | ||
| } | ||
| // deal with last (partial) byte filled with optional fields | ||
| auto lastOptIndex = maskLoc - m_maskLoc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i forget if we have a way to run coverage right now but would be good to ensure all of these code paths are being tested or at least executed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simplified this logic a lot to just set the bits programmatically rather than using these precalculated constants, not sure why I didn't do that to start with
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is a member of the meta so space isnt a concern, its more the frequent access of these vectors.
if the allocations did happen to fragment it would be unfortunate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I see what you mean, if those vectors get allocated on separate cachelines then each validate call will be loading in both.
I am fine to make it a single vector and just use offset logic to access each mask.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made it so that all the set/none masks are in the same vector.
| if( field -> isNone( struct_ptr ) ) | ||
| py_obj = PyObjectPtr::incref( Py_None ); | ||
| else | ||
| py_obj = switchCspType( field -> type(), [field, struct_ptr, this]( auto tag ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we do keep None processing, brackets around multi-line clause
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| for( const auto & field: fields ) | ||
| { | ||
| if( !field -> isSet( struct_ptr ) ) | ||
| if( !field -> isSet( struct_ptr ) && !field -> isNone( struct_ptr ) ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same Q, do we want to propagate None here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see what else we would do, just treat it as unset? Makes more sense to pass it through as a null for consistency.
|
|
||
| - **Default:** `False` (for backwards compatibility) | ||
| - **When set to `True`:** Enforces *strict struct semantics* — required fields must be set at initialization. | ||
| - Note that `del struct.field` is not allowed for strict structs — this will raise an error. As a result, for any strict struct, `hasattr(struct, field)` always returns True for any defined field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move these two bullets below to the semantics section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
|
||
| - **Default:** `False` (for backwards compatibility) | ||
| - **When set to `True`:** Enforces *strict struct semantics* — required fields must be set at initialization. | ||
| - Note that `del struct.field` is not allowed for strict structs — this will raise an error. As a result, for any strict struct, `hasattr(struct, field)` always returns True for any defined field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for any strict struct,
hasattr(struct, field)always returns True for any defined field.
" and never has to be checked"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
| 1. **Example** | ||
|
|
||
| ```python | ||
| class MyStruct(csp.Struct, allow_unset=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix param name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Signed-off-by: Adam Glustein <adam.glustein@point72.com>
Signed-off-by: Adam Glustein <adam.glustein@point72.com>
Signed-off-by: Adam Glustein <adam.glustein@point72.com>
Cleanup and a few fixes to #574, thanks @sim15 for the contribution!