Fix precise floats#152
Conversation
Cases are manually verified to be expressible in float using an online tool
non-float representations are encoded with round((val - min)/res) float representations follow the same mathematics, but floats are converted to integer significant and exponent representations for lossless computation. std::bitset is also used to implement "wider" integer types. This is generalised so that the implementation for floats will cover for doubles too (would use 128 bit integers)
|
Thanks for this - I'll take a deeper look soon. This will probably need to be part of Codec 5, but I'll see what I think. For DCCL5 I'd be happy to push the minimum C++ version to 17. |
|
Oh I didn't see your message. I updated it so that it works with C++14 anyway 😂 |
|
Okay I think I did all I can for the CI tests. I'm not really sure why the |
|
Hi, thanks for your work here - it looks good. I have no idea why the This will become part of DCCL 5, which I've started tracking at DCCL 5 |
|
No problem!
Also do we need to move the version of the code as V5 codec or should this be overwriting the previous implementation? |
Yes we will make this the v5 DefaultNumericFieldCodec. I think I'd like to move these support functions out of Also I fixed the visibility of the DCCL5 project. |
ccb9514 to
3aa7de2
Compare
|
Apologies about the force push, I had to update the emails for a few of my commits.
I moved the new functions from this PR to |
…this likely changes the wire format in some cases. Update unit test for v5
Co-authored-by: tsaubergine <732276+tsaubergine@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR introduces a new numerically robust float/double quantization path for DCCL v5 default numeric fields (aimed at eliminating edge-case precision loss near quantization boundaries) and adds a dedicated test suite to validate float precision behavior across multiple precisions.
Changes:
- Added
src/numeric.h/.cppwith IEEE-754 decomposition helpers and bitset-based integer-domain arithmetic for float/double encode/decode. - Replaced the v5
DefaultNumericFieldCodecalias with an explicit v5 implementation that uses the newdccl::encode/decodehelpers. - Added a new
dccl_float_precisionCTest target (proto + test) covering float precision sweeps, boundary cases, and negative precision.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
src/codecs5/field_codec_default.h |
Implements v5 numeric codec using new numeric helpers for improved float/double stability. |
src/numeric.h |
Adds bitset arithmetic + template encode/decode helpers used by v5 numeric codec. |
src/numeric.cpp |
Adds IEEE-754 decomposition/composition implementation backing numeric.h declarations. |
src/common.h |
Adds <bitset> include (supporting new numeric utilities). |
src/CMakeLists.txt |
Adds numeric.cpp to the library build. |
src/test/CMakeLists.txt |
Adds the new float precision test subdirectory. |
src/test/dccl_float_precision/test.proto |
New proto messages for float precision / negative precision / precision sweep testing. |
src/test/dccl_float_precision/test.cpp |
New executable test validating v5 float encoding/decoding behavior and tolerances. |
src/test/dccl_float_precision/CMakeLists.txt |
Builds and registers the new dccl_test_float_precision test. |
AUTHORS |
Adds contributor entry. |
CMakeLists.txt |
Minor whitespace-only change. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…memcpy to read float into uint instead of reinterpret_cast; fix typo with float overload of numeric_limits in double function
|
@cadmus-to Thank you for your hard work here - this is great. I can't convince myself that it won't change the wire format in some edge cases so this will be the default v5 numeric codec in the forthcoming 5.0.0 release (and v2-v4 will stay the same, slightly buggy former implementation). |
Summary
This PR improves the numerical stability and precision of
floatanddoubleencoding/decoding inDefaultNumericFieldCodec, particularly for higher precision decimal fields (e.g. precision ≥ 6) and or values close to quantisation boundaries.The core change is to avoid lossy floating‑point arithmetic during quantisation by performing the core mathematics in an integer domain derived directly from the IEEE‑754 representation of floats. This makes the encoding behave consistently with the intended
round(value/resolution) − round(min/resolution)semantics even at the limits of float precision.An initial attempt at extracting the significand and exponent into standard integer types improved the situation, but still ran out of precision for some edge cases. That led to the current approach, which uses a wider integer representation (
std::bitsetnecessary for 128 bits when working withdoubles) for the significand to perform the core arithmetic exactly.Implementation overview
Encoding changes
The floating types are decomposed into
std::bitsetsignificand and integer exponent based on the IEEE‑754 layout.Additional arithmetic was added to work with the
std::bitset/integer representations including:A
std::bitsetwith twice the width of the native significand is used to safely carry intermediate results. This is implemented forfloats so that we test the same code path as thedoublecase.For non‑floating types, the existing logic is preserved.
Decoding changes
Decoding mirrors the inverse integer arithmetic to reconstruct the value.
Final values are explicitly quantised at the end to ensure consistency with declared resolution.
Tests
This PR adds a dedicated test suite covering:
All existing tests pass, and all new tests pass with the updated implementation.
Additional comments
Encoding changes
Although decode/encode round‑trips preserve values within the expected tolerance, the bit‑level encoding has changed for some float values compared to previous versions. This means encoded payloads for some float fields may differ from earlier DCCL versions.
Should this logic be gated behind a new codec / protocol version (e.g. v5)? Should I leave this for you to handle @tsaubergine ?
Code cleanup / style
Are there helpers or utilities you would prefer refactored, renamed, or relocated?
Are there unused functions that should be removed or consolidated?