I am using navsim==2.2 to evaluate the EPDMS metric on the navtest split. (This evaluation setting is not the two-stage pseudo-simulation, but it has been used in several recent papers.)
1. Description
I've encountered a consistent issue where certain specific tokens receive a final EPDMS score of 0.0, even though every individual subscore (NC, DAC, DDC, TLC, EP, TTC, etc.) for that token is 1.0.
According to the EPDMS formula (a product of penalty terms and a weighted average of other terms), if all subscores are 1.0, the final EPDMS should also be 1.0. This 0.0 result seems to be an error. Is there anything I might have missed?
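To make the expectation concrete, here is a minimal sketch of the aggregation as I understand it: a product of penalty terms times a weighted average of the remaining terms. It is not the navsim implementation. The grouping of subscores into penalty and weighted terms, the weight values, and the names `aggregate_score`, `find_inconsistent`, and the `"score"` key are all assumptions made for illustration.

```python
from math import isclose, prod


def aggregate_score(penalties, weighted, weights):
    """Product of penalty terms times a weighted average of the other terms."""
    penalty = prod(penalties.values())
    total_weight = sum(weights[name] for name in weighted)
    average = sum(weights[name] * weighted[name] for name in weighted) / total_weight
    return penalty * average


def find_inconsistent(rows, penalty_names, weights, score_key="score"):
    """Return rows whose reported score differs from the recomputed one."""
    bad = []
    for row in rows:
        penalties = {name: row[name] for name in penalty_names}
        weighted = {name: row[name] for name in weights}
        expected = aggregate_score(penalties, weighted, weights)
        if not isclose(expected, row[score_key], abs_tol=1e-6):
            bad.append(row)
    return bad


# Assumed grouping and weights, for illustration only.
PENALTY_NAMES = ["NC", "DAC", "DDC", "TLC"]
WEIGHTS = {"EP": 5.0, "TTC": 5.0}

# A hypothetical token whose subscores are all 1.0 but whose reported score is 0.0.
rows = [
    {"token": "a", "NC": 1.0, "DAC": 1.0, "DDC": 1.0, "TLC": 1.0,
     "EP": 1.0, "TTC": 1.0, "score": 0.0},
    {"token": "b", "NC": 1.0, "DAC": 1.0, "DDC": 1.0, "TLC": 1.0,
     "EP": 1.0, "TTC": 1.0, "score": 1.0},
]
flagged = find_inconsistent(rows, PENALTY_NAMES, WEIGHTS)
```

Under these assumptions, token `a` is flagged because the recomputed value is 1.0 while the reported value is 0.0, which is the mismatch described above.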
2. Other info
Hardware: The issue is reproducible on both NVIDIA H100 and Ascend 910B hardware.
Consistency: The exact same tokens fail (score 0.0) consistently across different models and multiple test runs.
Dependencies: I am aware of potential issues with older numpy versions (such as 1.23.4). I have already upgraded numpy to 1.26.4, deleted the entire metric_cache, and regenerated it, but the problem persists.