Fix timeout evaluations getting positive fitness #446

Open
bud333 wants to merge 1 commit into algorithmicsuperintelligence:main from bud333:fix-timeout-zero-score

bud333 commented Mar 23, 2026

Summary

Fix timeout evaluation paths so timed-out programs cannot receive positive fitness.

Currently, timeout results may omit combined_score, and the fallback fitness logic can treat boolean flags such as timeout=True as numeric values, because bool is a subclass of int in Python. As a result, failed or timed-out programs can receive a positive fitness score.

For example, a timeout result like {"error": 0.0, "timeout": True} could previously receive a positive fallback fitness because True was treated as 1.0. This change makes timeout paths return combined_score=0.0 explicitly and excludes boolean flags from fallback fitness aggregation.
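The failure mode described above can be reproduced in isolation. The sketch below is illustrative only: `naive_fallback_fitness` is a hypothetical stand-in for a fallback that averages every value passing a numeric `isinstance` check, not the project's actual function.

```python
def naive_fallback_fitness(metrics):
    """Hypothetical fallback: average every value that looks numeric.

    Because bool is a subclass of int, isinstance(True, (int, float))
    is True, so boolean flags leak into the average.
    """
    values = [float(v) for v in metrics.values() if isinstance(v, (int, float))]
    return sum(values) / len(values) if values else 0.0


# Timeout result without combined_score, as in the example above.
timeout_metrics = {"error": 0.0, "timeout": True}

print(isinstance(True, int))                    # True
print(naive_fallback_fitness(timeout_metrics))  # 0.5, i.e. (0.0 + 1.0) / 2
```

Under these assumptions a timed-out program scores 0.5 even though it produced no usable result.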

Changes

  • return combined_score=0.0 for direct evaluation timeouts
  • return combined_score=0.0 for cascade stage timeouts
  • set a meaningful string error message for timeout cases
  • exclude boolean values from numeric aggregation in fitness fallback logic
  • add regression tests for timeout scoring and bool exclusion
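A minimal sketch of how the changes listed above might fit together. The function names `make_timeout_result` and `fallback_fitness`, and the exact wording of the error message, are assumptions made for illustration; the actual diff may be structured differently.

```python
def make_timeout_result(timeout_seconds):
    """Hypothetical helper: metrics returned when an evaluation times out."""
    return {
        "combined_score": 0.0,  # explicit zero so no fallback is needed
        "error": f"Evaluation timed out after {timeout_seconds}s",  # string, not 0.0
        "timeout": True,
    }


def fallback_fitness(metrics):
    """Hypothetical fitness: prefer combined_score, else average numeric metrics.

    Booleans are excluded explicitly, since isinstance(True, int) is True.
    """
    score = metrics.get("combined_score")
    if isinstance(score, (int, float)) and not isinstance(score, bool):
        return float(score)
    values = [
        float(v)
        for v in metrics.values()
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    return sum(values) / len(values) if values else 0.0


# Regression-style checks mirroring the cases described in this PR.
assert fallback_fitness(make_timeout_result(30)) == 0.0
assert fallback_fitness({"error": 0.0, "timeout": True}) == 0.0
assert fallback_fitness({"accuracy": 0.8, "valid": True}) == 0.8
```

Checking `not isinstance(v, bool)` alongside the numeric check keeps genuine integer metrics in the average while dropping flags.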

Why

This fixes a case where timed-out programs could receive positive fitness when timeout results omitted combined_score and boolean flags were treated as numeric values in the fallback fitness calculation.

Verification

Passed:

  • export OPENAI_API_KEY=test-key-for-unit-tests && python -m unittest tests.test_evaluator_timeout tests.test_metrics_utils
  • export OPENAI_API_KEY=test-key-for-unit-tests && python -m unittest discover tests

CLAassistant commented Mar 23, 2026

CLA assistant check
All committers have signed the CLA.
