
Fix search_dates language fallback #1331

Open
Kill1ngPig wants to merge 2 commits into scrapinghub:master from Kill1ngPig:codex-fix-search-dates-language-fallback

@Kill1ngPig

Summary

Fix search_dates() so it can fall back to other explicitly provided languages when the initially selected language does not find any dates.

Fixes #1326.


Problem

When search_dates() is called with multiple languages and STRICT_PARSING=True, the language selection step may choose a language that cannot parse the date expression. In that case, the current implementation stops after the first selected language and returns no dates, even if another explicitly provided language can parse the same text.

For example, this French date is parsed correctly with languages=["fr"], but was previously dropped when several languages were provided:

"Date de facture 23 juillet 2020 Condition Redevable livraison FR"
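To make the failure mode concrete, here is a minimal, self-contained model of it. It is not dateparser's internal code: `search_selected_only`, `search_with_fallback`, and the toy `parse_en`/`parse_fr` parsers are hypothetical stand-ins, and the sketch assumes, for illustration only, that the language selection step picks `"en"` for the French invoice text above. The fallback tries the selected language first, then each other user-provided language in order, and returns the first non-empty result.

```python
from datetime import datetime
from typing import Callable, Dict, List, Tuple

# Hypothetical model: a "parser" parses text in one language and returns
# a list of (matched_substring, datetime) pairs, possibly empty.
Match = Tuple[str, datetime]
Parser = Callable[[str], List[Match]]


def search_selected_only(text: str, selected: str,
                         parsers: Dict[str, Parser]) -> List[Match]:
    """Behaviour described in the PR: only the selected language is tried."""
    return parsers[selected](text)


def search_with_fallback(text: str, selected: str, languages: List[str],
                         parsers: Dict[str, Parser]) -> List[Match]:
    """Try the selected language, then the other user-provided languages."""
    candidates = [selected] + [lang for lang in languages if lang != selected]
    for lang in candidates:
        matches = parsers[lang](text)
        if matches:  # stop at the first language that finds any date
            return matches
    return []


def parse_en(text: str) -> List[Match]:
    # Toy English parser: never matches, standing in for a language
    # that cannot parse the French date expression.
    return []


def parse_fr(text: str) -> List[Match]:
    # Toy French parser: recognises only this one fixed phrase.
    phrase = "23 juillet 2020"
    return [(phrase, datetime(2020, 7, 23))] if phrase in text else []


PARSERS: Dict[str, Parser] = {"en": parse_en, "fr": parse_fr}
TEXT = "Date de facture 23 juillet 2020 Condition Redevable livraison FR"
```

With these stubs, `search_selected_only(TEXT, "en", PARSERS)` returns an empty list, while `search_with_fallback(TEXT, "en", ["en", "fr"], PARSERS)` returns `[("23 juillet 2020", datetime(2020, 7, 23, 0, 0))]`.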

Changes

The fix is not specific to French or to the trailing "FR" token. It handles the broader case where the first selected language finds no dates but another user-provided language can parse them.
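The language-agnostic part is the order in which candidate languages are tried. Below is a hedged sketch of one reasonable ordering rule, using a hypothetical helper named `fallback_candidates`: the selected language first, then the remaining explicitly provided languages in the caller's order, with duplicates removed and nothing added from outside the user's list. The PR description does not spell out how the real change orders or deduplicates languages, so this rule is an assumption.

```python
from typing import Iterable, List, Optional


def fallback_candidates(selected: Optional[str],
                        languages: Iterable[str]) -> List[str]:
    """Order in which languages would be tried (hypothetical helper).

    The selected language comes first (if it is one of the provided
    languages), followed by every other provided language in the
    caller's order. Only user-provided languages are ever returned.
    """
    # Deduplicate while keeping the first occurrence of each language.
    provided = list(dict.fromkeys(languages))
    # Assumption: with no usable selection, keep the caller's order as-is.
    if selected is None or selected not in provided:
        return provided
    return [selected] + [lang for lang in provided if lang != selected]
```

For example, `fallback_candidates("de", ["fr", "de", "en", "fr"])` returns `["de", "fr", "en"]`; nothing in the rule refers to a particular language.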


Tests

python -m pytest -p no:cacheprovider tests/test_search.py tests/test_clean_api.py -q

Result:

262 passed

@codecov

codecov Bot commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.12%. Comparing base (075712f) to head (e329bef).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1331      +/-   ##
==========================================
+ Coverage   97.10%   97.12%   +0.01%     
==========================================
  Files         235      235              
  Lines        2904     2917      +13     
==========================================
+ Hits         2820     2833      +13     
  Misses         84       84              


@AdrianAtZyte
Contributor

Please run pre-commit.

@Kill1ngPig
Author

Thanks for the reminder. I ran pre-commit run --all-files locally and pushed the formatting changes in e329bef.

AdrianAtZyte requested a review from serhii73 on May 12, 2026 14:41

Development

Successfully merging this pull request may close these issues.

search_dates() silently drops a French date when STRICT_PARSING=True and multiple languages are passed
