fix: tdg import db transaction handling + undefined fields #1543
Pull request overview
This PR enhances the Transport Data Gouv (TDG) feed import process by adding field validation, custom exception handling, and nested database transactions for resource-level error isolation. The changes aim to make the import more robust by allowing the process to continue even when individual resources fail validation or encounter database integrity errors.
Key changes:
- Introduced `InvalidTDGFeedError` exception and validation for required fields (publisher name, producer URL)
- Wrapped each resource's processing in a nested database transaction using `begin_nested()` for isolation
- Added exception handlers to catch and log `InvalidTDGFeedError` and `IntegrityError` without stopping the entire import
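The per-resource isolation described above can be sketched roughly as follows. This is not the PR's code: it uses the standard-library `sqlite3` module with explicit `SAVEPOINT` statements instead of a SQLAlchemy session, since a savepoint is what `begin_nested()` sets up on backends that support it. The `feed` table, the `import_resources` helper, and the resource dictionaries are hypothetical; only the exception name comes from the PR.

```python
import sqlite3


class InvalidTDGFeedError(Exception):
    """Raised when a resource lacks a required field (name mirrors the PR)."""


def import_resources(conn, resources):
    """Insert each resource inside its own SAVEPOINT; return the skipped ids."""
    skipped = []
    conn.execute("BEGIN")  # outer transaction covering the whole import
    for res in resources:
        # Stand-in for db_session.begin_nested(): one savepoint per resource.
        conn.execute("SAVEPOINT resource")
        try:
            if not res.get("producer_url"):
                raise InvalidTDGFeedError(f"missing producer_url: {res['id']}")
            conn.execute(
                "INSERT INTO feed (id, producer_url) VALUES (?, ?)",
                (res["id"], res["producer_url"]),
            )
            conn.execute("RELEASE SAVEPOINT resource")
        except (InvalidTDGFeedError, sqlite3.IntegrityError):
            # Undo only this resource's work, then continue with the next one.
            conn.execute("ROLLBACK TO SAVEPOINT resource")
            conn.execute("RELEASE SAVEPOINT resource")
            skipped.append(res["id"])
    conn.execute("COMMIT")
    return skipped


# isolation_level=None leaves transaction control to the explicit statements above.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE feed (id TEXT PRIMARY KEY, producer_url TEXT NOT NULL)")
resources = [
    {"id": "a", "producer_url": "https://example.com/a.zip"},
    {"id": "b", "producer_url": ""},  # fails validation
    {"id": "a", "producer_url": "https://example.com/dup.zip"},  # duplicate key
    {"id": "c", "producer_url": "https://example.com/c.zip"},
]
skipped = import_resources(conn, resources)
stored = [row[0] for row in conn.execute("SELECT id FROM feed ORDER BY id")]
```

In this sketch the invalid resource and the duplicate are rolled back individually, while the resources before and after them are still committed with the outer transaction.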
functions-python/tasks_executor/src/tasks/data_import/transportdatagouv/import_tdg_feeds.py
davidgamez
left a comment
LGTM
Summary:
This PR improves the robustness and error handling of the TDG feed import process by adding explicit validation for required fields, introducing a custom exception, and ensuring that database operations for each resource are safely isolated. The changes also refactor some code for clarity and maintainability.
Error handling and validation:
- Added `InvalidTDGFeedError` to represent missing required fields in TDG datasets/resources.
- Updated `_update_common_tdg_fields` to validate the presence of the publisher name and producer URL, logging a warning and raising `InvalidTDGFeedError` if they are missing.
- In `_process_tdg_dataset`, added try/except blocks to catch `InvalidTDGFeedError` and `IntegrityError` per resource, logging and skipping invalid or problematic resources without interrupting the entire import process.
Database transaction safety:
- Wrapped each resource's processing in a nested transaction (`db_session.begin_nested()`), ensuring that errors in one resource do not affect others.
Code refactoring and cleanup:
- `static_feeds_by_dataset_id` is updated using `setdefault`.
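As an illustration of the validation and the `setdefault` grouping summarized above, here is a minimal sketch. It assumes a simplified payload shape: the `publisher`, `name`, `url`, and `resources` keys and the `validate_required_fields` and `group_valid_resources` helpers are hypothetical and do not reflect the actual `_update_common_tdg_fields` implementation.

```python
import logging

logger = logging.getLogger(__name__)


class InvalidTDGFeedError(Exception):
    """Signals a dataset/resource that is missing a required field."""


def validate_required_fields(dataset: dict, resource: dict) -> tuple[str, str]:
    """Return (publisher_name, producer_url) or raise InvalidTDGFeedError."""
    publisher_name = (dataset.get("publisher") or {}).get("name")
    producer_url = resource.get("url")
    if not publisher_name or not producer_url:
        logger.warning("Skipping resource %s: missing required field", resource.get("id"))
        raise InvalidTDGFeedError(f"resource {resource.get('id')} is missing a required field")
    return publisher_name, producer_url


def group_valid_resources(datasets: list[dict]) -> dict[str, list[str]]:
    """Map each dataset id to the ids of its resources that pass validation."""
    static_feeds_by_dataset_id: dict[str, list[str]] = {}
    for dataset in datasets:
        for resource in dataset.get("resources", []):
            try:
                validate_required_fields(dataset, resource)
            except InvalidTDGFeedError:
                continue  # skip this resource, keep processing the rest
            # setdefault creates the list on first use, so no "if key in dict" branch is needed.
            static_feeds_by_dataset_id.setdefault(dataset["id"], []).append(resource["id"])
    return static_feeds_by_dataset_id


datasets = [
    {
        "id": "d1",
        "publisher": {"name": "Agency"},
        "resources": [
            {"id": "r1", "url": "https://example.com/r1.zip"},
            {"id": "r2", "url": None},  # missing producer URL
        ],
    },
    {"id": "d2", "publisher": None, "resources": [{"id": "r3", "url": "https://example.com/r3.zip"}]},
]
grouped = group_valid_resources(datasets)
```

Raising a dedicated exception lets the caller distinguish an expected, skippable data problem from an unexpected failure that should still propagate.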
Please make sure these boxes are checked before submitting your pull request - thanks!
- Run `./scripts/api-tests.sh` to make sure you didn't break anything