-
Notifications
You must be signed in to change notification settings - Fork 195
Remove Selenium from Enfield #1815
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughThe EnfieldCouncil council parser transitions from Selenium-based web scraping with Cloudflare handling and iframe navigation to direct HTTP API requests using curl_cffi. This simplification reduces code complexity by replacing HTML parsing with JSON processing, extracting bin collection data from API responses instead of rendered web pages. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@uk_bin_collection/uk_bin_collection/councils/EnfieldCouncil.py`:
- Around line 38-42: The code that extracts job_name and scheduled can raise
AttributeError if keys like "JobName", "Description", or "ScheduledStart" exist
but are None or non-dict; update the lookups to use a defensive fallback (e.g.,
replace item.get("JobName", {}) and item.get("Description", {}) and
item.get("ScheduledStart", {}) with (item.get("JobName") or {}),
(item.get("Description") or {}), and (item.get("ScheduledStart") or {})
respectively) so that job_name and scheduled assignments in EnfieldCouncil (the
variables job_name and scheduled) safely call .get("_text") without assuming
dicts.
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/EnfieldCouncil.py (1)
58-60: Hardcoded date format may break ifdate_formatchanges.Line 48 uses
date_formatfrom the common module for formatting, but line 59 hardcodes"%d/%m/%Y"for parsing during sort. If they diverge, sorting will raiseValueError.♻️ Proposed fix to use the shared format constant
bindata["bins"].sort( - key=lambda x: datetime.strptime(x["collectionDate"], "%d/%m/%Y") + key=lambda x: datetime.strptime(x["collectionDate"], date_format) )
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/EnfieldCouncil.py
🧰 Additional context used
🪛 Ruff (0.14.11)
uk_bin_collection/uk_bin_collection/councils/EnfieldCouncil.py
4-4: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names
(F403)
10-10: Unused method argument: page
(ARG002)
12-12: check_uprn may be undefined, or defined from star imports
(F405)
31-31: Avoid specifying long messages outside the exception class
(TRY003)
48-48: date_format may be undefined, or defined from star imports
(F405)
49-49: contains_date may be undefined, or defined from star imports
(F405)
🔇 Additional comments (3)
uk_bin_collection/uk_bin_collection/councils/EnfieldCouncil.py (3)
1-5: Import and architecture looks good for this codebase.The transition from Selenium to
curl_cffiis a solid simplification. The star import fromcommonis a pre-existing pattern across council implementations, and the unusedpageparameter is inherited fromAbstractGetBinDataClass.
14-25: HTTP request configuration looks appropriate.Good use of timeout and browser impersonation headers. The
uprnis validated bycheck_uprnbefore use, and UPRNs are numeric, so direct string interpolation is safe here.
27-33: Good defensive check for Cloudflare blocking.The HTML detection before JSON parsing prevents cryptic parse errors if the API returns a Cloudflare challenge page instead of data.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| job_name = ( | ||
| item.get("JobName", {}).get("_text") | ||
| or item.get("Description", {}).get("_text") | ||
| ) | ||
| find_address_button.click() | ||
| scheduled = item.get("ScheduledStart", {}).get("_text") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential AttributeError if API returns null or non-dict values.
item.get("JobName", {}) only returns {} when the key is absent. If JobName exists but is null (Python None) or a string, calling .get("_text") on it will raise AttributeError.
🐛 Proposed fix using defensive fallback
for item in data:
job_name = (
- item.get("JobName", {}).get("_text")
- or item.get("Description", {}).get("_text")
+ (item.get("JobName") or {}).get("_text")
+ or (item.get("Description") or {}).get("_text")
)
- scheduled = item.get("ScheduledStart", {}).get("_text")
+ scheduled = (item.get("ScheduledStart") or {}).get("_text")Using (item.get("X") or {}) ensures falsy values like None also fall back to an empty dict.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| job_name = ( | |
| item.get("JobName", {}).get("_text") | |
| or item.get("Description", {}).get("_text") | |
| ) | |
| find_address_button.click() | |
| scheduled = item.get("ScheduledStart", {}).get("_text") | |
| job_name = ( | |
| (item.get("JobName") or {}).get("_text") | |
| or (item.get("Description") or {}).get("_text") | |
| ) | |
| scheduled = (item.get("ScheduledStart") or {}).get("_text") |
🤖 Prompt for AI Agents
In `@uk_bin_collection/uk_bin_collection/councils/EnfieldCouncil.py` around lines
38 - 42, The code that extracts job_name and scheduled can raise AttributeError
if keys like "JobName", "Description", or "ScheduledStart" exist but are None or
non-dict; update the lookups to use a defensive fallback (e.g., replace
item.get("JobName", {}) and item.get("Description", {}) and
item.get("ScheduledStart", {}) with (item.get("JobName") or {}),
(item.get("Description") or {}), and (item.get("ScheduledStart") or {})
respectively) so that job_name and scheduled assignments in EnfieldCouncil (the
variables job_name and scheduled) safely call .get("_text") without assuming
dicts.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1815 +/- ##
=======================================
Coverage 86.67% 86.67%
=======================================
Files 9 9
Lines 1141 1141
=======================================
Hits 989 989
Misses 152 152 ☔ View full report in Codecov by Sentry. |
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.