Skip to content

Conversation

@PaulBrack
Copy link

@PaulBrack PaulBrack commented Jan 16, 2026

Summary by CodeRabbit

  • Improvements
    • Optimized Enfield Council bin collection data retrieval for enhanced reliability and performance.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 16, 2026

📝 Walkthrough

Walkthrough

The EnfieldCouncil council parser transitions from Selenium-based web scraping with Cloudflare handling and iframe navigation to direct HTTP API requests using curl_cffi. This simplification reduces code complexity by replacing HTML parsing with JSON processing, extracting bin collection data from API responses instead of rendered web pages.

Changes

Cohort / File(s) Summary
Enfield Council Scraper Refactor
uk_bin_collection/uk_bin_collection/councils/EnfieldCouncil.py
Replaced Selenium web driver flow with curl_cffi-based HTTP requests for UPRN data retrieval. Removed postcode/paon handling, WebDriver initialization, Cloudflare/iframe navigation, and DOM parsing. Added JSON response processing with JobName/Description extraction as bin type and ScheduledStart parsing as collection date. Simplified output to {"bins": [{"type", "collectionDate"}]} format with validation and sorting by date.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • #1742: Related council parser update replacing Selenium/HTML scraping with UPRN-based API calls and standardized JSON-driven output structure

Poem

🐰 No more Selenium's dance, no WebDriver delays,
Direct API calls now light the bin-finding ways,
From HTML chaos to JSON's clean refrain,
The scraper leaps forward, with effort and gain! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Remove Selenium from Enfield' directly and clearly summarizes the main change: replacing Selenium-based web scraping with direct HTTP requests in the Enfield Council module.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@uk_bin_collection/uk_bin_collection/councils/EnfieldCouncil.py`:
- Around line 38-42: The code that extracts job_name and scheduled can raise
AttributeError if keys like "JobName", "Description", or "ScheduledStart" exist
but are None or non-dict; update the lookups to use a defensive fallback (e.g.,
replace item.get("JobName", {}) and item.get("Description", {}) and
item.get("ScheduledStart", {}) with (item.get("JobName") or {}),
(item.get("Description") or {}), and (item.get("ScheduledStart") or {})
respectively) so that job_name and scheduled assignments in EnfieldCouncil (the
variables job_name and scheduled) safely call .get("_text") without assuming
dicts.
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/EnfieldCouncil.py (1)

58-60: Hardcoded date format may break if date_format changes.

Line 48 uses date_format from the common module for formatting, but line 59 hardcodes "%d/%m/%Y" for parsing during sort. If they diverge, sorting will raise ValueError.

♻️ Proposed fix to use the shared format constant
         bindata["bins"].sort(
-            key=lambda x: datetime.strptime(x["collectionDate"], "%d/%m/%Y")
+            key=lambda x: datetime.strptime(x["collectionDate"], date_format)
         )
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 274152f and 89fe3ce.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/EnfieldCouncil.py
🧰 Additional context used
🪛 Ruff (0.14.11)
uk_bin_collection/uk_bin_collection/councils/EnfieldCouncil.py

4-4: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names

(F403)


10-10: Unused method argument: page

(ARG002)


12-12: check_uprn may be undefined, or defined from star imports

(F405)


31-31: Avoid specifying long messages outside the exception class

(TRY003)


48-48: date_format may be undefined, or defined from star imports

(F405)


49-49: contains_date may be undefined, or defined from star imports

(F405)

🔇 Additional comments (3)
uk_bin_collection/uk_bin_collection/councils/EnfieldCouncil.py (3)

1-5: Import and architecture looks good for this codebase.

The transition from Selenium to curl_cffi is a solid simplification. The star import from common is a pre-existing pattern across council implementations, and the unused page parameter is inherited from AbstractGetBinDataClass.


14-25: HTTP request configuration looks appropriate.

Good use of timeout and browser impersonation headers. The uprn is validated by check_uprn before use, and UPRNs are numeric, so direct string interpolation is safe here.


27-33: Good defensive check for Cloudflare blocking.

The HTML detection before JSON parsing prevents cryptic parse errors if the API returns a Cloudflare challenge page instead of data.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +38 to +42
job_name = (
item.get("JobName", {}).get("_text")
or item.get("Description", {}).get("_text")
)
find_address_button.click()
scheduled = item.get("ScheduledStart", {}).get("_text")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Potential AttributeError if API returns null or non-dict values.

item.get("JobName", {}) only returns {} when the key is absent. If JobName exists but is null (Python None) or a string, calling .get("_text") on it will raise AttributeError.

🐛 Proposed fix using defensive fallback
         for item in data:
             job_name = (
-                item.get("JobName", {}).get("_text")
-                or item.get("Description", {}).get("_text")
+                (item.get("JobName") or {}).get("_text")
+                or (item.get("Description") or {}).get("_text")
             )
-            scheduled = item.get("ScheduledStart", {}).get("_text")
+            scheduled = (item.get("ScheduledStart") or {}).get("_text")

Using (item.get("X") or {}) ensures falsy values like None also fall back to an empty dict.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
job_name = (
item.get("JobName", {}).get("_text")
or item.get("Description", {}).get("_text")
)
find_address_button.click()
scheduled = item.get("ScheduledStart", {}).get("_text")
job_name = (
(item.get("JobName") or {}).get("_text")
or (item.get("Description") or {}).get("_text")
)
scheduled = (item.get("ScheduledStart") or {}).get("_text")
🤖 Prompt for AI Agents
In `@uk_bin_collection/uk_bin_collection/councils/EnfieldCouncil.py` around lines
38 - 42, The code that extracts job_name and scheduled can raise AttributeError
if keys like "JobName", "Description", or "ScheduledStart" exist but are None or
non-dict; update the lookups to use a defensive fallback (e.g., replace
item.get("JobName", {}) and item.get("Description", {}) and
item.get("ScheduledStart", {}) with (item.get("JobName") or {}),
(item.get("Description") or {}), and (item.get("ScheduledStart") or {})
respectively) so that job_name and scheduled assignments in EnfieldCouncil (the
variables job_name and scheduled) safely call .get("_text") without assuming
dicts.

@PaulBrack PaulBrack marked this pull request as draft January 16, 2026 14:37
@codecov
Copy link

codecov bot commented Jan 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (274152f) to head (89fe3ce).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1815   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant