feat: rewrite north west leicestershire as pure http scraper by InertiaUK · Pull Request #2075 · robbrad/UKBinCollectionData

InertiaUK · 2026-05-13T15:35:56Z

Summary

Rewrites the scraper from Selenium to pure HTTP requests — no Chrome or webdriver needed
The previous scraper timed out because it passed raw OS UPRNs to the Cuttlefish CMS, which uses its own internal nwl-prefixed address IDs
Uses the council's address autocomplete endpoint (/data/ac/addresses.json) to resolve postcode + house number to an internal ID, then sets a session cookie via /location and parses the homepage HTML
Input is now postcode + house number only (no UPRN needed)

The existing Selenium scraper wasn't fundamentally broken — the timeout was caused by the ID mismatch, not a site change. This rewrite fixes the root cause and removes the Selenium dependency.

Testing

Postcode + house number: DE74 2FZ + paon 1
Tested via API end-to-end — 21 collection entries returned (Refuse, Garden Waste, Red Box, Blue Bag, Yellow Bag)

Summary by CodeRabbit

Refactor

NorthWestLeicestershire Council Integration – Updated address lookup method to accept postcode and house number instead of UPRN, eliminating browser automation dependency for improved query performance and simplicity.

Removes the Selenium dependency entirely. The council's Cuttlefish CMS has an address autocomplete endpoint that returns internal IDs, and a cookie-based location system that serves collection dates as server-rendered HTML. Three plain HTTP requests replace the previous flow of launching Chrome, waiting for elements, and clicking links. Postcode + house number is now the only input needed (no UPRN).
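The three-request flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `BASE_URL` is a placeholder host, the `term` query parameter and the `id`/`label` keys of the autocomplete response are hypothetical, and only the `put` parameter of the `/location` call is taken from the review comments (any further parameters are omitted here). The `session` argument is expected to behave like a `requests.Session`.

```python
from urllib.parse import urljoin

BASE_URL = "https://council.example/"         # placeholder, NOT the real council host
AUTOCOMPLETE_PATH = "data/ac/addresses.json"  # path named in the PR description
LOCATION_PATH = "location"                    # path named in the PR description
TIMEOUT = 10                                  # seconds; fail fast on a hung connection


def pick_address(candidates, house_number):
    """Return the internal ID whose label starts with the given house number.

    Assumes (hypothetically) that each candidate is a dict with "id" and "label".
    """
    wanted = str(house_number).strip().lower()
    for entry in candidates:
        label = str(entry["label"]).strip().lower()
        first_token = label.split(maxsplit=1)[0].rstrip(",") if label else ""
        if first_token == wanted:
            return entry["id"]
    raise ValueError(f"No address matching house number {house_number!r}")


def fetch_home_html(session, postcode, house_number):
    """Run the three requests and return the homepage HTML for the address."""
    # Request 1: resolve postcode + house number to the CMS's internal ID.
    resp = session.get(
        urljoin(BASE_URL, AUTOCOMPLETE_PATH),
        params={"term": postcode},  # parameter name is an assumption
        timeout=TIMEOUT,
    )
    resp.raise_for_status()
    internal_id = pick_address(resp.json(), house_number)

    # Request 2: store the location in the session cookie.
    resp = session.get(
        urljoin(BASE_URL, LOCATION_PATH),
        params={"put": internal_id},
        timeout=TIMEOUT,
    )
    resp.raise_for_status()

    # Request 3: the homepage now renders collection dates for that address.
    resp = session.get(BASE_URL, timeout=TIMEOUT)
    resp.raise_for_status()
    return resp.text
```

With the `requests` package installed and `BASE_URL` pointed at the real site, a call would look like `fetch_home_html(requests.Session(), "DE74 2FZ", "1")`. Taking the session as an argument keeps the sketch testable with a stub, and every call carries a timeout and a status check so network failures surface as errors rather than as an empty result.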

coderabbitai · 2026-05-13T15:36:12Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c72032e7-b059-403a-a31f-ec75fd226766

📥 Commits

Reviewing files that changed from the base of the PR and between ab0cd60 and 36a93c3.

📒 Files selected for processing (2)

uk_bin_collection/tests/input.json
uk_bin_collection/uk_bin_collection/councils/NorthWestLeicestershire.py

📝 Walkthrough

Walkthrough

Migrated NorthWestLeicestershire council scraper from Selenium/UPRN automation to HTTP-based postcode and house-number lookup. Removed Selenium imports, implemented address autocomplete resolution with error handling, added HTML parsing for refuse collection dates with relative-date normalization, and updated test configuration to match the new interface.

Changes

NorthWestLeicestershire HTTP Migration

Layer / File(s)	Summary
Test Configuration Update `uk_bin_collection/tests/input.json`	Replaced UPRN and web_driver with postcode (`DE74 2FZ`) and house_number (`1`), updated URL to council home endpoint, enabled skip_get_url, and clarified wiki_note to indicate no UPRN or Selenium required.
HTTP-Based Scraper Implementation `uk_bin_collection/uk_bin_collection/councils/NorthWestLeicestershire.py`	Removed Selenium imports; added class constants for autocomplete, location, and home URLs; replaced parse_data to resolve addresses via JSON autocomplete, fetch location HTML, parse refuse items with date normalization; added _resolve_address helper to disambiguate multiple matches and raise ValueError on lookup failures.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

dp247

Poem

🐰 From Selenium clicks to requests so pure,
The scraper now glides with HTTP sure—
Postcodes and houses dance in the query,
HTML parsed quick, no driver's worry.
North West Leicestershire fetches with grace! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: replacing a Selenium-based scraper with a pure HTTP implementation for the North West Leicestershire council.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

codecov · 2026-05-13T15:38:30Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (8ecf878) to head (36a93c3).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #2075   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@uk_bin_collection/tests/input.json`:
- Around line 1827-1829: Remove the duplicate "postcode" key from the
NorthWestLeicestershire fixture in the JSON object (the object that currently
contains "house_number": "1" and two "postcode" entries); keep a single
"postcode" entry (the correct value "DE74 2FZ") so the object is unambiguous and
valid JSON.
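A duplicate key like this is easy to miss because Python's `json` module accepts it silently and keeps the last value. As a minimal sketch (the helper names `reject_duplicate_keys` and `load_strict` are made up for this example and are not part of the repository), a fixture check could use `object_pairs_hook` to turn duplicates into an error:

```python
import json


def reject_duplicate_keys(pairs):
    """Build a dict from (key, value) pairs, raising on a repeated key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


def load_strict(text):
    """Parse JSON text, rejecting objects that repeat a key at any depth."""
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)
```

Calling `load_strict` on an object with two `"postcode"` entries raises `ValueError`, whereas plain `json.loads` would return the object with only the second value.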

In `@uk_bin_collection/uk_bin_collection/councils/NorthWestLeicestershire.py`:
- Around line 70-77: The parser currently calls datetime.strptime(date_str,
"%a %d %b") which uses year 1900 (not a leap year) and will fail for "29 Feb";
instead parse against a leap-safe placeholder year (e.g., 2000) by
appending/replacing the year in date_str or using datetime.strptime with a
format that includes a fixed year, then replace the placeholder year with
current_year/current_year+1 when projecting onto the actual collection year;
update the logic around parsed_date, current_date and current_year so
parsed_date = parsed_date.replace(year=current_year) and parsed_date =
parsed_date.replace(year=current_year + 1) work after parsing with the safe year
(ensure date_str, parsed_date, current_date and current_year are the referenced
symbols).
- Around line 37-44: The requests to LOCATION_URL and HOME_URL in the
NorthWestLeicestershire scraper lack timeouts and don’t validate HTTP status, so
update the session.get calls (the one with params={"put": nwl_id,...} and the
one assigning response = session.get(self.HOME_URL)) to include a reasonable
timeout (e.g. timeout=10) and immediately check the response via
response.raise_for_status() (or validate response.status_code) to fail fast on
network/HTTP errors; ensure any exceptions are allowed to propagate or are
converted into a clear upstream error rather than falling back to “No refuse
collection data found.”
- Around line 24-28: Read the "house_number" kwarg and prefer it when resolving
ambiguous addresses: retrieve user_house_number = kwargs.get("house_number")
(keep existing user_paon = kwargs.get("paon") and
check_postcode(user_postcode)), then call self._resolve_address(user_postcode,
user_house_number or user_paon) so callers using the documented house_number
field disambiguate autocomplete hits; update the variables around the existing
call to _resolve_address accordingly.
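The leap-year finding for lines 70-77 can be illustrated with a small standalone sketch. It assumes the site shows dates such as "Thu 29 Feb" (weekday, day, abbreviated month, no year) and that month names parse under the default English locale; the function name `project_collection_date` is hypothetical and is not the code in this PR.

```python
from datetime import date, datetime

LEAP_SAFE_YEAR = 2000  # a leap year, so "29 Feb" parses successfully


def project_collection_date(date_str, today):
    """Turn a year-less string like 'Thu 29 Feb' into the next matching date."""
    # Drop the weekday and parse day + month against the leap-safe year.
    _weekday, day, month = date_str.split()
    parsed = datetime.strptime(f"{day} {month} {LEAP_SAFE_YEAR}", "%d %b %Y").date()

    # Try this year first, then roll over to next year for dates already past.
    for year in (today.year, today.year + 1):
        try:
            candidate = parsed.replace(year=year)
        except ValueError:
            continue  # 29 Feb does not exist in a non-leap year
        if candidate >= today:
            return candidate
    raise ValueError(f"Cannot place {date_str!r} on or after {today.isoformat()}")
```

For example, `project_collection_date("Fri 03 Jan", date(2025, 12, 28))` resolves to 3 January 2026, because the same date in 2025 has already passed.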

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d88b6ac1-8700-411d-9d4c-840279b47587

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and ab0cd60.

📒 Files selected for processing (2)

uk_bin_collection/tests/input.json
uk_bin_collection/uk_bin_collection/councils/NorthWestLeicestershire.py

coderabbitai Bot reviewed May 13, 2026

fix: address CodeRabbit review feedback

36a93c3

InertiaUK wants to merge 2 commits into robbrad:master from InertiaUK:feat/nwleics-http-rewrite
