Skip to content

waterdata: tighten stats + OGC pagination (geometry, KeyError, non-200)#255

Draft
thodson-usgs wants to merge 2 commits intoDOI-USGS:mainfrom
thodson-usgs:fix/stats-pagination
Draft

waterdata: tighten stats + OGC pagination (geometry, KeyError, non-200)#255
thodson-usgs wants to merge 2 commits intoDOI-USGS:mainfrom
thodson-usgs:fix/stats-pagination

Conversation

@thodson-usgs
Copy link
Copy Markdown
Collaborator

Summary

Three correctness fixes to the two pagination loops in `dataretrieval.waterdata.utils`.

  • `get_stats_data` dropped geometry on continuation pages. Page 1 honored `GEOPANDAS` but pages 2..N hard-coded `geopd=False`. With geopandas installed, a multi-page stats response started as a `GeoDataFrame` and pages 2..N came back as plain `DataFrame`s; `pd.concat` then silently downgraded the result and the caller lost geometry / CRS. Use `geopd=GEOPANDAS` on every page.
  • `get_stats_data` raised `KeyError` on responses without a `next` key. Some terminal responses simply omit it. Switched to `body.get("next")`, which produces `None` and exits the loop cleanly.
  • Both pagination loops swallowed non-200 continuation pages silently. A 4xx/5xx page whose body happened to JSON-decode could be appended as data and pagination would quietly stop — the caller got a partial result with no warning. Added an explicit `if status_code != 200` raise inside each loop so the existing log-and-truncate handler fires deliberately rather than incidentally.

Test plan

  • Three new regression tests in `tests/waterdata_utils_test.py`: `_walk_pages` raises and stops on a 500-status continuation; `get_stats_data` accepts a response without a `next` key; `get_stats_data` truncates cleanly on a 503 continuation.
  • Existing tests in the file still pass.
  • Full local test suite passes (229 tests, with `API_USGS_PAT` set).

thodson-usgs and others added 2 commits May 3, 2026 15:23
…-200

Three correctness fixes to the two pagination loops in
dataretrieval.waterdata.utils.

* `get_stats_data` honored `GEOPANDAS` for the first page but
  hard-coded `geopd=False` on every continuation page. With geopandas
  installed, a multi-page stats response started as a `GeoDataFrame`
  and pages 2..N came back as plain `DataFrame`s; `pd.concat` then
  silently downgraded the result and the caller lost geometry / CRS.
  Use `geopd=GEOPANDAS` on every page.

* `get_stats_data` indexed `body["next"]` directly, raising `KeyError`
  on responses without that key (some terminal responses simply omit
  it). Switch to `body.get("next")`, which produces `None` and exits
  the loop cleanly.

* Both `get_stats_data`'s in-loop request and `_walk_pages`'s in-loop
  request returned the response without checking `status_code`. A 4xx
  or 5xx page whose body happened to JSON-decode could be appended as
  data, then pagination quietly stopped — the caller got a partial
  result with no warning. Add an explicit `if status_code != 200`
  raise inside each loop so the existing log-and-truncate handler
  fires deliberately rather than incidentally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both pagination loops now had four call sites repeating
``if resp.status_code != 200: raise RuntimeError(_error_body(resp))``.
Move that into a one-line helper alongside ``_error_body`` and call
it from every site.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant