Skip to content

Fix nldi: validate data source on every call, not just cache-miss#246

Merged
thodson-usgs merged 6 commits intoDOI-USGS:mainfrom
thodson-usgs:fix/nldi-data-source-validation
May 4, 2026
Merged

Fix nldi: validate data source on every call, not just cache-miss#246
thodson-usgs merged 6 commits intoDOI-USGS:mainfrom
thodson-usgs:fix/nldi-data-source-validation

Conversation

@thodson-usgs
Copy link
Copy Markdown
Collaborator

@thodson-usgs thodson-usgs commented May 3, 2026

Summary

  • _validate_data_source nested its data_source not in cache check inside the if cache is None: branch, so the validator was effectively a no-op after the first call. Any invalid data_source / feature_source value passed thereafter was silently accepted and used to construct an invalid URL, surfacing as a confusing 4xx from the server rather than a clean ValueError.
  • Hoist the validation check outside the cache-load branch so every call validates.
  • Also guards the unconditional _validate_data_source(feature_source) call in get_features with if feature_source is not None: — without this guard the now-active validator would raise "Invalid data source 'None'" whenever a user provided only comid (the cache bug previously masked this).

Test plan

  • One regression test in tests/nldi_test.py that exercises the load-bearing behavior change: invalid sources must still raise after the cache is warm. Cold-cache test was dropped as redundant — the warm-cache test exercises the cold path on its first call.
  • An autouse monkeypatch fixture resets the module-level _AVAILABLE_DATA_SOURCES cache between tests to keep test order independent.
  • Full local test suite passes.

Related PRs

The validation check `data_source not in _AVAILABLE_DATA_SOURCES` was
nested inside `if _AVAILABLE_DATA_SOURCES is None:`, so once any caller
warmed the cache, subsequent invalid `data_source`/`feature_source`
values were silently accepted. Move the check outside the cache-load
branch so all calls are validated.

Also guard the `_validate_data_source(feature_source)` call in
`get_features` with `if feature_source:` — the previous code unconditionally
validated `None` when the user provided only `comid`. The cache bug
masked this; with the check now active it would otherwise raise
"Invalid data source 'None'".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes NLDI data-source validation so invalid data_source / feature_source values are rejected on every call, not just while populating the cache. It fits the NLDI client layer by restoring early, user-facing argument validation instead of letting bad inputs fall through to confusing server-side 4xx responses.

Changes:

  • Moved _validate_data_source()’s membership check out of the cache-miss branch so cached calls are still validated.
  • Added a guard in get_features() to avoid validating feature_source=None on comid-based calls.
  • Added regression tests and a test fixture to reset the module-level data-source cache between tests.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
dataretrieval/nldi.py Fixes cached validation behavior and adjusts get_features() feature-source validation flow.
tests/nldi_test.py Adds regression tests for cold-cache and warm-cache invalid data-source handling, plus cache reset fixture.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread dataretrieval/nldi.py Outdated
_validate_data_source(data_source)
# validate feature source
_validate_data_source(feature_source)
if feature_source:
thodson-usgs and others added 3 commits May 4, 2026 10:05
Per copilot review on PR DOI-USGS#246. The previous truthiness guard skipped
validation for empty-string feature_source/data_source, letting bad
inputs build invalid URLs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per /simplify review on PR DOI-USGS#246:
- Use `monkeypatch.setattr` in the autouse cache-reset fixture rather
  than direct attribute assignment + yield. Auto-restores on teardown
  and collapses the fixture to one line.
- Drop redundant cold-cache regression test
  (`test_validate_data_source_rejects_invalid_on_first_call`); the
  warm-cache test exercises the cold path on its first call anyway.
- Strip stale WHAT-comments (`# validate the data source`,
  `# validate feature source`, etc.) from the `get_features`
  validation block while we're touching adjacent lines.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@thodson-usgs thodson-usgs marked this pull request as ready for review May 4, 2026 19:57
@thodson-usgs thodson-usgs merged commit 11dd5a2 into DOI-USGS:main May 4, 2026
8 checks passed
@thodson-usgs thodson-usgs deleted the fix/nldi-data-source-validation branch May 4, 2026 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants