Fix 4DN index router reading wrong field path in file document — Closes #26#27
Draft
conradbzura wants to merge 3 commits intomasterfrom
Draft
Fix 4DN index router reading wrong field path in file document — Closes #26#27conradbzura wants to merge 3 commits intomasterfrom
conradbzura wants to merge 3 commits intomasterfrom
Conversation
The /index/{dcc}/{local_id} endpoint read sidecar extras from
extra.extra_files, but the 4DN scraper writes them to
extra.fourdn.extra_files. As a result every 4DN index request
returned 404 "No index file available" even when a .bai, .tbi,
.beddb, .bw, or .pairs_px2 sidecar was present in the database.
Dispatch the lookup by DCC and read the namespaced subdocument.
Non-4DN DCCs return no extras for now; ENCODE and HuBMAP branches
will be added in follow-up issues once their sidecar ingestion is
in place.
Add unit tests for stream_index_file in the index router covering the behavior surface of the public endpoint: 4DN sidecar lookup for both HEAD and GET, case-insensitive DCC dispatch, the legacy top-level extras path being ignored, non-4DN DCCs always returning no-index until their branches land, empty and absent extras, entries missing href, unknown DCC rejection, missing file document, and Range handling including valid, malformed, unsatisfiable, and missing-file-size cases.
Lift the wait_for_cutover no-op into an autouse class fixture and switch mocker.patch calls to mocker.patch.object using imports of cfdb.services.locks and cfdb.services.drs, per the Python test guide preference for AST-indexable, rename-safe patch targets. Drop test_stream_index_file_4dn_without_fourdn_extras: its equivalence class (extras resolve to empty) is covered by the empty-list boundary test, and the distinct legacy-shape regression is covered by test_stream_index_file_4dn_with_only_legacy_top_level_extras.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Route the
/index/{dcc}/{local_id}sidecar lookup through the DCC-namespacedextrasubdocument so 4DN index files can actually be served. Before this change the router readextra.extra_files, but the 4DN scraper writes extras underextra.fourdn.extra_files, so every/index/4dn/<local_id>request returned 404 even when a.bai,.tbi,.beddb,.bw, or.pairs_px2sidecar was present. Dispatch the lookup by DCC and read the namespaced path. Non-4DN DCCs (ENCODE, HuBMAP) return no extras for now — their sidecar ingestion is not yet in place and will be handled in follow-up issues. Closes #26.Proposed changes
src/cfdb/api/routers/index.pyReplace the DCC-agnostic
extra.get("extra_files")lookup with a DCC-dispatched read. Fornormalized_dcc == "4dn", readextra.fourdn.extra_files(matching the path written bysrc/cfdb/services/fourdn.pyand materialized bysrc/cfdb/services/sync.py). For every other DCC, return an empty list so the existing 404 "No index file available" path fires — preserving today's behavior for ENCODE and HuBMAP until their sidecar ingestion lands. Update the handler docstring to describe the DCC-namespaced layout.tests/test_index.pyAdd 13 unit tests for
stream_index_file, organized underTestStreamIndexFile. Coverage includes:extra.extra_filesis ignoredTests use the
mock_dbFakeDBfixture fromtests/conftest.py(which monkeypatchescfdb.api.dbwith an in-memory database), so no MongoDB instance is required. An autouse class fixture stubslocks.wait_for_cutoverto a no-op for every test.Test cases
TestStreamIndexFileextra.fourdn.extra_files/index/4dn/<local_id>Content-Disposition,Accept-Ranges, andContent-LengthheadersTestStreamIndexFileextra.extra_filesTestStreamIndexFilefile_sizesetRange: bytes=0-99Content-RangeheaderTestStreamIndexFileTestStreamIndexFileTestStreamIndexFilehrefkeyTestStreamIndexFile"4DN"(mixed case)TestStreamIndexFileStreamingResponseis returned with status 200 and correct headersTestStreamIndexFilefile_sizerecordedRangeheaderTestStreamIndexFileTestStreamIndexFilefile_size=1024Range: bytes=2000-3000Content-Range: bytes */1024TestStreamIndexFileTestStreamIndexFileextra.fourdn.extra_fileslist