Fix 4DN index router reading wrong field path in file document — Closes #26 #27

Draft
conradbzura wants to merge 3 commits into master from 26-fix-4dn-index-router-extras-path

@conradbzura conradbzura commented Apr 14, 2026

Summary

Route the /index/{dcc}/{local_id} sidecar lookup through the DCC-namespaced extra subdocument so 4DN index files can actually be served. Before this change the router read extra.extra_files, but the 4DN scraper writes extras under extra.fourdn.extra_files, so every /index/4dn/<local_id> request returned 404 even when a .bai, .tbi, .beddb, .bw, or .pairs_px2 sidecar was present. Dispatch the lookup by DCC and read the namespaced path. Non-4DN DCCs (ENCODE, HuBMAP) return no extras for now — their sidecar ingestion is not yet in place and will be handled in follow-up issues. Closes #26.

Proposed changes

src/cfdb/api/routers/index.py

Replace the DCC-agnostic extra.get("extra_files") lookup with a DCC-dispatched read. For normalized_dcc == "4dn", read extra.fourdn.extra_files (matching the path written by src/cfdb/services/fourdn.py and materialized by src/cfdb/services/sync.py). For every other DCC, return an empty list so the existing 404 "No index file available" path fires — preserving today's behavior for ENCODE and HuBMAP until their sidecar ingestion lands. Update the handler docstring to describe the DCC-namespaced layout.
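
The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the router's actual code: the helper name `resolve_extra_files` and the plain-dict document shape are assumptions made for the example; only the field paths `extra.extra_files` and `extra.fourdn.extra_files` come from this PR.

```python
from typing import Any


def resolve_extra_files(dcc: str, file_doc: dict[str, Any]) -> list[dict[str, Any]]:
    """Return sidecar entries for a file document, dispatched by DCC.

    Hypothetical helper: the real handler may structure this differently.
    """
    normalized_dcc = dcc.lower()
    extra = file_doc.get("extra") or {}

    if normalized_dcc == "4dn":
        # 4DN extras live under the DCC-namespaced subdocument.
        fourdn = extra.get("fourdn") or {}
        return list(fourdn.get("extra_files") or [])

    # Other DCCs have no sidecar ingestion yet, so they resolve to no
    # extras and the caller falls through to its existing 404 path.
    return []
```

Note that the legacy top-level `extra.extra_files` key is never consulted, which is the behavior the regression test guards.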

tests/test_index.py

Add 13 unit tests for stream_index_file, organized under TestStreamIndexFile. Coverage includes:

  • 4DN sidecar lookup for HEAD and GET
  • Case-insensitive DCC dispatch
  • Regression guard: legacy top-level extra.extra_files is ignored
  • Non-4DN DCCs always return no-index
  • Empty extras list and malformed entries
  • Missing-href entry, unknown DCC rejection, missing file document
  • Range handling: valid, malformed, unsatisfiable, missing file size
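
To make the empty-extras and missing-href cases concrete, here is a small sketch of how a resolved extras list could map to the two 404 details listed in the test-case table. The exception class, the function name, and the choice to use the first entry are assumptions for illustration; the detail strings are the ones quoted in this PR.

```python
class IndexLookupError(Exception):
    """Hypothetical stand-in for the HTTP error the router raises."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def pick_index_href(extra_files: list) -> str:
    """Return the download URL of the sidecar, or raise a 404-style error."""
    if not extra_files:
        raise IndexLookupError(404, "No index file available for this file")

    # Assumption: the first entry is the sidecar to serve.
    entry = extra_files[0]
    href = entry.get("href") if isinstance(entry, dict) else None
    if not href:
        raise IndexLookupError(404, "Index file has no download URL")
    return href
```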

Tests use the mock_db FakeDB fixture from tests/conftest.py (which monkeypatches cfdb.api.db with an in-memory database), so no MongoDB instance is required. An autouse class fixture stubs locks.wait_for_cutover to a no-op for every test.

Test cases

| # | Test Suite | Given | When | Then | Coverage Target |
|---|------------|-------|------|------|-----------------|
| 1 | TestStreamIndexFile | A 4DN file document with a sidecar entry under extra.fourdn.extra_files | A HEAD request is issued for /index/4dn/<local_id> | Status is 200 with Content-Disposition, Accept-Ranges, and Content-Length headers | 4DN sidecar HEAD happy path |
| 2 | TestStreamIndexFile | A 4DN file document whose extras live only at the legacy top-level extra.extra_files | A HEAD request is issued | Status is 404; legacy path is ignored | Regression guard for the pre-fix field path |
| 3 | TestStreamIndexFile | A 4DN file document with a sidecar and file_size set | A HEAD request is issued with Range: bytes=0-99 | Status is 206 with a correct Content-Range header | Range request handling |
| 4 | TestStreamIndexFile | No matching file document in the database | A HEAD request is issued | Status is 404 with detail "File not found" | Missing file document |
| 5 | TestStreamIndexFile | An unrecognized DCC name | A HEAD request is issued | Status is 400 with a detail listing valid DCCs | Unknown DCC rejection |
| 6 | TestStreamIndexFile | A 4DN sidecar entry missing the href key | A HEAD request is issued | Status is 404 with detail "Index file has no download URL" | Malformed sidecar entry |
| 7 | TestStreamIndexFile | A 4DN file document with a sidecar | A HEAD request is issued with DCC "4DN" (mixed case) | Status is 200; dispatch is case-insensitive | DCC case normalization |
| 8 | TestStreamIndexFile | A 4DN file document with a sidecar | A GET request is issued | A StreamingResponse is returned with status 200 and correct headers | GET dispatch path |
| 9 | TestStreamIndexFile | A 4DN sidecar entry with no file_size recorded | A HEAD request is issued with a Range header | Status is 500 with detail about unavailable file size | Range without known size |
| 10 | TestStreamIndexFile | A 4DN sidecar entry | A HEAD request is issued with a malformed Range header | Status is 400 with detail "Invalid Range header" | Malformed Range header |
| 11 | TestStreamIndexFile | A 4DN sidecar entry with file_size=1024 | A HEAD request is issued with Range: bytes=2000-3000 | Status is 416 with Content-Range: bytes */1024 | Unsatisfiable Range |
| 12 | TestStreamIndexFile | A HuBMAP file document with arbitrary extras | A HEAD request is issued | Status is 404; non-4DN DCCs return no index until their branches land | Non-4DN DCC dispatch |
| 13 | TestStreamIndexFile | A 4DN file document with an empty extra.fourdn.extra_files list | A HEAD request is issued | Status is 404 with detail "No index file available for this file" | Empty extras list |
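
The Range rows in the table (cases 3, 9, 10, and 11) can be read as a small decision procedure. The sketch below is one way to express it, assuming a hypothetical `evaluate_range` helper that handles only the `bytes=start-end` and `bytes=start-` forms. The status codes and the `bytes */<size>` header format come from the table; the order of the checks and the clamping of the end offset are assumptions, and Content-Disposition is left out for brevity.

```python
import re
from typing import Optional

# Only "bytes=<start>-<end>" and "bytes=<start>-" are recognized here.
_RANGE_RE = re.compile(r"^bytes=(\d+)-(\d*)$")


def evaluate_range(range_header: Optional[str], file_size: Optional[int]):
    """Return (status_code, headers) for a request, per the table above."""
    if range_header is None:
        headers = {"Accept-Ranges": "bytes"}
        if file_size is not None:
            headers["Content-Length"] = str(file_size)
        return 200, headers

    if file_size is None:
        # A Range cannot be evaluated without a known size.
        return 500, {}

    match = _RANGE_RE.match(range_header.strip())
    if match is None:
        return 400, {}

    start = int(match.group(1))
    if start >= file_size:
        # Unsatisfiable: report the full size with an unspecified range.
        return 416, {"Content-Range": f"bytes */{file_size}"}

    # An open-ended range runs to the last byte; clamp oversized ends.
    end = int(match.group(2)) if match.group(2) else file_size - 1
    end = min(end, file_size - 1)
    if start > end:
        return 400, {}

    return 206, {
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {start}-{end}/{file_size}",
        "Content-Length": str(end - start + 1),
    }
```

For instance, `evaluate_range("bytes=2000-3000", 1024)` falls into the 416 branch, matching case 11.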

The /index/{dcc}/{local_id} endpoint read sidecar extras from
extra.extra_files, but the 4DN scraper writes them to
extra.fourdn.extra_files. As a result every 4DN index request
returned 404 "No index file available" even when a .bai, .tbi,
.beddb, .bw, or .pairs_px2 sidecar was present in the database.

Dispatch the lookup by DCC and read the namespaced subdocument.
Non-4DN DCCs return no extras for now; ENCODE and HuBMAP branches
will be added in follow-up issues once their sidecar ingestion is
in place.

Add unit tests for stream_index_file in the index router covering
the behavior surface of the public endpoint: 4DN sidecar lookup for
both HEAD and GET, case-insensitive DCC dispatch, the legacy
top-level extras path being ignored, non-4DN DCCs always returning
no-index until their branches land, empty and absent extras,
entries missing href, unknown DCC rejection, missing file document,
and Range handling including valid, malformed, unsatisfiable, and
missing-file-size cases.

@conradbzura conradbzura self-assigned this Apr 14, 2026

Lift the wait_for_cutover no-op into an autouse class fixture and
switch mocker.patch calls to mocker.patch.object using imports of
cfdb.services.locks and cfdb.services.drs, per the Python test
guide preference for AST-indexable, rename-safe patch targets.

Drop test_stream_index_file_4dn_without_fourdn_extras: its
equivalence class (extras resolve to empty) is covered by the
empty-list boundary test, and the distinct legacy-shape regression
is covered by test_stream_index_file_4dn_with_only_legacy_top_level_extras.