Clean up tutorial notebook output noise and version drift #170
dimitri-yatsenko wants to merge 3 commits
Tutorial notebooks committed with executed outputs were leaking three
forms of build-environment noise into the rendered docs:
- TqdmWarning ("IProgress not found, please update ipywidgets")
because the EXECUTE/EXECUTE_PG image had no ipywidgets installed,
so tqdm.auto fell back to text mode and emitted the warning.
- scikit-image "Downloading file mitosis.tif from gitlab.com ..."
chatter because the dataset wasn't cached in the kernel image.
- Stale DataJoint connection banners (2.1.1) in 21 notebooks
because outputs hadn't been refreshed since the last DJ release.
Changes:
- docker-compose.yaml: add ipywidgets to both EXECUTE branches.
- scripts/execute_notebooks.py: warm the scikit-image cache before
nbconvert spawns kernels.
- scripts/check_notebook_versions.py: new guard that compares each
notebook's "DataJoint X.Y.Z connected" banner against
extra.datajoint_version in mkdocs.yaml and fails on drift.
- README.md: document the committed-outputs policy and the guard.
Refreshed .ipynb outputs will land in a follow-up PR.
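For readers who want the shape of the guard, here is a minimal sketch of the comparison described above. It is not the contents of scripts/check_notebook_versions.py: the function names (target_version, stale_banners), both regexes, and the assumption that the banner appears in the stream or data outputs of an nbformat v4 notebook are illustrative choices.

```python
import re

# Banner text as quoted in this PR: "DataJoint X.Y.Z connected".
BANNER_RE = re.compile(r"DataJoint (\d+)\.(\d+)\.(\d+) connected")
# Assumption: reading mkdocs.yaml with a regex sidesteps a YAML parser entirely.
VERSION_RE = re.compile(
    r"""^\s*datajoint_version:\s*["']?(\d+)\.(\d+)""", re.MULTILINE
)


def target_version(mkdocs_text):
    """Return (major, minor) declared as datajoint_version in mkdocs.yaml text."""
    match = VERSION_RE.search(mkdocs_text)
    if match is None:
        raise ValueError("datajoint_version not found in mkdocs.yaml")
    return int(match.group(1)), int(match.group(2))


def output_text(output):
    """Flatten one notebook output dict into a single string."""
    text = output.get("text", "")  # stream outputs (stdout/stderr)
    parts = ["".join(text) if isinstance(text, list) else text]
    for value in output.get("data", {}).values():  # rich outputs
        if isinstance(value, list):
            parts.append("".join(value))
        elif isinstance(value, str):
            parts.append(value)
    return "\n".join(parts)


def stale_banners(notebook, target):
    """Return every banner whose major.minor differs from the target."""
    stale = []
    for cell in notebook.get("cells", []):
        for output in cell.get("outputs", []):  # markdown cells have none
            for match in BANNER_RE.finditer(output_text(output)):
                found = (int(match.group(1)), int(match.group(2)))
                if found != target:
                    stale.append(match.group(0))
    return stale
```

A caller would load each .ipynb with json.loads, pass the resulting dict to stale_banners, and exit non-zero if any notebook returns a non-empty list. Using finditer reports every stale banner in a notebook rather than only the first.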
All 23 tutorial and how-to notebooks have been re-executed against the current released DataJoint version (2.2.2). The committed cell outputs were stale: 21 notebooks still showed "DataJoint 2.1.1 connected" in their connection banner, and blob-detection additionally rendered a spurious TqdmWarning and a scikit-image dataset download line.
Built on top of #170 (which added ipywidgets to the executor image and pre-cached the scikit-image datasets), so the regenerated outputs are free of the previous noise:
- No "TqdmWarning: IProgress not found" stderr blocks.
- No "Downloading file 'data/mitosis.tif' from gitlab.com ..." stdout.
- All connection banners read "DataJoint 2.2.2 connected to ...".
scripts/check_notebook_versions.py now exits 0.
MilagrosMarin left a comment
Thanks @dimitri-yatsenko! Verified the three forms of noise end-to-end:
✅ Reproduced TqdmWarning: IProgress not found in src/tutorials/examples/blob-detection.ipynb cell 6 — same cell also has the scikit-image Downloading file 'data/mitosis.tif' from gitlab.com... chatter.
✅ python scripts/check_notebook_versions.py runs from a clean checkout (no PyYAML / Material-tag headaches), flags 21 stale notebooks on DataJoint 2.1.1 (target: 2.2.x), exits 1. Matches the PR description exactly.
✅ Pre-cache loaders match the skimage calls in blob-detection.ipynb cell 6 — both data.hubble_deep_field() and data.human_mitosis() are used there, and blob-detection is the only notebook with the download chatter.
✅ ipywidgets correctly scoped to EXECUTE / EXECUTE_PG only — LIVE/BUILD render committed outputs via mkdocs-jupyter and don't need it.
✅ Scope is well-isolated: infrastructure here, notebook refresh as a follow-up PR.
A few small things worth thinking about — none blocking:
1. The guard script isn't wired into CI. check_notebook_versions.py is a manual step right now, so the next time extra.datajoint_version bumps, stale banners can land unflagged again. Consider adding it to .github/workflows/development.yml so the check fails the PR rather than relying on a contributor remembering to run it. (Could be a follow-up — flagging it so we don't lose track.)
2. Pre-cache failure is swallowed silently. In execute_notebooks.py:
    except Exception as _e:
        print(f" pre-cache warn: {_loader.__name__}: {_e}")
If gitlab.com is down or the dataset URL changes, pre-cache fails quietly and the "Downloading file ..." chatter returns in the next refresh, which is exactly the noise this PR is trying to prevent. A non-zero exit (or at minimum a print(..., file=sys.stderr) so it's visible in CI logs) would catch that case. Minor.
3. skimage imported at module-execute time inside main(). Fine inside docker (always pip-installed), but python scripts/execute_notebooks.py --help would now error out if scikit-image isn't installed locally. Wrapping the import in a try/except and skipping the pre-cache block when unavailable would make the script friendlier to local invocations. Probably overengineering — your call.
4. had_banner short-circuits after the first stale match. If a notebook somehow has multiple banners (e.g., a cell re-running dj.conn()), only the first stale one is reported. Unlikely in practice — leaving as a note.
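To make points 2 and 3 concrete, here is a rough sketch of a pre-cache step that reports failures on stderr and tolerates a missing scikit-image install. It is a suggestion, not the code in execute_notebooks.py: warm_cache and skimage_loaders are hypothetical names, and whether a failure should abort the run is left to the caller.

```python
import sys


def warm_cache(loaders):
    """Call each loader once; return the names of the loaders that failed."""
    failed = []
    for loader in loaders:
        name = getattr(loader, "__name__", repr(loader))
        try:
            loader()  # the first call triggers the one-time download
            print(f"  cached: {name}")
        except Exception as exc:
            # stderr keeps the failure visible in CI logs
            print(f"  pre-cache FAILED: {name}: {exc}", file=sys.stderr)
            failed.append(name)
    return failed


def skimage_loaders():
    """Return the dataset loaders, or [] when scikit-image is not installed."""
    try:
        from skimage import data  # deferred so --help works without skimage
    except ImportError:
        return []
    return [data.hubble_deep_field, data.human_mitosis]
```

main() could then do something like `if warm_cache(skimage_loaders()): sys.exit(1)` before handing off to nbconvert, so a dead gitlab.com link fails the refresh instead of quietly reintroducing the download chatter.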
Otherwise this is clean — happy to approve once you've decided on the CI question.
Refresh tutorial notebook outputs against DataJoint 2.2.2
Summary
Tutorial and how-to notebooks are committed with their executed cell outputs, but the kernel image used to execute them was leaking three forms of noise into the rendered docs:
- "TqdmWarning: IProgress not found" is visible at /tutorials/examples/blob-detection/. tqdm.auto falls back to text mode because the execution image had no ipywidgets, so the warning got serialized into the notebook output and is rendered on the live site.
- scikit-image download chatter ("Downloading file 'data/mitosis.tif' from gitlab.com ...") appears in the same cell because the dataset isn't pre-cached.
- Stale "DataJoint 2.1.1 connected" banners in 21 notebooks: mkdocs.yaml declares extra.datajoint_version: "2.2" and the current release is 2.2.2. The committed banners are over a year behind.
What's in this PR
- docker-compose.yaml: add ipywidgets to the pip-install line in both MODE=EXECUTE and MODE=EXECUTE_PG branches. This silences the TqdmWarning at the source (lets tqdm.auto resolve to tqdm.notebook).
- scripts/execute_notebooks.py: warm the scikit-image cache (hubble_deep_field, human_mitosis) before nbconvert spawns its kernel, so the one-time download message doesn't get captured into any future re-executed output.
- scripts/check_notebook_versions.py (new): scans every committed notebook's "DataJoint X.Y.Z connected" banner and fails if the major.minor doesn't match extra.datajoint_version in mkdocs.yaml. Currently flags all 21 stale notebooks.
- README.md: short "Notebook execution policy" section documenting that outputs are intentionally committed and how to refresh them.
The notebook output refresh itself (re-executing all 21 notebooks against DataJoint 2.2.2) lands in a separate follow-up PR so the dependency/infrastructure change here stays reviewable in isolation.
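As a quick sanity check on the executor image, a preflight along these lines could confirm that the packages the changes above rely on are importable before any notebook runs. This is only a sketch: missing_modules is a hypothetical helper, not part of this PR, and the module names passed to it are the caller's choice.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, `missing_modules(["ipywidgets", "skimage"])` returning a non-empty list inside the EXECUTE container would suggest the TqdmWarning or the download chatter is likely to reappear in the next refresh.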
Test plan
- python scripts/check_notebook_versions.py runs from a checkout (no PyYAML dependency on Material's custom YAML tags); currently reports the 21 stale notebooks and exits 1.
- MODE=EXECUTE_PG docker compose up --build brings the stack up; the docs container logs "Pre-caching scikit-image datasets..." and "cached: hubble_deep_field" / "cached: human_mitosis" before any [N/21] notebook execution starts.
- grep -rln 'TqdmWarning\|IProgress not found' src/ returns empty.
- grep -rln 'Downloading file.*scikit-image' src/ returns empty.
- python scripts/check_notebook_versions.py exits 0.
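The two grep checks match raw file text. If a per-cell report is more useful, the same patterns can be applied to parsed notebook outputs. The sketch below assumes nbformat v4 JSON already loaded into a dict; noisy_cells and NOISE_PATTERNS are illustrative names, not part of the repository.

```python
import re

# Same patterns as the grep commands in the test plan.
NOISE_PATTERNS = [
    re.compile(r"TqdmWarning|IProgress not found"),
    re.compile(r"Downloading file.*scikit-image"),
]


def _as_text(value):
    """Notebook fields may be a string or a list of lines; normalize to str."""
    return "".join(value) if isinstance(value, list) else str(value)


def noisy_cells(notebook):
    """Return (cell_index, pattern) pairs for outputs that contain noise."""
    hits = []
    for index, cell in enumerate(notebook.get("cells", [])):
        for output in cell.get("outputs", []):
            chunks = [_as_text(output.get("text", ""))]
            chunks += [_as_text(v) for v in output.get("data", {}).values()]
            blob = "\n".join(chunks)
            for pattern in NOISE_PATTERNS:
                if pattern.search(blob):
                    hits.append((index, pattern.pattern))
    return hits
```

An empty list for every notebook under src/ corresponds to both grep commands returning nothing.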