From 0cfb1c52a630066c7c20ef7ea9de6c3b5af1713e Mon Sep 17 00:00:00 2001 From: Ian Czekala Date: Thu, 21 May 2026 14:09:08 +1000 Subject: [PATCH 1/7] added notes about what we're doing. --- 00-download-and-extract-datasets/README.md | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 00-download-and-extract-datasets/README.md diff --git a/00-download-and-extract-datasets/README.md b/00-download-and-extract-datasets/README.md new file mode 100644 index 0000000..df3d3fa --- /dev/null +++ b/00-download-and-extract-datasets/README.md @@ -0,0 +1,14 @@ +# Download and extract datasets + +Contend with the fact that `casatools` frequently lags supported SPEC 0 Python versions, creating a situation where one needs a Python version old enough to run `casatools` but new enough to support modern development. These requirements are often mutually exclusive and require multiple Python versions as part of the same analysis. + +We've tried to decouple these somewhat, by isolating the `casatools` dependency to this specific example folder. + +# Installation and running + +As of 21 May 2026, the latest version supported by `casatools` is Python 3.10. + +You can run to download individual datasets, or all of them at once. + + +For subsequent examples, you will be asked to copy some of the data products into those folders. \ No newline at end of file From 694b8d76be1f49d781c86c9913315712ff23c8fa Mon Sep 17 00:00:00 2001 From: Ian Czekala Date: Thu, 21 May 2026 14:12:35 +1000 Subject: [PATCH 2/7] update example description. --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index c491754..330f2f5 100644 --- a/README.md +++ b/README.md @@ -24,7 +24,7 @@ and you should see all scripts execute in order. 
# Table of Contents -* [01 - **Intro**: Setup Mock Image and Baselines](01-generate-mock-baselines/README.md) | Generate a mock sky image $I_\nu(l,m)$ and interferometer baselines $(u,v)$ (but not visibilities $\mathcal{V}(u,v)$). These products are used as input for the other examples. +* [01 - **Intro**: Setup Mock Image and Baselines](01-generate-mock-baselines/README.md) | Generate a mock sky image $I_\nu(l,m)$ and interferometer baselines $(u,v)$ (but not the visibilities). These products are used as input for the other examples. * [02 - **Intro**: Stochastic Gradient Descent](02-sgd/README.md) | A complete end-to-end example using MPoL to image mock data. * [03 - **Advanced**: Visibility Inference with Pyro](03-AS209-pyro-inference/README.md) | Use MPoL with Pyro to sample parametric visibility plane models. * [04 - **Advanced**: IM Lup protoplanetary disk](04-IMLup-multi-EB) | Use MPoL to image the ALMA DSHARP observations of the IM Lup protoplanetary disk, taking into account alignment and weight-scaling adjustments for a multi-execution block dataset. From d152946dca83aaef4495172b2021630c214609cc Mon Sep 17 00:00:00 2001 From: Ian Czekala Date: Thu, 21 May 2026 20:04:23 +1000 Subject: [PATCH 3/7] updated top README with new organizational strategy. --- README.md | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/README.md b/README.md index 330f2f5..a384eca 100644 --- a/README.md +++ b/README.md @@ -1,16 +1,18 @@ # MPoL Examples -This repository hosts self-contained examples demonstrating [MPoL](https://mpol-dev.github.io/MPoL/) functionality. More info on each example can be found in the README.md within each example folder. +This repository hosts self-contained examples demonstrating [MPoL](https://mpol-dev.github.io/MPoL/) functionality. More info on each example can be found in the README.md within each example folder. 
This repository is *not* continuously integrated with the rest of the codebase because the computational demands are too significant. If you do encounter an error, please log it as a [GitHub issue](https://github.com/MPoL-dev/examples/issues). -This repository is *not* continuously integrated with the rest of the codebase, because the computational demands are too significant. If you do encounter an error, please log it as a [GitHub issue](https://github.com/MPoL-dev/examples/issues). +# Installing and running the examples -# Installing and Running the Examples +These examples strive to use real, or approximately real, data to demonstrate MPoL functionality. Unfortunately, using real ALMA data means that we must contend with the fact that the Python `casatools` package frequently lags supported [SPEC 0](https://scientific-python.org/specs/spec-0000/) Python versions. This creates a situation where one needs both a deprecated Python version old enough to run `casatools` and a current Python version new enough to support modern development. -Each example assumes you have activated a (virtual) python environment to which you've [successfully installed the MPoL package](https://mpol-dev.github.io/MPoL/installation.html), and that your version of python is at least the minimum current version supported by MPoL. +## The initial 00 example +We've isolated the `casatools` dependency to the initial dataset download and extraction in the `00-download-and-extract-datasets` example. In that folder, you will need to install `casatools` into an older Python version that still supports it (e.g., 3.10). ([More Info](./00-download-and-extract-datasets/README.md)) -To run an example, `cd` into that subfolder. +## All other examples +Once you have completed the 00 example and extracted the ALMA datasets to either `.npz` or `.asdf` formats, you can then copy these data products into the `01-...` or `02-...` example folders that require them. 
In those later example folders, it is assumed that you have activated a (virtual) Python environment to which you've [successfully installed the MPoL package](https://mpol-dev.github.io/MPoL/installation.html), and that your version of Python is at least the minimum current version supported by MPoL. -The `requirements.txt` file lists the additional python packages necessary for the analysis specific to the example in that folder. You can install them with +Each folder has a `requirements.txt` file that lists the additional Python packages necessary for the analysis specific to the example in that folder. You can install them with ``` pip install -r requirements.txt ``` @@ -22,10 +24,9 @@ $ snakemake -c 1 all ``` and you should see all scripts execute in order. - -# Table of Contents +# List of Examples +* [00 - **Setup**: Download and Extract Datasets](00-download-and-extract-datasets/README.md) | Download a few calibrated ALMA measurement sets and use `casatools` and `visread` to extract the visibilities to a common format like `.npz` or `.asdf`. * [01 - **Intro**: Setup Mock Image and Baselines](01-generate-mock-baselines/README.md) | Generate a mock sky image $I_\nu(l,m)$ and interferometer baselines $(u,v)$ (but not the visibilities). These products are used as input for the other examples. * [02 - **Intro**: Stochastic Gradient Descent](02-sgd/README.md) | A complete end-to-end example using MPoL to image mock data. * [03 - **Advanced**: Visibility Inference with Pyro](03-AS209-pyro-inference/README.md) | Use MPoL with Pyro to sample parametric visibility plane models. -* [04 - **Advanced**: IM Lup protoplanetary disk](04-IMLup-multi-EB) | Use MPoL to image the ALMA DSHARP observations of the IM Lup protoplanetary disk, taking into account alignment and weight-scaling adjustments for a multi-execution block dataset. 
- +* [04 - **Advanced**: IM Lup protoplanetary disk](04-IMLup-multi-EB) | Use MPoL to image the ALMA DSHARP observations of the IM Lup protoplanetary disk, taking into account alignment and weight-scaling adjustments for a multi-execution block dataset. \ No newline at end of file From 48a4188660bcdf513bd717551d3d1ea98fcc26bf Mon Sep 17 00:00:00 2001 From: Ian Czekala Date: Thu, 21 May 2026 20:17:31 +1000 Subject: [PATCH 4/7] Starting to assemble download directory. --- 00-download-and-extract-datasets/Snakefile | 23 +++++++++++++++++++ .../export_baselines.py | 0 .../requirements.txt | 5 ++++ 3 files changed, 28 insertions(+) create mode 100644 00-download-and-extract-datasets/Snakefile rename {01-generate-mock-baselines => 00-download-and-extract-datasets}/export_baselines.py (100%) create mode 100644 00-download-and-extract-datasets/requirements.txt diff --git a/00-download-and-extract-datasets/Snakefile b/00-download-and-extract-datasets/Snakefile new file mode 100644 index 0000000..be21940 --- /dev/null +++ b/00-download-and-extract-datasets/Snakefile @@ -0,0 +1,23 @@ +rule all: + input: + "data/IM_Lup.ms", + +rule download_ms: + output: temp("data/IMLup_continuum.ms.tgz") + shell: "wget https://almascience.eso.org/almadata/lp/DSHARP/MSfiles/IMLup_continuum.ms.tgz --directory-prefix=data/" + +rule untar_and_rename: + input: "data/IMLup_continuum.ms.tgz" + output: directory("data/IM_Lup.ms") + shell: + "tar -xf {input} -C data/ --no-same-owner && " + "mv data/IMLup_continuum.ms {output}" + +# only uu and vv that have *all channels unflagged* +# only 1 channel is taken from those available: [1, 8, 16] +# 0.05 taken randomly from those +# saved using float32 +rule export_baselines: + input: "data/IM_Lup.ms" + output: npz="data/IM_Lup_baselines_and_weights.npz", plot="data/IM_Lup_baseline_plot.png" + shell: "python export_baselines.py {input} {output.npz} {output.plot} --select_fraction 0.05" diff --git a/01-generate-mock-baselines/export_baselines.py 
b/00-download-and-extract-datasets/export_baselines.py similarity index 100% rename from 01-generate-mock-baselines/export_baselines.py rename to 00-download-and-extract-datasets/export_baselines.py diff --git a/00-download-and-extract-datasets/requirements.txt b/00-download-and-extract-datasets/requirements.txt new file mode 100644 index 0000000..b120689 --- /dev/null +++ b/00-download-and-extract-datasets/requirements.txt @@ -0,0 +1,5 @@ +casatools +casadata +numpy +snakemake +visread[casa] \ No newline at end of file From 86eef52a333803ffa6fd4a9a8030794f432aec98 Mon Sep 17 00:00:00 2001 From: Ian Czekala Date: Fri, 22 May 2026 21:11:16 +1000 Subject: [PATCH 5/7] Added Dockerfile commands to build casatools container, locally and via GitHub Actions. --- .github/workflows/docker-publish.yml | 67 +++++++++++++++++++ 00-download-and-extract-datasets/README.md | 16 +++-- 00-download-and-extract-datasets/run.sh | 6 ++ casatools-env/Dockerfile | 44 ++++++++++++ casatools-env/README.md | 16 +++++ casatools-env/build_locally.sh | 7 ++ casatools-env/casasiteconfig.py | 18 +++++ .../requirements.txt | 0 8 files changed, 167 insertions(+), 7 deletions(-) create mode 100644 .github/workflows/docker-publish.yml create mode 100755 00-download-and-extract-datasets/run.sh create mode 100644 casatools-env/Dockerfile create mode 100644 casatools-env/README.md create mode 100755 casatools-env/build_locally.sh create mode 100644 casatools-env/casasiteconfig.py rename {00-download-and-extract-datasets => casatools-env}/requirements.txt (100%) diff --git a/.github/workflows/docker-publish.yml b/.github/workflows/docker-publish.yml new file mode 100644 index 0000000..3456168 --- /dev/null +++ b/.github/workflows/docker-publish.yml @@ -0,0 +1,67 @@ +name: Build and Push Docker Image + +on: + push: + branches: [ "main" ] + tags: [ 'v*.*.*' ] + pull_request: + branches: [ "main" ] + +env: + REGISTRY: ghcr.io + # Dynamically set the image name to lowercase 'ghcr.io/MPoL-dev/examples' + 
IMAGE_NAME: ${{ github.repository }} + +jobs: + build-and-push: + runs-on: ubuntu-latest + permissions: + contents: read + packages: write + + steps: + - name: Checkout repository + uses: actions/checkout@v4 + + # Sets up QEMU for multi-platform support (e.g., amd64, arm64) + - name: Set up QEMU + uses: docker/setup-qemu-action@v3 + + # Sets up Docker Buildx (required for multi-arch builds and advanced caching) + - name: Set up Docker Buildx + uses: docker/setup-buildx-action@v3 + + # Log in to GitHub Container Registry + - name: Log in to the Container Registry + if: github.event_name != 'pull_request' + uses: docker/login-action@v3 + with: + registry: ${{ env.REGISTRY }} + username: ${{ github.actor }} + password: ${{ secrets.GITHUB_TOKEN }} + + # Extract metadata (tags, labels) for Docker + - name: Extract Docker metadata + id: meta + uses: docker/metadata-action@v5 + with: + images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }} + tags: | + type=ref,event=branch + type=semver,pattern={{version}} + type=sha,format=short + + # Build and push Docker image + - name: Build and push Docker image + uses: docker/build-push-action@v6 + with: + context: ./casatools-env + file: ./Dockerfile + push: ${{ github.event_name != 'pull_request' }} + tags: ${{ id.meta.outputs.tags }} + labels: ${{ id.meta.outputs.labels }} + # Specifying platforms ensures optimization across target architectures + platforms: linux/amd64,linux/arm64 + # Utilizing GitHub Actions native caching dramatically speeds up subsequent builds + cache-from: type=gha + cache-to: type=gha,mode=max \ No newline at end of file diff --git a/00-download-and-extract-datasets/README.md b/00-download-and-extract-datasets/README.md index df3d3fa..c9a3930 100644 --- a/00-download-and-extract-datasets/README.md +++ b/00-download-and-extract-datasets/README.md @@ -1,14 +1,16 @@ # Download and extract datasets -Contend with the fact that `casatools` frequently lags supported SPEC 0 Python versions, creating a situation where 
one needs a Python version old enough to run `casatools` but new enough to support modern development. These requirements are often mutually exclusive and require multiple Python versions as part of the same analysis. - -We've tried to decouple these somewhat, by isolating the `casatools` dependency to this specific example folder. - # Installation and running -As of 21 May 2026, the latest version supported by `casatools` is Python 3.10. + +In this 'example,' one merely downloads and extracts the ALMA datasets to a common data format like `.npz` or `.asdf`. This step requires the `casatools` package, which frequently has restrictions on Python versions and installation environments. + +To simplify this step for users of the tutorials, one can use our Docker container via +``` +./run.sh +``` -You can run to download individual datasets, or all of them at once. +Depending on the speed of your internet connection, it may take some time to download the multi-gigabyte measurement sets from the archive servers. +Upon successful completion, you should see the following items in your directory: -For subsequent examples, you will be asked to copy some of the data products into those folders. \ No newline at end of file +Alternatively, if one already has their own Python environment compatible with `casatools`, one can install the [relevant packages](../casatools-env/requirements.txt) into that environment. 
\ No newline at end of file diff --git a/00-download-and-extract-datasets/run.sh b/00-download-and-extract-datasets/run.sh new file mode 100755 index 0000000..80bdc4a --- /dev/null +++ b/00-download-and-extract-datasets/run.sh @@ -0,0 +1,6 @@ +#!/usr/bin/env bash + +docker run --rm --platform linux/amd64 \ + -v "$(pwd)":/workspace \ + casatools-env \ + snakemake -c1 all \ No newline at end of file diff --git a/casatools-env/Dockerfile b/casatools-env/Dockerfile new file mode 100644 index 0000000..7ebdd90 --- /dev/null +++ b/casatools-env/Dockerfile @@ -0,0 +1,44 @@ +# casatools-env/Dockerfile +# need to force the platform here so that casatools is available for the right +# architecture. Doesn't work with Apple Silicon. +FROM --platform=linux/amd64 ubuntu:24.04 + +ENV DEBIAN_FRONTEND=noninteractive + +# Install core system dependencies +RUN apt-get update && apt-get install -y \ + python3.12 \ + python3.12-dev \ + python3.12-venv \ + python3-pip-whl \ + gfortran \ + libgomp1 \ + wget \ + && rm -rf /var/lib/apt/lists/* + +WORKDIR /workspace + +RUN python3.12 -m venv /opt/casa_venv + +# Add the virtual environment binaries to the container path +# This automatically "activates" the venv +ENV PATH="/opt/casa_venv/bin:$PATH" + +# Upgrade pip inside the virtual environment +RUN pip install --no-cache-dir --upgrade pip setuptools wheel + +# Copy and install requirements +COPY requirements.txt /workspace/requirements.txt +RUN pip install --no-cache-dir -r requirements.txt + +# create casa data directory +# https://casadocs.readthedocs.io/en/stable/notebooks/external-data.html +RUN mkdir -p /root/.casa/data + +COPY casasiteconfig.py /workspace/casasiteconfig.py +ENV CASASITECONFIG="/workspace/casasiteconfig.py" + +# Trigger the initial measures data download inside the container build, +# so that this isn't redone each time the user runs the container +# https://casadocs.readthedocs.io/en/stable/api/casaconfig.html#casasiteconfig-py +RUN python3 -c "from casaconfig import 
pull_data; pull_data('/root/.casa/data')" \ No newline at end of file diff --git a/casatools-env/README.md b/casatools-env/README.md new file mode 100644 index 0000000..e80e9ab --- /dev/null +++ b/casatools-env/README.md @@ -0,0 +1,16 @@ +# Docker image with casatools + +This folder contains the Dockerfile and instructions for building a Docker container that runs `casatools`, which is required by the `00-download-and-extract-datasets` example. In the normal course of events, this container is built automatically as part of the GitHub Actions workflow, and you can download it from the GitHub Container Registry following the instructions in the [00 example](../00-download-and-extract-datasets/README.md). + +The following instructions are aimed at developers of the examples. It is assumed that you have already installed a [Docker environment](https://www.docker.com/), e.g., Docker Desktop. + +Build the container locally from within this directory and tag it as `casatools-env` +``` +docker build --platform linux/amd64 -t casatools-env . +``` +Note that the `--platform linux/amd64` flag is necessary to accommodate the modular casa packages, which to my understanding are not built for Apple Silicon architecture. + +Run the container and enter a bash shell +``` +docker run --rm -it --platform linux/amd64 casatools-env +``` \ No newline at end of file diff --git a/casatools-env/build_locally.sh b/casatools-env/build_locally.sh new file mode 100755 index 0000000..4c7fae3 --- /dev/null +++ b/casatools-env/build_locally.sh @@ -0,0 +1,7 @@ +#!/usr/bin/env bash + +# note: it is not required to run this script to build the container, +# it should already be built by GitHub workflows and available via the +# GitHub Container Registry +# However, if you want to build the container locally, then use this +docker build --platform linux/amd64 -t casatools-env . 
\ No newline at end of file diff --git a/casatools-env/casasiteconfig.py b/casatools-env/casasiteconfig.py new file mode 100644 index 0000000..b6bb945 --- /dev/null +++ b/casatools-env/casasiteconfig.py @@ -0,0 +1,18 @@ +# https://casadocs.readthedocs.io/en/stable/api/casaconfig.html#casasiteconfig-py +# An example site config file. +# Place this in a location checked by casaconfig: +# /opt/casa/casasiteconfig.py +# /home/casa/casasiteconfig.py +# the environment value CASASITECONFIG - use the fully qualified path +# anywhere in the python path, e.g. the site-packages directory in the CASA being used + +# This file should be edited to set measurespath as appropriate + +# Set this to point to the location where the site maintained casarundata can be found +# by default datapath will include measurespath + +measurespath = "/root/.casa/data" + +# turn off all auto updates of data +measures_auto_update = False +data_auto_update = False \ No newline at end of file diff --git a/00-download-and-extract-datasets/requirements.txt b/casatools-env/requirements.txt similarity index 100% rename from 00-download-and-extract-datasets/requirements.txt rename to casatools-env/requirements.txt From 5466abe50d6a618c6858b49335919126f49df2f2 Mon Sep 17 00:00:00 2001 From: Ian Czekala Date: Sat, 23 May 2026 10:26:40 +1000 Subject: [PATCH 6/7] modify GHA to use steps variable. 
--- .github/workflows/docker-publish.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/docker-publish.yml b/.github/workflows/docker-publish.yml index 3456168..7df2625 100644 --- a/.github/workflows/docker-publish.yml +++ b/.github/workflows/docker-publish.yml @@ -58,8 +58,8 @@ jobs: context: ./casatools-env file: ./Dockerfile push: ${{ github.event_name != 'pull_request' }} - tags: ${{ id.meta.outputs.tags }} - labels: ${{ id.meta.outputs.labels }} + tags: ${{ steps.meta.outputs.tags }} + labels: ${{ steps.meta.outputs.labels }} # Specifying platforms ensures optimization across target architectures platforms: linux/amd64,linux/arm64 # Utilizing GitHub Actions native caching dramatically speeds up subsequent builds From ce8ebcfb30604806ed2fa65c4cf7f8947aa21255 Mon Sep 17 00:00:00 2001 From: Ian Czekala Date: Sat, 23 May 2026 10:40:34 +1000 Subject: [PATCH 7/7] changing Dockerfile path. --- .github/workflows/docker-publish.yml | 2 +- 00-download-and-extract-datasets/Snakefile | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/.github/workflows/docker-publish.yml b/.github/workflows/docker-publish.yml index 7df2625..023bb1e 100644 --- a/.github/workflows/docker-publish.yml +++ b/.github/workflows/docker-publish.yml @@ -56,7 +56,7 @@ jobs: uses: docker/build-push-action@v6 with: context: ./casatools-env - file: ./Dockerfile + file: ./casatools-env/Dockerfile push: ${{ github.event_name != 'pull_request' }} tags: ${{ steps.meta.outputs.tags }} labels: ${{ steps.meta.outputs.labels }} diff --git a/00-download-and-extract-datasets/Snakefile b/00-download-and-extract-datasets/Snakefile index be21940..a65f219 100644 --- a/00-download-and-extract-datasets/Snakefile +++ b/00-download-and-extract-datasets/Snakefile @@ -1,6 +1,7 @@ rule all: input: - "data/IM_Lup.ms", + "data/IM_Lup_baselines_and_weights.npz", + "data/IM_Lup_baseline_plot.png" rule download_ms: output: temp("data/IMLup_continuum.ms.tgz")