Status: Work in progress. Public API, error types, and on-disk format coverage may change. Not yet production-ready.
Current implementation: reads PostgreSQL heap files directly from disk (no shared buffer pool yet). A buffer-pool / page-cache layer is on the roadmap.
PostgreSQL version: only tested against PostgreSQL 18. Older versions may work, but multi-version testing is WIP.
Low-level library for reading PostgreSQL data files directly and converting them to Apache Arrow format. Used by pgfusion as the page-parsing and Arrow conversion layer.
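To make "page parsing" concrete, here is a minimal sketch of decoding a few fields from the 24-byte header that starts every PostgreSQL page. It is not this crate's API: the `PageHeader` struct and the `parse_page_header` and `empty_page` helpers are hypothetical names chosen for illustration. The sketch assumes the default 8192-byte page size and a little-endian host, since PostgreSQL writes pages in native byte order.

```rust
/// A subset of PostgreSQL's `PageHeaderData` fields (hypothetical struct,
/// not part of this crate's public API).
#[derive(Debug, PartialEq)]
struct PageHeader {
    lower: u16,         // pd_lower: offset to the start of free space
    upper: u16,         // pd_upper: offset to the end of free space
    special: u16,       // pd_special: offset to the special space
    page_size: u16,     // high byte of pd_pagesize_version
    layout_version: u8, // low byte of pd_pagesize_version
}

/// Reads a little-endian u16 at `off` (assumes a little-endian host wrote the file).
fn read_u16(buf: &[u8], off: usize) -> u16 {
    u16::from_le_bytes([buf[off], buf[off + 1]])
}

/// Decodes the header fields, or returns `None` if the buffer is shorter
/// than the 24-byte page header.
fn parse_page_header(page: &[u8]) -> Option<PageHeader> {
    if page.len() < 24 {
        return None;
    }
    let size_and_version = read_u16(page, 18);
    Some(PageHeader {
        lower: read_u16(page, 12),
        upper: read_u16(page, 14),
        special: read_u16(page, 16),
        page_size: size_and_version & 0xFF00,
        layout_version: (size_and_version & 0x00FF) as u8,
    })
}

/// Builds a synthetic, freshly initialised 8192-byte page for the demo:
/// no tuples, no special space, layout version 4.
fn empty_page() -> Vec<u8> {
    let mut page = vec![0u8; 8192];
    page[12..14].copy_from_slice(&24u16.to_le_bytes());
    page[14..16].copy_from_slice(&8192u16.to_le_bytes());
    page[16..18].copy_from_slice(&8192u16.to_le_bytes());
    page[18..20].copy_from_slice(&(8192u16 | 4).to_le_bytes());
    page
}

fn main() {
    let page = empty_page();
    let header = parse_page_header(&page).expect("page shorter than header");
    println!("{header:?}");
    println!("free bytes: {}", header.upper - header.lower);
}
```

A real reader would go on to walk the line-pointer array that follows the header. The sketch stops at the header to stay short.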
- Rust — rustup.rs
- just — command runner for all recipes
# macOS
brew install just
# Linux / Windows (via cargo)
cargo install just
# All platforms (pre-built binary)
curl --proto '=https' --tlsv1.2 -sSf https://just.systems/install.sh | bash -s -- --to ~/.local/bin
For flamegraph and profiling recipes:
cargo install cargo-flamegraph # flamegraph-* recipes
cargo install samply # samply-* recipes
# Build
cargo build
# Run the table_reader example
just example-table-reader /path/to/pgdata # defaults to db "postgres"
just example-table-reader /path/to/pgdata pgbench_test
# Run tests
just test
just build # Debug build
just release # Release build
just test # Unit tests
just bench # Criterion benchmarks
just bench-iai # iai instruction-count benchmarks
just bench-io # File I/O latency benchmarks
just flamegraph-bench # Flamegraph for criterion bench
just flamegraph-example /path/to/pgdata # Flamegraph for table_reader example
just doc # Open rustdoc
just --list # Show all available recipes
PostgreSQL setup uses the pg-test-harness scripts. Clone the pg-arrow/utils repo and point PG_HARNESS_DIR at the harness subdirectory:
git clone https://github.com/pg-arrow/utils /path/to/utils
export PG_HARNESS_DIR=/path/to/utils/pg-test-harness
Add the export to your shell profile (~/.zshrc, ~/.bashrc) to persist it.
Setup writes testdata/ and pg-test-config.toml under $PG_HARNESS_DIR. The just recipes inherit that env var — no extra flags needed:
# Full setup: build from source, init cluster, load test data
just pg-setup pg18 # or pg17 / latest
# Full setup with simple schema (no pgbench tables)
just pg-setup-simple pg18
# Individual steps
just pg-build pg18 # Build PostgreSQL source only
just pg-init pg18 # Init cluster (source must be built)
just pg-testdata pg18 # Load test data into initialised cluster
Or invoke the harness script directly:
bash "$PG_HARNESS_DIR/scripts/setup-postgres.sh" -b pg18 -B -i -t

| Flag | Description |
|---|---|
| -b, --branch VERSION | pg18, pg17, pg16, latest, or full branch name |
| -B, --build | Build PostgreSQL locally (meson/ninja) |
| -i, --init | Initialize database cluster |
| -t, --test-data | Create test database with sample data |
| -s, --simple-schema | Single-table schema instead of full e-commerce schema |
| -p, --pgbench | Create a pgbench_test database with pgbench data |
- Clones PostgreSQL from https://git.postgresql.org/git/postgresql.git into $PG_HARNESS_DIR/testdata/postgres/
- Creates a git worktree under $PG_HARNESS_DIR/testdata/postgres-{version}/
- Optionally builds PostgreSQL locally (installs to testdata/postgres-{version}/install/)
- Optionally initializes the database cluster (no root/postgres user needed)
- Optionally creates a test database and loads schema + sample data
- Writes paths to $PG_HARNESS_DIR/pg-test-config.toml for use in Rust tests
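If a test helper needs to drive the harness script itself, the direct invocation shown earlier could be assembled from Rust roughly as follows. This is only a sketch: `harness_args` and `harness_command` are hypothetical helpers, not part of pg-test-harness. It uses only the flags listed in the table, and it prints the command instead of running it.

```rust
use std::env;
use std::path::Path;
use std::process::Command;

/// Builds the argument list for setup-postgres.sh from the documented flags
/// (-b, -B, -i, -t). Hypothetical helper for illustration only.
fn harness_args(branch: &str, build: bool, init: bool, test_data: bool) -> Vec<String> {
    let mut args = vec!["-b".to_string(), branch.to_string()];
    if build {
        args.push("-B".to_string());
    }
    if init {
        args.push("-i".to_string());
    }
    if test_data {
        args.push("-t".to_string());
    }
    args
}

/// Prepares (but does not run) `bash <harness_dir>/scripts/setup-postgres.sh <args>`.
fn harness_command(harness_dir: &Path, args: &[String]) -> Command {
    let script = harness_dir.join("scripts").join("setup-postgres.sh");
    let mut cmd = Command::new("bash");
    cmd.arg(script).args(args);
    cmd
}

fn main() {
    // Equivalent to: -b pg18 -B -i -t
    let args = harness_args("pg18", true, true, true);
    match env::var_os("PG_HARNESS_DIR") {
        Some(dir) => {
            let cmd = harness_command(Path::new(&dir), &args);
            // Printing only; call `cmd.status()` to actually execute the script.
            println!("would run: {cmd:?}");
        }
        None => eprintln!("PG_HARNESS_DIR is not set"),
    }
}
```

For everyday use the just recipes remain the simpler route. A helper like this is only relevant when a build or test step needs to trigger setup programmatically.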
$PG_HARNESS_DIR/                 # = utils/pg-test-harness in this repo
├── pg-test-config.toml          # one config, shared by pg_arrow + pgfusion
├── testdata/
│   ├── postgres/                # main PostgreSQL git repository
│   ├── postgres-latest/         # worktree for master branch
│   │   ├── data/
│   │   ├── build/
│   │   └── install/bin/
│   └── postgres-pg18/
│       ├── data/
│       ├── build/
│       └── install/bin/
└── scripts/
    ├── setup-postgres.sh        # PostgreSQL build/init/test-data setup
    └── pgbackrest-backup.sh     # WAL archiving and backup management
Generated by the setup script. Paths inside are relative to $PG_HARNESS_DIR (the config file's parent directory) — read programmatically in tests via pg_test_harness::read_pg_config(); never hardcode paths.
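Because the stored paths are relative to the config file's parent directory, a consumer has to join them onto that directory before use. The sketch below shows only that joining step with the standard library. `resolve_config_path` is a hypothetical helper, not the signature of `pg_test_harness::read_pg_config()`, and the absolute location of the config file is a placeholder.

```rust
use std::path::{Path, PathBuf};

/// Resolves a path stored in pg-test-config.toml against the directory
/// that contains the config file. Hypothetical helper for illustration.
fn resolve_config_path(config_file: &Path, relative: &str) -> PathBuf {
    // Fall back to the current directory if the config path has no parent.
    let base = config_file.parent().unwrap_or_else(|| Path::new("."));
    base.join(relative)
}

fn main() {
    // Placeholder location; in practice this is $PG_HARNESS_DIR/pg-test-config.toml.
    let config = Path::new("/path/to/utils/pg-test-harness/pg-test-config.toml");
    // A relative value of the kind the setup script writes for data_dir.
    let data_dir = resolve_config_path(config, "testdata/postgres-pg18/data");
    println!("{}", data_dir.display());
}
```

In tests, prefer the harness's own reader. The point of the sketch is only that the relative values below are meaningless without the config file's directory as a base.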
[postgres.pg18]
version = "REL_18_STABLE"
source_dir = "testdata/postgres-pg18"
data_dir = "testdata/postgres-pg18/data"
bin_dir = "testdata/postgres-pg18/install/bin"
initialized = true
test_db_created = true
Simple schema (-s flag): a single test_types table covering all common PostgreSQL data types, ideal for basic parsing tests.
Full e-commerce schema (default): 5 tables (categories, products, customers, orders, order_items) with foreign keys, multiple index types, and ~20 rows of sample data.
just backup-setup # Configure WAL archiving
just backup-full # Full backup
just backup-incr # Incremental backup
just backup-info # Show backup info
just backup-restore /path # Restore to directory