pg_arrow

Status: Work in progress. Public API, error types, and on-disk format coverage may change. Not yet production-ready.

Current implementation: reads PostgreSQL heap files directly from disk (no shared buffer pool yet). A buffer-pool / page-cache layer is on the roadmap.
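To make "reads heap files directly" concrete, here is a minimal, std-only Rust sketch of decoding one heap page. It follows PostgreSQL's page layout: a 24-byte `PageHeaderData` followed by an array of 4-byte `ItemIdData` line pointers that ends at `pd_lower`. It is not pg_arrow's API. Every name below (`read_block`, `parse_header`, `line_pointers`, `PageHeader`, `LinePointer`) is hypothetical. The sketch also assumes the default 8 KiB block size and a little-endian host, because PostgreSQL writes pages in native byte order.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Assumed block size: PostgreSQL's default BLCKSZ (8 KiB).
pub const BLCKSZ: usize = 8192;
/// sizeof(PageHeaderData); the line-pointer array starts right after it.
pub const PAGE_HEADER_SIZE: usize = 24;

/// Decoded PageHeaderData fields (hypothetical type, not pg_arrow's).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PageHeader {
    pub lsn: u64,           // pd_lsn: LSN of the last WAL record that changed the page
    pub checksum: u16,      // pd_checksum
    pub flags: u16,         // pd_flags
    pub lower: u16,         // pd_lower: end of the line-pointer array
    pub upper: u16,         // pd_upper: start of tuple data
    pub special: u16,       // pd_special: start of the special space
    pub page_size: u16,     // high byte of pd_pagesize_version
    pub layout_version: u8, // low byte of pd_pagesize_version
    pub prune_xid: u32,     // pd_prune_xid
}

/// One decoded ItemIdData line pointer (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LinePointer {
    pub offset: u16, // lp_off: byte offset of the tuple within the page
    pub flags: u8,   // lp_flags: 0 unused, 1 normal, 2 redirect, 3 dead
    pub length: u16, // lp_len: tuple length in bytes
}

fn u16_at(page: &[u8], off: usize) -> u16 {
    u16::from_le_bytes([page[off], page[off + 1]])
}

fn u32_at(page: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([page[off], page[off + 1], page[off + 2], page[off + 3]])
}

/// Read block `blkno` of one relation segment file straight from disk (no buffer pool).
pub fn read_block(path: &Path, blkno: u64) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    file.seek(SeekFrom::Start(blkno * BLCKSZ as u64))?;
    let mut page = vec![0u8; BLCKSZ];
    file.read_exact(&mut page)?;
    Ok(page)
}

/// Decode the 24-byte page header; returns None if the buffer is too short.
pub fn parse_header(page: &[u8]) -> Option<PageHeader> {
    if page.len() < PAGE_HEADER_SIZE {
        return None;
    }
    // pd_lsn is stored as two uint32 halves: xlogid (high) then xrecoff (low).
    let lsn = ((u32_at(page, 0) as u64) << 32) | u32_at(page, 4) as u64;
    let size_version = u16_at(page, 18);
    Some(PageHeader {
        lsn,
        checksum: u16_at(page, 8),
        flags: u16_at(page, 10),
        lower: u16_at(page, 12),
        upper: u16_at(page, 14),
        special: u16_at(page, 16),
        page_size: size_version & 0xFF00,
        layout_version: (size_version & 0x00FF) as u8,
        prune_xid: u32_at(page, 20),
    })
}

/// Decode every line pointer between the header and pd_lower.
pub fn line_pointers(page: &[u8], header: &PageHeader) -> Vec<LinePointer> {
    let lower = header.lower as usize;
    if lower < PAGE_HEADER_SIZE || lower > page.len() {
        return Vec::new(); // corrupt or uninitialized (all-zero) page
    }
    let count = (lower - PAGE_HEADER_SIZE) / 4;
    (0..count)
        .map(|i| {
            let raw = u32_at(page, PAGE_HEADER_SIZE + i * 4);
            // Assumes GCC bitfield order on a little-endian host: lp_off is the low 15 bits.
            LinePointer {
                offset: (raw & 0x7FFF) as u16,
                flags: ((raw >> 15) & 0x3) as u8,
                length: (raw >> 17) as u16,
            }
        })
        .collect()
}
```

PostgreSQL treats a page with `pd_upper == 0` as new (never initialized), so a scan would skip those. Only line pointers with `flags == 1` (LP_NORMAL) point at tuples worth decoding; MVCC visibility checks on the tuple headers are a separate step.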

PostgreSQL version: only tested against PostgreSQL 18. Older versions may work, but multi-version testing is WIP.

Low-level library for reading PostgreSQL data files directly and converting them to Apache Arrow format. Used by pgfusion as the page-parsing and Arrow conversion layer.
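On the Arrow side, a fixed-width column is a contiguous values buffer plus a validity bitmap. In that bitmap, bit `i % 8` of byte `i / 8` (least-significant bit first) is 1 when slot `i` is non-null. The std-only sketch below transposes one decoded column of nullable 32-bit integers into that layout. It is illustrative only: `Int32Column` and `to_int32_column` are hypothetical names. A real converter would more likely use the arrow-rs crate's builders (for example `arrow::array::Int32Builder`) than hand-rolled buffers.

```rust
/// A minimal Arrow-style Int32 column: a values buffer plus an LSB-first validity bitmap.
#[derive(Debug, PartialEq)]
pub struct Int32Column {
    pub values: Vec<i32>,
    pub validity: Vec<u8>, // bit (i % 8) of byte (i / 8) is 1 when slot i is non-null
    pub null_count: usize,
}

/// Transpose one decoded column (None = SQL NULL) into the Arrow-style layout.
pub fn to_int32_column(cells: &[Option<i32>]) -> Int32Column {
    let mut values = Vec::with_capacity(cells.len());
    let mut validity = vec![0u8; (cells.len() + 7) / 8];
    let mut null_count = 0;
    for (i, cell) in cells.iter().enumerate() {
        match cell {
            Some(v) => {
                values.push(*v);
                validity[i / 8] |= 1 << (i % 8);
            }
            None => {
                // Arrow leaves the value under a null slot undefined; 0 keeps it deterministic.
                values.push(0);
                null_count += 1;
            }
        }
    }
    Int32Column { values, validity, null_count }
}
```

The Arrow format also recommends aligning buffers and padding them to a multiple of 8 or 64 bytes, which this sketch skips for brevity.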

Prerequisites

Rust — rustup.rs
just — command runner for all recipes

# macOS
brew install just

# Linux / Windows (via cargo)
cargo install just

# All platforms (pre-built binary)
curl --proto '=https' --tlsv1.2 -sSf https://just.systems/install.sh | bash -s -- --to ~/.local/bin

For flamegraph and profiling recipes:

cargo install flamegraph        # flamegraph-* recipes (provides `cargo flamegraph`)
cargo install samply            # samply-* recipes

Quick start

# Build
cargo build

# Run the table_reader example
just example-table-reader /path/to/pgdata           # defaults to db "postgres"
just example-table-reader /path/to/pgdata pgbench_test

# Run tests
just test
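The table_reader example takes a data directory and a database name. Under the default tablespace, PostgreSQL stores a table's main fork at `base/<database OID>/<relfilenode>` inside the data directory. The fork is split into 1 GiB segment files named `<relfilenode>`, `<relfilenode>.1`, and so on. The sketch below computes where block `blkno` lands, assuming the default 8 KiB block size and 1 GiB segment size. `block_location` is a hypothetical helper, not part of pg_arrow. The OIDs themselves would come from the `pg_database` catalog and from `pg_class.relfilenode`.

```rust
use std::path::{Path, PathBuf};

/// Assumed default block size (BLCKSZ, 8 KiB).
const BLCKSZ: u64 = 8192;
/// Assumed default segment size in blocks (RELSEG_SIZE): 131072 * 8 KiB = 1 GiB.
const RELSEG_SIZE: u64 = 131_072;

/// Return the segment file holding `blkno` of a relation's main fork, plus the
/// byte offset of that block within the segment.
pub fn block_location(pgdata: &Path, db_oid: u32, relfilenode: u32, blkno: u64) -> (PathBuf, u64) {
    let segno = blkno / RELSEG_SIZE;
    // The first segment has no suffix; later ones are <relfilenode>.1, .2, ...
    let file_name = if segno == 0 {
        relfilenode.to_string()
    } else {
        format!("{}.{}", relfilenode, segno)
    };
    let path = pgdata.join("base").join(db_oid.to_string()).join(file_name);
    (path, (blkno % RELSEG_SIZE) * BLCKSZ)
}
```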

Common commands

just build                    # Debug build
just release                  # Release build
just test                     # Unit tests
just bench                    # Criterion benchmarks
just bench-iai                # iai instruction-count benchmarks
just bench-io                 # File I/O latency benchmarks
just flamegraph-bench         # Flamegraph for criterion bench
just flamegraph-example /path/to/pgdata  # Flamegraph for table_reader example
just doc                      # Open rustdoc
just --list                   # Show all available recipes

PostgreSQL Setup for Testing

Prerequisites

PostgreSQL setup uses the pg-test-harness scripts. Clone the pg-arrow/utils repo and point PG_HARNESS_DIR at the harness subdirectory:

git clone https://github.com/pg-arrow/utils /path/to/utils
export PG_HARNESS_DIR=/path/to/utils/pg-test-harness

Add the export to your shell profile (~/.zshrc, ~/.bashrc) to persist it.

Quick Setup

Setup writes testdata/ and pg-test-config.toml under $PG_HARNESS_DIR. The just recipes inherit that env var — no extra flags needed:

# Full setup: build from source, init cluster, load test data
just pg-setup pg18            # or pg17 / latest

# Full setup with simple schema (no pgbench tables)
just pg-setup-simple pg18

# Individual steps
just pg-build pg18            # Build PostgreSQL source only
just pg-init pg18             # Init cluster (source must be built)
just pg-testdata pg18         # Load test data into an initialized cluster

Or invoke the harness script directly:

bash "$PG_HARNESS_DIR/scripts/setup-postgres.sh" -b pg18 -B -i -t

Script options

| Flag | Description |
|------|-------------|
| `-b, --branch VERSION` | `pg18`, `pg17`, `pg16`, `latest`, or full branch name |
| `-B, --build` | Build PostgreSQL locally (meson/ninja) |
| `-i, --init` | Initialize database cluster |
| `-t, --test-data` | Create test database with sample data |
| `-s, --simple-schema` | Single-table schema instead of full e-commerce schema |
| `-p, --pgbench` | Create a `pgbench_test` database with pgbench data |

What the script does

Clones PostgreSQL from https://git.postgresql.org/git/postgresql.git into $PG_HARNESS_DIR/testdata/postgres/
Creates a git worktree under $PG_HARNESS_DIR/testdata/postgres-{version}/
Optionally builds PostgreSQL locally (installs to testdata/postgres-{version}/install/)
Optionally initializes the database cluster (no root/postgres user needed)
Optionally creates a test database and loads schema + sample data
Writes paths to $PG_HARNESS_DIR/pg-test-config.toml for use in Rust tests

Directory structure after setup

$PG_HARNESS_DIR/                  # = /path/to/utils/pg-test-harness (pg-arrow/utils clone)
├── pg-test-config.toml           # one config, shared by pg_arrow + pgfusion
├── testdata/
│   ├── postgres/                 # main PostgreSQL git repository
│   ├── postgres-latest/          # worktree for master branch
│   │   ├── data/
│   │   ├── build/
│   │   └── install/bin/
│   └── postgres-pg18/
│       ├── data/
│       ├── build/
│       └── install/bin/
└── scripts/
    ├── setup-postgres.sh         # PostgreSQL build/init/test-data setup
    └── pgbackrest-backup.sh      # WAL archiving and backup management
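The layout above follows a simple pattern. The `-b` value names the worktree directory (`postgres-pg18`, `postgres-latest`), and each worktree holds `data/`, `build/`, and `install/bin/`. The Rust sketch below encodes that pattern. The branch mapping is an assumption inferred from this README: `pg18` appears as `REL_18_STABLE` in the generated config, and `latest` is the master worktree. `setup-postgres.sh` may resolve names differently, and `upstream_branch` and `worktree_paths` are hypothetical helpers. In real tests, prefer the paths recorded in `pg-test-config.toml`.

```rust
use std::path::{Path, PathBuf};

/// Assumed mapping from a `-b/--branch` value to an upstream PostgreSQL git branch.
pub fn upstream_branch(version: &str) -> String {
    let major = version
        .strip_prefix("pg")
        .filter(|m| !m.is_empty() && m.chars().all(|c| c.is_ascii_digit()));
    match (version, major) {
        ("latest", _) => "master".to_string(),
        (_, Some(m)) => format!("REL_{}_STABLE", m),
        // Anything else is treated as a full branch name and passed through.
        (other, None) => other.to_string(),
    }
}

/// Per-version directories under $PG_HARNESS_DIR/testdata/, mirroring the tree above.
#[derive(Debug, PartialEq)]
pub struct Worktree {
    pub root: PathBuf,
    pub data_dir: PathBuf,
    pub build_dir: PathBuf,
    pub bin_dir: PathBuf,
}

pub fn worktree_paths(harness_dir: &Path, version: &str) -> Worktree {
    let root = harness_dir.join("testdata").join(format!("postgres-{}", version));
    Worktree {
        data_dir: root.join("data"),
        build_dir: root.join("build"),
        bin_dir: root.join("install").join("bin"),
        root,
    }
}
```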

`pg-test-config.toml` format

Generated by the setup script. Paths inside are relative to $PG_HARNESS_DIR (the config file's parent directory). Tests should read the file programmatically via pg_test_harness::read_pg_config() rather than hardcoding paths.

[postgres.pg18]
version = "REL_18_STABLE"
source_dir = "testdata/postgres-pg18"
data_dir = "testdata/postgres-pg18/data"
bin_dir = "testdata/postgres-pg18/install/bin"
initialized = true
test_db_created = true
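Because every path in the file is relative to the harness directory, a test resolves each one by joining it onto `$PG_HARNESS_DIR`. The supported entry point is `pg_test_harness::read_pg_config()`, whose return type is not documented here. The std-only sketch below is a hypothetical stand-in. It pulls one `[postgres.<version>]` table out of the file with a deliberately minimal line parser (flat `key = value` lines only, not full TOML) and resolves paths such as `data_dir`. `harness_dir`, `postgres_section`, and `resolve` are illustrative names; in actual tests, use the harness function or a real TOML parser.

```rust
use std::collections::HashMap;
use std::env;
use std::path::{Path, PathBuf};

/// Harness root taken from the environment, as the just recipes do.
pub fn harness_dir() -> Option<PathBuf> {
    env::var_os("PG_HARNESS_DIR").map(PathBuf::from)
}

/// Collect `key = value` pairs from one `[postgres.<version>]` table, stripping quotes.
/// Minimal, hypothetical parser: no arrays, inline tables, or escaped quotes.
pub fn postgres_section(config: &str, version: &str) -> HashMap<String, String> {
    let header = format!("[postgres.{}]", version);
    let mut in_section = false;
    let mut entries = HashMap::new();
    for line in config.lines().map(str::trim) {
        if line.starts_with('[') {
            in_section = line == header;
            continue;
        }
        if !in_section || line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            entries.insert(key.trim().to_string(), value.trim().trim_matches('"').to_string());
        }
    }
    entries
}

/// Resolve a config-relative path against the harness directory (the config file's parent).
pub fn resolve(harness_dir: &Path, relative: &str) -> PathBuf {
    harness_dir.join(relative)
}
```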

Test database schemas

Simple schema (-s flag): a single test_types table covering all common PostgreSQL data types, intended for basic parsing tests.

Full e-commerce schema (default): 5 tables (categories, products, customers, orders, order_items) with foreign keys, multiple index types, and ~20 rows of sample data.

pgBackRest

just backup-setup             # Configure WAL archiving
just backup-full              # Full backup
just backup-incr              # Incremental backup
just backup-info              # Show backup info
just backup-restore /path     # Restore to directory
