OpenDriveLab/Navformer

Navformer

Navformer is the end-to-end model training and evaluation component of WorldEngine, built on MMDetection3D, the nuPlan / OpenScene datasets, and NAVSIM.

It covers the full pipeline (training → open-loop evaluation → rare-case extraction → RL fine-tuning), with VADv2 and HydraMDP as the supported model architectures.



System Requirements

Minimum:

  • GPU: NVIDIA GPU with 8 GB VRAM (e.g., RTX 2080)
  • RAM: 32 GB
  • Storage: 500 GB SSD
  • CPU: 8 cores

Recommended:

  • GPU: NVIDIA GPU with 24 GB+ VRAM (e.g., RTX 3090, A100)
  • RAM: 64 GB+
  • Storage: 5 TB+ SSD
  • CPU: 16+ cores

Software:

  • OS: Linux (Ubuntu 20.04 / 22.04)
  • CUDA: 11.8
  • Conda / Miniconda

Installation

1. Create Conda Environment

conda create --name navformer python=3.9 -y
conda activate navformer

2. Install PyTorch

pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 \
    --index-url https://download.pytorch.org/whl/cu118

Verify:

python -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}')"
# Expected: PyTorch: 2.0.1+cu118, CUDA: True

3. Install MMCV (build from source)

MMCV must be built from source to include custom CUDA operators:

git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
git checkout v1.6.2

# Build with custom ops (takes 10–15 minutes)
# Downgrade setuptools to ~75.1.0 if you encounter build errors
MMCV_WITH_OPS=1 pip install -v -e .

python .dev_scripts/check_installation.py
cd ..

Verify:

python -c "import mmcv; print(f'MMCV: {mmcv.__version__}')"
# Expected: MMCV: 1.6.2

4. Install OpenMMLab Ecosystem

pip install mmcls==0.25.0
pip install mmdet==2.25.3
pip install mmdet3d==1.0.0rc6
pip install mmsegmentation==0.29.1

5. Install Navformer Dependencies

pip install -r requirements.txt
pip install shapely==2.0.4

6. Verify Installation

python -c "
import torch, mmcv, mmdet, mmdet3d, numpy, hydra
print('All Navformer dependencies OK')
print(f'PyTorch {torch.__version__}')
print(f'MMCV {mmcv.__version__}')
print(f'MMDetection3D {mmdet3d.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
"

Environment Variables

Navformer relies on the NAVSIM devkit v1.1:

git clone -b v1.1 https://github.com/autonomousvision/navsim.git

Add the following to ~/.bashrc or ~/.zshrc:

export NAVSIM_DEVKIT_ROOT="/path/to/navsim"
export NAVFORMER_ROOT="/path/to/Navformer"
export NUPLAN_MAPS_ROOT="/path/to/nuplan/maps"

export PYTHONPATH=$NAVFORMER_ROOT:$NAVSIM_DEVKIT_ROOT:$PYTHONPATH

Apply:

source ~/.bashrc   # or source ~/.zshrc
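A quick way to confirm the variables are visible from Python (a standalone sanity check, not part of the repo):

```python
import os

# The three variables this guide requires; the paths themselves are user-specific.
REQUIRED = ("NAVSIM_DEVKIT_ROOT", "NAVFORMER_ROOT", "NUPLAN_MAPS_ROOT")

def missing_env(env=os.environ):
    """Return the names of required variables that are absent or empty."""
    return [name for name in REQUIRED if not env.get(name)]

missing = missing_env()
print("Missing:", missing if missing else "none - environment looks good")
```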

Data

Directory Layout

Navformer/
├── data/
│   ├── raw/                       # nuPlan and OpenScene datasets
│   └── alg_engine/                # Navformer-specific data
└── experiments/                   # Experiment outputs (auto-created)

Download

Navformer reuses the OpenDriveLab/WorldEngine dataset on Hugging Face, which contains merged annotation PKLs, PDM caches, model checkpoints, and K-means vocab files.

  • Hugging Face:

    curl -LsSf https://hf.co/cli/install.sh | bash
    hf download OpenDriveLab/WorldEngine --repo-type dataset --local-dir /path/to/Navformer
  • ModelScope (recommended for users in China):

    pip install modelscope
    modelscope download --dataset OpenDriveLab/WorldEngine

Raw Data (data/raw/)

data/raw/
├── nuplan/
│   └── dataset/
│       ├── maps/                  # HD maps (required)
│       │   ├── nuplan-maps-v1.0.json
│       │   ├── us-nv-las-vegas-strip/
│       │   ├── us-ma-boston/
│       │   ├── us-pa-pittsburgh-hazelwood/
│       │   └── sg-one-north/
│       └── nuplan-v1.1/
│           ├── sensor_blobs/      # Camera images and LiDAR
│           └── splits/
│
└── openscene-v1.1/
    ├── sensor_blobs/
    │   ├── trainval/
    │   └── test/
    └── meta_datas/
        ├── trainval/
        └── test/

Use symlinks to point at your existing downloads:

cd data/raw
ln -s /path/to/nuplan nuplan
ln -s /path/to/openscene-v1.1 openscene-v1.1
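Before training, it can save time to verify that the layout above actually resolves, especially through symlinks (a hypothetical helper assuming the directory tree shown; adjust paths to your setup):

```python
from pathlib import Path

def check_raw_layout(root="data/raw"):
    """Map each expected raw-data path to whether it exists under root.

    Path.exists() follows symlinks, so a broken link reports as MISSING.
    """
    expected = [
        "nuplan/dataset/maps/nuplan-maps-v1.0.json",
        "nuplan/dataset/nuplan-v1.1/sensor_blobs",
        "openscene-v1.1/sensor_blobs/trainval",
        "openscene-v1.1/meta_datas/trainval",
    ]
    base = Path(root)
    return {p: (base / p).exists() for p in expected}

for path, ok in check_raw_layout().items():
    print(("OK      " if ok else "MISSING ") + path)
```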

Navformer Data (data/alg_engine/)

data/alg_engine/
├── ckpts/                         # Pre-trained model checkpoints
├── merged_infos_navformer/
│   ├── nuplan_openscene_navtrain.pkl
│   └── nuplan_openscene_navtest.pkl
├── pdms_cache/                    # Pre-computed PDM metrics cache
│   ├── pdm_8192_gt_cache_navtrain.pkl
│   └── pdm_8192_gt_cache_navtest.pkl
└── test_8192_kmeans.npy           # K-means clustering for PDM vocab
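The K-means file defines a fixed vocabulary of anchor trajectories (8192 entries, loaded with numpy in practice). Conceptually, matching a trajectory to its nearest vocabulary entry looks like this pure-Python sketch (the toy vocab below is illustrative, not the real file's contents):

```python
def nearest_anchor(vocab, target):
    """Index of the vocab trajectory with the smallest summed squared
    waypoint distance to the target trajectory."""
    def dist2(traj):
        return sum((x - tx) ** 2 + (y - ty) ** 2
                   for (x, y), (tx, ty) in zip(traj, target))
    return min(range(len(vocab)), key=lambda i: dist2(vocab[i]))

# Toy 3-entry vocabulary of 2-waypoint trajectories.
vocab = [
    [(0, 0), (1, 0)],   # straight
    [(0, 0), (1, 1)],   # left
    [(0, 0), (1, -1)],  # right
]
print(nearest_anchor(vocab, [(0, 0), (1, 0.9)]))  # 1 (closest to 'left')
```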

Quick Reference

conda activate navformer

# Training (8 GPUs)
./scripts/e2e_dist_train.sh <config> <num_gpus> [resume_checkpoint]

# Open-loop navtest evaluation
./scripts/e2e_dist_eval.sh <config> <checkpoint> <num_gpus>

# Full train set evaluation
bash scripts/e2e_dist_eval_navtrain.sh <config> <checkpoint> <num_gpus>

# Rare case extraction
python scripts/rare_case_sampling_by_pdms.py \
    --pdm-result <csv_file> \
    --base-split <yaml_file> \
    --output-dir <output_dir>

Training

Training from Scratch

conda activate navformer

# Train VADv2 (8 GPUs)
./scripts/e2e_dist_train.sh configs/navformer/e2e_vadv2.py 8

Arguments:

  1. <config> — configuration file path
  2. <num_gpus> — number of GPUs
  3. [resume_checkpoint] (optional) — checkpoint to resume from

Resume Training

./scripts/e2e_dist_train.sh \
    configs/navformer/e2e_vadv2.py \
    8 \
    experiments/navformer/e2e_vadv2/latest.pth

If latest.pth exists in experiments/navformer/e2e_vadv2/, training auto-resumes when you omit the third argument.

Monitor Training

# Watch training log
tail -f experiments/navformer/e2e_vadv2/logs/train.*

# TensorBoard
tensorboard --logdir experiments/navformer/e2e_vadv2/tf_logs

Key metrics:

  • loss — total training loss (should decrease)
  • loss_planning — planning loss
  • loss_track — tracking loss
  • ade_4s — average displacement error at 4 s
  • fde_4s — final displacement error at 4 s
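For reference, ade_4s and fde_4s can be reproduced from predicted and ground-truth waypoints as follows (a minimal sketch; the evaluation code's exact conventions may differ):

```python
import math

def ade_fde(pred, gt):
    """Average and final displacement error between two trajectories.

    pred, gt: lists of (x, y) waypoints covering the same 4 s horizon.
    """
    dists = [math.hypot(px - gx, py - gy)
             for (px, py), (gx, gy) in zip(pred, gt)]
    ade = sum(dists) / len(dists)   # mean error over all waypoints
    fde = dists[-1]                 # error at the final (4 s) waypoint
    return ade, fde

pred = [(1.0, 0.0), (2.0, 0.0), (3.0, 1.0)]
gt   = [(1.0, 0.0), (2.0, 1.0), (3.0, 0.0)]
ade, fde = ade_fde(pred, gt)  # ade ≈ 0.667 m, fde = 1.0 m
```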

Training Output

experiments/navformer/e2e_vadv2/
├── e2e_vadv2.py          # config backup
├── logs/
│   └── train.*
├── epoch_1.pth
├── ...
├── epoch_20.pth
└── latest.pth            # symlink to latest checkpoint

Evaluation

Open-Loop Evaluation

Full Test Set

conda activate navformer

./scripts/e2e_dist_eval.sh \
    configs/navformer/e2e_vadv2.py \
    experiments/navformer/e2e_vadv2/epoch_20.pth \
    8

Output: experiments/navformer/e2e_vadv2/navtest.csv

Rare Navtest Cases Only

./scripts/e2e_dist_eval_navtest_failures.sh \
    configs/navformer/e2e_vadv2.py \
    experiments/navformer/e2e_vadv2/epoch_20.pth \
    8

Output: experiments/navformer/e2e_vadv2/navtest_failures.csv

Full Train Set

Required before Rare Case Extraction. Evaluates on the full navtrain split:

bash scripts/e2e_dist_eval_navtrain.sh \
    configs/navformer/e2e_vadv2.py \
    experiments/navformer/e2e_vadv2/epoch_20.pth \
    8

Output: experiments/navformer/e2e_vadv2/navtrain.csv

Evaluation Metrics

token,ade_4s,fde_4s,no_at_fault_collisions,drivable_area_compliance,ego_progress,comfort,score
Metric                     Description                             Direction
ade_4s                     Average trajectory error over 4 s (m)   lower
fde_4s                     Final position error at 4 s (m)         lower
no_at_fault_collisions     Collision avoidance rate (0–1)          higher
drivable_area_compliance   Stay in drivable area (0–1)             higher
ego_progress               Route completion (0–1)                  higher
comfort                    Comfort metric (0–1)                    higher
score                      Overall PDM score (0–1)                 higher
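To summarize a results CSV with no extra dependencies, the columns can be averaged directly (a sketch using the stdlib; it assumes the header shown above and substitutes an inline sample for the real file):

```python
import csv
import io

# In practice: open('experiments/navformer/e2e_vadv2/navtest.csv').
sample = io.StringIO(
    "token,ade_4s,fde_4s,no_at_fault_collisions,"
    "drivable_area_compliance,ego_progress,comfort,score\n"
    "a1,0.8,1.6,1.0,1.0,0.9,1.0,0.85\n"
    "b2,1.2,2.4,0.0,1.0,0.5,1.0,0.10\n"
)
rows = list(csv.DictReader(sample))

mean_score = sum(float(r["score"]) for r in rows) / len(rows)
collision_rate = sum(
    1 for r in rows if float(r["no_at_fault_collisions"]) < 1.0
) / len(rows)
print(f"mean PDM score: {mean_score:.3f}, "
      f"at-fault-collision rate: {collision_rate:.0%}")
```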

Rare Case Extraction

Extract failure scenarios from training-set evaluation for targeted fine-tuning.

Prerequisite: complete a Full Train Set Evaluation first.

Basic Extraction

conda activate navformer

python scripts/rare_case_sampling_by_pdms.py \
    --pdm-result experiments/navformer/e2e_vadv2/navtrain.csv \
    --base-split configs/navsim_splits/navtrain_split/navtrain_50pct.yaml \
    --output-dir configs/navsim_splits/navtrain_split/e2e_vadv2_rare

Output:

configs/navsim_splits/navtrain_split/e2e_vadv2_rare/
├── navtrain_50pct_collision.yaml    # collision scenarios
├── navtrain_50pct_off_road.yaml     # off-road scenarios
└── navtrain_50pct_ep_1pct.yaml      # low ego-progress (bottom 1%)

Custom Thresholds

Edit scripts/rare_case_sampling_by_pdms.py:

# Change collision threshold
collision_scenarios = df[df['no_at_fault_collisions'] < 0.95]  # default 1.0

# Change ego-progress percentile
ep_threshold = df['ego_progress'].quantile(0.05)  # default 0.01 (1% → 5%)
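In outline, the extraction script groups tokens by which threshold they fail and writes each group to a split YAML; the filtering itself reduces to something like this (a simplified stdlib sketch of the logic, not the script):

```python
import csv
import io

# Inline sample standing in for navtrain.csv (subset of the real columns).
rows = list(csv.DictReader(io.StringIO(
    "token,no_at_fault_collisions,drivable_area_compliance,ego_progress\n"
    "t1,1.0,1.0,0.9\n"
    "t2,0.0,1.0,0.8\n"
    "t3,1.0,0.0,0.7\n"
)))

# Tokens failing each PDM sub-metric become separate rare-case splits.
collision = [r["token"] for r in rows
             if float(r["no_at_fault_collisions"]) < 1.0]
off_road = [r["token"] for r in rows
            if float(r["drivable_area_compliance"]) < 1.0]
print("collision:", collision, "off-road:", off_road)
```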

Configuration

Configs follow the MMDetection3D hierarchical pattern:

configs/
├── _base_/
│   └── default_runtime.py
├── navformer/
│   ├── e2e_vadv2.py
│   ├── e2e_hydramdp.py
│   └── track_map_nuplan_r50_navtrain.py
└── navsim_splits/
    ├── navtrain_split/
    │   ├── navtrain.yaml
    │   ├── navtrain_50pct.yaml
    │   └── e2e_vadv2_rare/
    │       ├── navtrain_50pct_collision.yaml
    │       ├── navtrain_50pct_off_road.yaml
    │       └── navtrain_50pct_ep_1pct.yaml
    └── navtest_split/
        ├── navtest.yaml
        └── navtest_failures.yaml

Key Config Parameters

model = dict(
    type='VADv2',           # or 'HydraMDP'
    num_query=900,
    planning_steps=8,
)

bev_h_, bev_w_ = 200, 200
patch_size = [102.4, 102.4]  # physical range in meters

input_modality = dict(
    use_lidar=False,
    use_camera=True,         # 8 cameras
    use_radar=False,
    use_external=True,       # CAN bus
)

total_epochs = 20
optimizer = dict(type='AdamW', lr=2e-4, weight_decay=0.01)

data = dict(
    samples_per_gpu=1,
    workers_per_gpu=4,
    train=dict(
        ann_file='merged_infos_navformer/nuplan_openscene_navtrain.pkl',
        scenario_filter='configs/navsim_splits/navtrain_split/navtrain_50pct.yaml',
    ),
    val=dict(
        ann_file='merged_infos_navformer/nuplan_openscene_navtest.pkl',
        scenario_filter='configs/navsim_splits/navtest_split/navtest.yaml',
    ),
)

Runtime Overrides

./scripts/e2e_dist_train.sh configs/navformer/e2e_vadv2.py 8 \
    --cfg-options optimizer.lr=1e-4 total_epochs=30 data.samples_per_gpu=2
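--cfg-options follows the MMCV convention: each dotted key is merged into the nested config dict. Conceptually it behaves like this (a sketch, not MMCV's actual merge code, which also handles type coercion and list indices):

```python
def apply_cfg_options(cfg, options):
    """Merge dotted-key overrides like 'optimizer.lr' into a nested dict."""
    for key, value in options.items():
        node = cfg
        *parents, leaf = key.split(".")
        for p in parents:
            node = node.setdefault(p, {})  # descend, creating levels as needed
        node[leaf] = value
    return cfg

cfg = {"optimizer": {"type": "AdamW", "lr": 2e-4}, "total_epochs": 20}
apply_cfg_options(cfg, {"optimizer.lr": 1e-4, "total_epochs": 30})
print(cfg["optimizer"]["lr"], cfg["total_epochs"])  # 0.0001 30
```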

Model Architectures

Architecture      Config                              Strengths
VADv2 (default)   configs/navformer/e2e_vadv2.py      Fast inference, general driving
HydraMDP          configs/navformer/e2e_hydramdp.py   Multi-modal planning, safety-critical

Advanced Training

Multi-Node Training

# Node 0 (master)
export MASTER_ADDR=192.168.1.100
export MASTER_PORT=28567
export WORLD_SIZE=16
export RANK=0
./scripts/e2e_dist_train.sh configs/navformer/e2e_vadv2.py 8

# Node 1 (worker)
export MASTER_ADDR=192.168.1.100
export MASTER_PORT=28567
export WORLD_SIZE=16
export RANK=8
./scripts/e2e_dist_train.sh configs/navformer/e2e_vadv2.py 8

Mixed Precision

# in config
fp16 = dict(loss_scale='dynamic')

Gradient Accumulation

# effective batch = samples_per_gpu * num_gpus * gradient_accumulation_steps
runner = dict(max_epochs=20, gradient_accumulation_steps=4)
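The effect of gradient_accumulation_steps is to sum scaled gradients over several micro-batches and step the optimizer once, as this framework-free sketch illustrates (plain numbers stand in for gradients):

```python
def train_epoch(batches, accum_steps, step_fn):
    """Accumulate gradients over accum_steps micro-batches, then step once."""
    grad, steps = 0.0, 0
    for i, g in enumerate(batches, start=1):
        grad += g / accum_steps      # scale so the sum matches a big-batch mean
        if i % accum_steps == 0:
            step_fn(grad)            # one optimizer step per accum_steps batches
            grad = 0.0
            steps += 1
    return steps

updates = []
n = train_epoch([1.0, 2.0, 3.0, 4.0], 2, updates.append)
print(n, updates)  # 2 [1.5, 3.5]
```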

Troubleshooting

CUDA out of memory:

# Reduce batch size: data.samples_per_gpu = 1
# Lower BEV resolution: bev_h_, bev_w_ = 150, 150
# Enable gradient checkpointing: model.img_backbone.with_cp = True

Training loss not decreasing:

grep "load checkpoint" experiments/navformer/*/logs/train.*
./scripts/e2e_dist_train.sh ... --cfg-options optimizer.lr=1e-4

Evaluation hangs:

ps aux | grep python
pkill -f "test.py"
./scripts/e2e_dist_eval.sh ... 4   # try fewer GPUs

ModuleNotFoundError: No module named 'mmdet3d':

conda activate navformer
python -c "import mmcv; print(mmcv.__version__)"
pip uninstall mmdet3d -y && pip install mmdet3d==1.0.0rc6

Corrupted checkpoint:

# Use a previous epoch
./scripts/e2e_dist_train.sh ... experiments/navformer/e2e_vadv2/epoch_18.pth

Performance Optimization

Training speed:

  • data.workers_per_gpu = 8 (if CPU/RAM allows)
  • Store data on NVMe SSD
  • fp16 = dict(loss_scale='dynamic')
  • data.persistent_workers = True

Memory:

  • data.samples_per_gpu = 1
  • bev_h_, bev_w_ = 150, 150
  • model.img_backbone.with_cp = True

Multi-node:

  • Use homogeneous GPU types across nodes
  • InfiniBand for inter-node communication
  • Shared NFS/Lustre for data loading
