A fine-tuning and deployment framework for the NVIDIA GR00T N1.5 Vision-Language-Action (VLA) model, adapted for the U0 underwater robot. This project provides tools for data loading, LoRA/full fine-tuning, evaluation, and inference.
Note: This project is forked from NVIDIA Isaac-GR00T and customized for U0 robot applications.
- Fine-Tuning: LoRA and full fine-tuning support for GR00T N1.5 with multi-GPU training
- Evaluation: Policy evaluation with per-trajectory MSE metrics and visualization
- Inference: HTTP and ZMQ server modes for real-time robot deployment
- Data: Compatible with LeRobot data format
├── gr00t/ # Core library
│ ├── data/ # Dataset loading and transforms
│ ├── eval/ # Evaluation wrappers and services
│ ├── experiment/ # Training runner and data configs
│ ├── model/ # Model architecture (backbone, action head, policy)
│ └── utils/ # Utility functions
├── scripts/ # Training, evaluation, and inference scripts
├── deployment_scripts/ # TensorRT deployment tools (from upstream, experimental)
├── examples/ # Robot-specific modality configs
│ └── U0bot/ # U0 robot modality configurations
├── demo_data/ # Example dataset for quick start
└── tests/ # Unit tests
Clone the repo:
git clone https://github.com/VincentGu2000/u0model.git
cd u0model
Create a new conda environment and install the dependencies. We recommend Python 3.10:
conda create -n gr00t python=3.10
conda activate gr00t
pip install --upgrade setuptools
pip install -e .[base]
Install the flash-attn module:
pip install --no-build-isolation flash-attn==2.7.1.post4
Note: Make sure your CUDA version is 12.4; otherwise, you may have trouble configuring the flash-attn module. For GPUs with sm_120, such as the RTX PRO 6000 Blackwell, try CUDA 13.0. If you encounter issues, try the following solutions.
For CUDA 13.0:
pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu130 --extra-index-url https://pypi.tuna.tsinghua.edu.cn/simple
pip install https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.9.0/flash_attn-2.8.3+cu130torch2.10-cp310-cp310-linux_x86_64.whl
For CUDA 12.4:
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.1.post4/flash_attn-2.7.1.post4+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
Installing CUDA 12.4 (optional):
wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_550.54.14_linux.run
sudo sh cuda_12.4.0_550.54.14_linux.run
In the installer's text interface, accept the EULA, uncheck "Driver" if you already have a driver installed, make sure "CUDA Toolkit" is selected, and keep the default installation path (/usr/local/cuda-12.4).
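The choice between the two prebuilt flash-attn wheels above depends only on the CUDA version. As a minimal sketch, the mapping can be written as a lookup table. The helper name `flash_attn_wheel` is hypothetical and not part of this repo; the URLs are the ones listed above.

```python
# Illustrative helper (not part of this repo): map a CUDA toolkit version
# to the prebuilt flash-attn wheel listed in the installation notes.
FLASH_ATTN_WHEELS = {
    "12.4": (
        "https://github.com/Dao-AILab/flash-attention/releases/download/"
        "v2.7.1.post4/flash_attn-2.7.1.post4+cu12torch2.5cxx11abiFALSE-"
        "cp310-cp310-linux_x86_64.whl"
    ),
    "13.0": (
        "https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/"
        "download/v0.9.0/flash_attn-2.8.3+cu130torch2.10-"
        "cp310-cp310-linux_x86_64.whl"
    ),
}


def flash_attn_wheel(cuda_version: str) -> str:
    """Return the wheel URL for a CUDA version such as '12.4' or '12.4.0'."""
    # Keep only major.minor so that '12.4.0' and '12.4' are treated alike.
    key = ".".join(cuda_version.strip().split(".")[:2])
    if key not in FLASH_ATTN_WHEELS:
        raise ValueError(
            f"No prebuilt wheel listed for CUDA {cuda_version}; "
            f"known versions: {sorted(FLASH_ATTN_WHEELS)}"
        )
    return FLASH_ATTN_WHEELS[key]


if __name__ == "__main__":
    print("pip install " + flash_attn_wheel("12.4"))
```

Both wheels are built for Python 3.10 on Linux x86_64 (the `cp310` and `linux_x86_64` tags in the file names), so this lookup only applies to that setup.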
This project uses environment variables for all file paths (model weights, datasets, etc.) so that the same commands work across different machines.
# Copy the example config and edit with your actual paths
cp .env.example .env
# Then edit .env, for example:
# MODEL_BASE_DIR=/data/models
# DATA_BASE_DIR=/data/datasets
# ROS_WS_DIR=/home/user/ros_ws
Before running any command in this README, load your path configuration:
source .env
Tip: If you use make commands, the Makefile will automatically load .env for you.
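Before launching a long job, it can help to confirm that the path variables are actually defined. The following is a minimal sketch, assuming `.env` contains plain `KEY=VALUE` lines like the example above. The function names are hypothetical, and the parser is deliberately simplified: it does not expand `$VAR` references or handle multi-line values.

```python
from pathlib import Path

# Variables used by the training and evaluation commands in this README.
# ROS_WS_DIR is only needed for the ROS-based steps, so it is not required here.
REQUIRED_VARS = ("MODEL_BASE_DIR", "DATA_BASE_DIR")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(values: dict[str, str], required=REQUIRED_VARS) -> list[str]:
    """Return the required variable names that are absent or empty."""
    return [name for name in required if not values.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    values = parse_env_file(env_path.read_text()) if env_path.exists() else {}
    missing = missing_vars(values)
    print("missing variables:", ", ".join(missing) if missing else "none")
```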
Choose one of the following options based on your needs:
If you do not need to fine-tune the model yourself, you can directly download our fine-tuned U0 weights from Hugging Face:
hf download Vincent2025hello/u0_final --local-dir $MODEL_BASE_DIR/u0_final --local-dir-use-symlinks False
After downloading, point --model-path to $MODEL_BASE_DIR/u0_final in the evaluation and inference steps below.
If you want to fine-tune the model yourself, download the NVIDIA GR00T N1.5 pretrained weights first:
hf download nvidia/GR00T-N1.5-3B --local-dir $MODEL_BASE_DIR/GR00T-N1.5-3B --local-dir-use-symlinks False
Note: Fine-tuning the model yourself requires GPU resources and training data. See the 5. Fine-Tuning section below for details.
We also open-sourced the USIM training dataset on Hugging Face. If you plan to fine-tune the model, you can download it with:
hf download Vincent2025hello/usim --local-dir $DATA_BASE_DIR/usim --local-dir-use-symlinks False
Tip: The demo_data/ directory in this repo contains a small sample dataset for quick testing. For full training, use the USIM dataset above.
For a full list of options, run:
python scripts/gr00t_finetune.py --help
Run the fine-tuning demo on a single-GPU server with the following configuration:
source .env
python scripts/gr00t_finetune.py \
--dataset-path ./demo_data/robot_sim.PickNPlace \
--output-dir $MODEL_BASE_DIR/finetuned-model-demo \
--base-model-path $MODEL_BASE_DIR/GR00T-N1.5-3B \
--num-gpus 1 \
--max-steps 500 \
--batch-size 64 \
--lora-rank 64 \
--lora_alpha 128 \
--data-config fourier_gr1_arms_only \
--report-to tensorboard \
--target-loss-weight 0
Note: When lora-rank is greater than 0, LoRA fine-tuning mode is enabled. A larger lora-rank value means more trainable parameters and potentially better performance.
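To make the effect of the rank concrete, the sketch below counts the parameters one LoRA adapter adds to a single linear layer. It assumes the standard LoRA formulation (a frozen weight plus a low-rank update `B @ A` scaled by `alpha / rank`); how this repo applies LoRA internally is not shown here, and the layer width of 2048 is a hypothetical value chosen only for illustration.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def lora_scaling(rank: int, alpha: float) -> float:
    """Scaling applied to the low-rank update in the standard formulation."""
    return alpha / rank


if __name__ == "__main__":
    d_in = d_out = 2048  # hypothetical layer width, not taken from GR00T
    frozen = d_in * d_out
    for rank in (16, 64):
        added = lora_trainable_params(d_in, d_out, rank)
        print(f"rank={rank}: {added} adapter params vs {frozen} frozen params")
    # With the demo settings above (rank 64, alpha 128):
    print("scaling factor:", lora_scaling(64, 128))
```

The adapter size grows linearly with the rank, so quadrupling the rank from 16 to 64 quadruples the trainable parameters per adapted layer.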
Create a training session for LoRA fine-tuning:
tmux new -s my_training "source .env && source ~/miniconda3/bin/activate gr00t && python scripts/gr00t_finetune.py \
--dataset-path $DATA_BASE_DIR/usim/train \
--output-dir $MODEL_BASE_DIR/finetuned-model-lora \
--base-model-path $MODEL_BASE_DIR/GR00T-N1.5-3B \
--num-gpus 4 \
--max-steps 10000 \
--save-steps 5000 \
--batch-size 32 \
--lora_rank 64 \
--lora_alpha 128 \
--data-config u0_bot \
--report-to tensorboard"
Create a training session for full fine-tuning:
tmux new -s my_training "source .env && source ~/miniconda3/bin/activate gr00t && python scripts/gr00t_finetune.py \
--dataset-path $DATA_BASE_DIR/usim/train \
--output-dir $MODEL_BASE_DIR/finetuned-model-full \
--base-model-path $MODEL_BASE_DIR/GR00T-N1.5-3B \
--num-gpus 2 \
--max-steps 22000 \
--save-steps 11000 \
--batch-size 32 \
--tune-visual \
--data-config u0_bot \
--report-to tensorboard"
Reattach to the training session:
tmux attach -t my_training
View training logs:
tensorboard --logdir="$MODEL_BASE_DIR/"
For a full list of options, run:
python scripts/eval_policy.py --help
Evaluate on the entire test dataset (the script automatically iterates over all trajectories and computes the step count for each one):
source .env
python scripts/eval_policy.py \
--model-path $MODEL_BASE_DIR/u0_final \
--dataset-path $DATA_BASE_DIR/usim/test \
--data-config u0_bot \
--embodiment-tag new_embodiment \
--modality-keys joint_pos pwm \
--video-backend torchvision_av \
--save-csv-path results/eval_results_u0_final.csv
Evaluate specific trajectories with fixed steps:
source .env
python scripts/eval_policy.py \
--model-path $MODEL_BASE_DIR/u0_final \
--dataset-path $DATA_BASE_DIR/usim/test \
--data-config u0_bot \
--embodiment-tag new_embodiment \
--modality-keys joint_pos pwm \
--video-backend torchvision_av \
--save-csv-path results/eval_results_u0_final.csv \
--save-plot-path results/plots \
--action_horizon 16 \
--trajs 5
Note: When --steps and --trajs are omitted, the script automatically evaluates all trajectories and uses trajectory_length - action_horizon as the step count for each trajectory. Use --save-csv-path to save per-trajectory results (traj_id, traj_length, eval_steps, action_mse, target_mse) to a CSV file.
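The default step count and the CSV layout described in the note can be illustrated with a short post-processing sketch. It assumes the CSV has a header row with exactly the column names listed above. The helper names and the sample rows are made up for illustration and are not real evaluation results.

```python
import csv
import io
from statistics import mean


def eval_steps(traj_length: int, action_horizon: int) -> int:
    """Default step count per trajectory, as described in the note above."""
    return traj_length - action_horizon


def summarize(csv_text: str) -> dict:
    """Average the per-trajectory MSE columns of an evaluation CSV."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("CSV contains no trajectory rows")
    return {
        "num_trajectories": len(rows),
        "mean_action_mse": mean(float(r["action_mse"]) for r in rows),
        "mean_target_mse": mean(float(r["target_mse"]) for r in rows),
    }


if __name__ == "__main__":
    # Made-up sample in the documented column layout.
    sample = (
        "traj_id,traj_length,eval_steps,action_mse,target_mse\n"
        "0,100,84,0.5,1.0\n"
        "1,80,64,0.25,2.0\n"
    )
    print(eval_steps(100, 16))
    print(summarize(sample))
```

To summarize a real run, pass the contents of the file given to `--save-csv-path`, for example `summarize(open("results/eval_results_u0_final.csv").read())`.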
For automated evaluation pipelines, please refer to the u0env project. To run parallel evaluation across multiple GPUs, use the following script:
python scripts/launch_multi_gpu.py \
--num-instances 5 \
--base-port 8000 \
--gpus 1,2,3 \
--model-path $MODEL_BASE_DIR/u0_final/ \
--data-config u0_bot \
--embodiment-tag new_embodiment \
--host 0.0.0.0
For a full list of options, run:
python scripts/launch_multi_gpu.py --help
Press Ctrl+C to stop all instances simultaneously.
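The example above starts 5 instances on 3 GPUs, so some GPUs host more than one instance. The sketch below shows one plausible layout. It assumes that each instance listens on `base_port + index` and that GPUs are assigned round-robin. Both are assumptions made for illustration; check `scripts/launch_multi_gpu.py --help` for the actual behavior.

```python
def plan_instances(num_instances: int, base_port: int, gpus: list[int]) -> list[tuple[int, int]]:
    """Return (port, gpu_id) pairs under the assumed round-robin layout."""
    if not gpus:
        raise ValueError("at least one GPU id is required")
    return [(base_port + i, gpus[i % len(gpus)]) for i in range(num_instances)]


if __name__ == "__main__":
    # Mirrors --num-instances 5 --base-port 8000 --gpus 1,2,3 from the command above.
    for port, gpu in plan_instances(5, 8000, [1, 2, 3]):
        print(f"port {port} -> GPU {gpu}")
```

Under these assumptions, GPUs 1 and 2 each serve two instances and GPU 3 serves one, which matters when estimating per-GPU memory use.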
First, start the ROS environment:
conda activate ros_env
cd $ROS_WS_DIR
source devel/setup.bash
rosrun bluerov2_control mapper_setup.py shallowfixed
roslaunch stonefish_bluerov2 bluerov2_eval.launch
For a full list of options, run:
python scripts/inference_service_u0.py --help
Start the inference service (HTTP mode):
conda activate gr00t
pip install uvicorn fastapi json-numpy requests
source .env
python scripts/inference_service_u0.py \
--data-config u0_bot \
--embodiment-tag new_embodiment \
--device cuda:0 \
--server --http-server --host 0.0.0.0 --port 8000 \
--model-path $MODEL_BASE_DIR/u0_final/
Publish task description:
rostopic pub /task_description std_msgs/String "Pick up the red cylinder."
The deployment_scripts/ directory contains TensorRT deployment tools inherited from the upstream project. These scripts are experimental and have not been fully verified in the U0 robot setup. Use at your own discretion. See deployment_scripts/README.md for details.
If you use this work, please cite:
@misc{gu2025usimu0visionlanguageactiondataset,
title={USIM and U0: A Vision-Language-Action Dataset and Model for General Underwater Robots},
author={Junwen Gu and Zhiheng Wu and Pengxuan Si and Shuang Qiu and Yukai Feng and Luoyang Sun and Laien Luo and Lianyi Yu and Jian Wang and Zhengxing Wu},
year={2025},
eprint={2510.07869},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2510.07869},
}
This project is built on top of NVIDIA Isaac-GR00T. We thank the NVIDIA GEAR team for open-sourcing the GR00T model and framework.
This project is licensed under the Apache License 2.0.