HAFM: Hierarchical Autoregressive Foundation Model for Music Accompaniment Generation
Authors: Jian Zhu, Jianwei Cui, Shihao Chen, Yubang Zhang, Yunlong Xue, Cheng Luo.
This repo contains the code and data of the Hierarchical Autoregressive Foundation Model (HAFM).
We present HAFM, a system that generates instrumental music audio to accompany input vocals. Given an isolated singing voice, HAFM produces a coherent instrumental accompaniment that can be directly mixed with the input to create complete music. We propose three key innovations over prior work: (1) a dual-rate codec tokenization scheme using HuBERT semantic tokens at 50 Hz for vocals and EnCodec acoustic tokens at 75 Hz for instrumentals, enabling time-aligned yet rate-independent modeling; (2) a three-stage hierarchical autoregressive architecture (semantic
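The dual-rate scheme keeps the two token streams time-aligned even though their frame rates differ: every second of audio yields 50 semantic frames and 75 acoustic frames, a fixed 2:3 ratio. A minimal sketch of that index arithmetic (the helper names are illustrative, not taken from the released code):

```python
# Illustrative helpers for the 50 Hz / 75 Hz dual-rate tokenization.
# These are assumptions for exposition, not functions from this repo.

def n_frames(duration_s, rate_hz):
    """Number of codec frames produced for a clip of the given duration."""
    return round(duration_s * rate_hz)

def semantic_to_acoustic_index(i, semantic_hz=50, acoustic_hz=75):
    """Map a 50 Hz semantic-frame index to the 75 Hz acoustic frame
    covering the same instant: t = i / 50 = j / 75, so j = 1.5 * i."""
    return int(i * acoustic_hz / semantic_hz)
```

For example, a 10 s vocal clip yields `n_frames(10, 50)` = 500 semantic tokens aligned against `n_frames(10, 75)` = 750 acoustic frames.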
Model weights: https://huggingface.co/zhuqijian/HAFM
python infer_simple.py \
    --vocal_path vocal.wav \
    --output_path output.wav \
    --config configs/ar.yaml
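The generated accompaniment can then be mixed back with the input vocals to form the complete track. A minimal sketch of that final mix step (not part of this repo; in practice you would load both WAV files at the same sample rate, e.g. with soundfile, where here the tracks are plain lists of float samples):

```python
# Hypothetical mixing helper, shown for illustration only.
def mix_tracks(vocal, accompaniment, vocal_gain=1.0, acc_gain=1.0):
    """Sum two mono tracks sample-by-sample, padding the shorter one
    with silence and hard-clipping the result to [-1.0, 1.0]."""
    n = max(len(vocal), len(accompaniment))
    v = vocal + [0.0] * (n - len(vocal))
    a = accompaniment + [0.0] * (n - len(accompaniment))
    return [max(-1.0, min(1.0, vocal_gain * x + acc_gain * y))
            for x, y in zip(v, a)]
```

The gain parameters let you rebalance vocals against the generated instrumental before export.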
If you have any problems, contact us at qijian.zhu@outlook.com.
