HAFM

HAFM: Hierarchical Autoregressive Foundation Model for Music Accompaniment Generation

Authors: Jian Zhu, Jianwei Cui, Shihao Chen, Yubang Zhang, Yunlong Xue, Cheng Luo.

This repository contains the code and data for the Hierarchical Autoregressive Foundation Model (HAFM).

1. Abstract

We present HAFM, a system that generates instrumental music audio to accompany input vocals. Given isolated singing voice, HAFM produces a coherent instrumental accompaniment that can be directly mixed with the input to create complete music. We propose three key innovations over prior work: (1) a dual-rate codec tokenization scheme using HuBERT semantic tokens at 50 Hz for vocals and EnCodec acoustic tokens at 75 Hz for instrumentals, enabling time-aligned yet rate-independent modeling; (2) a three-stage hierarchical autoregressive architecture (semantic $\rightarrow$ coarse acoustic $\rightarrow$ fine acoustic) with interleaved multi-codebook prediction and classifier-free guidance; and (3) modern Transformer design choices including QK-norm, GEGLU activations, RMSNorm, and T5-style relative position bias for improved training stability and sequence generalization. Experiments on MUSDB18 demonstrate that HAFM achieves a Fréchet Audio Distance (FAD) of 2.08 on isolated vocal inputs, outperforming retrieval baselines and matching prior state-of-the-art systems with fewer parameters. The source code is available at https://github.com/HackerHyper/HAFM.
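To make the dual-rate tokenization concrete, here is a minimal sketch of the arithmetic relating the two token streams. Only the 50 Hz and 75 Hz rates come from the abstract; the function names and the rule of assigning an acoustic frame to the semantic token in which it starts are assumptions for illustration, not the repository's implementation.

```python
import math
from fractions import Fraction
from typing import Tuple

VOCAL_RATE_HZ = 50  # HuBERT semantic tokens per second (from the abstract)
INSTR_RATE_HZ = 75  # EnCodec acoustic frames per second (from the abstract)


def token_counts(duration_s: float) -> Tuple[int, int]:
    """Return (vocal tokens, instrumental frames) covering duration_s seconds."""
    return int(duration_s * VOCAL_RATE_HZ), int(duration_s * INSTR_RATE_HZ)


def aligned_acoustic_span(semantic_index: int) -> range:
    """Acoustic frame indices whose start time falls inside one semantic token.

    Hypothetical alignment rule: frame j (starting at j / 75 s) belongs to
    semantic token i if i / 50 <= j / 75 < (i + 1) / 50.
    """
    ratio = Fraction(INSTR_RATE_HZ, VOCAL_RATE_HZ)  # exactly 3/2
    start = math.ceil(semantic_index * ratio)
    end = math.ceil((semantic_index + 1) * ratio)
    return range(start, end)
```

For a 10-second clip, `token_counts(10.0)` gives 500 vocal tokens and 750 instrumental frames, so the streams stay time-aligned at a 2:3 ratio even though they are modeled at different rates.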

2. Architecture
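The abstract mentions interleaved multi-codebook prediction and classifier-free guidance without giving details. The sketch below shows one common way each is done. It assumes frame-by-frame flattening of codebook indices and the standard guidance formula; neither is confirmed by this README, and all function names are hypothetical.

```python
from typing import List


def interleave_codebooks(frames: List[List[int]]) -> List[int]:
    """Flatten [T][K] codebook indices frame by frame into one T*K sequence."""
    return [token for frame in frames for token in frame]


def deinterleave_codebooks(flat: List[int], num_codebooks: int) -> List[List[int]]:
    """Invert interleave_codebooks, regrouping tokens into frames of K codebooks."""
    if len(flat) % num_codebooks != 0:
        raise ValueError("sequence length must be a multiple of num_codebooks")
    return [flat[i:i + num_codebooks] for i in range(0, len(flat), num_codebooks)]


def cfg_logits(cond: List[float], uncond: List[float], scale: float) -> List[float]:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]
```

With `scale = 1.0` the guided logits equal the conditional logits; larger values push predictions further toward the vocal-conditioned distribution.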

3. Model

https://huggingface.co/zhuqijian/HAFM
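
One way to fetch the checkpoints programmatically is with the `huggingface_hub` package. This sketch downloads the whole model repository, so it makes no assumption about the file names inside it. The `checkpoints` target directory is a hypothetical choice, and `configs/ar.yaml` may expect the files at a different path.

```python
REPO_ID = "zhuqijian/HAFM"  # taken from the model URL above


def download_checkpoints(local_dir: str = "checkpoints") -> str:
    """Download every file in the model repository and return the local path.

    The default local_dir is an illustrative assumption, not a path required
    by the HAFM code.
    """
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_checkpoints()` requires network access and `pip install huggingface_hub`.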

4. Inference (semantic + coarse + fine checkpoints)

python infer_simple.py \
    --vocal_path vocal.wav \
    --output_path output.wav \
    --config configs/ar.yaml
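
To run the same command over a folder of vocal stems, a small wrapper like the following could be used. It is a sketch that only relies on the three flags shown above. The `run_batch` helper, the directory layout, and the `_accompaniment` output suffix are assumptions, and the script must be run from the repository root so that `infer_simple.py` is found.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_command(vocal_path: str, output_path: str,
                  config: str = "configs/ar.yaml") -> List[str]:
    """Build the infer_simple.py invocation using the flags from this README."""
    return [
        sys.executable, "infer_simple.py",
        "--vocal_path", vocal_path,
        "--output_path", output_path,
        "--config", config,
    ]


def run_batch(vocal_dir: str, output_dir: str) -> None:
    """Generate one accompaniment per .wav file in vocal_dir (hypothetical helper)."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for vocal in sorted(Path(vocal_dir).glob("*.wav")):
        target = out / f"{vocal.stem}_accompaniment.wav"
        # check=True stops the batch on the first failed inference run.
        subprocess.run(build_command(str(vocal), str(target)), check=True)
```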

5. Demo

Play audio

If you run into any problems, contact me at qijian.zhu@outlook.com.
