This page shows inference results for real‑world musical audio samples.

The model was trained on additional data beyond Slakh, MUSDB, and MoisesDB.

MGE-LDM is trained to generate 3-track audio:
- $p_\theta(\text{mix}, \text{submix},\text{source})$

Limitations:

- The model frequently struggles to isolate some instruments such as piano and guitar, especially when only single notes are played.
- The model may produce spurious (hallucinated) audio components that do not exist in the original mixture.

Source Extraction

We set the number of diffusion steps to 250 and the classifier‑free guidance scale to 10.0 for source extraction, unless otherwise specified.
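For readers unfamiliar with the guidance scale, the sketch below shows the commonly used classifier‑free guidance blend of an unconditional and a conditional noise prediction at each sampling step. It is an illustration only: the function name `cfg_combine` and the toy values are hypothetical, and the page does not state that MGE-LDM uses exactly this formulation.

```python
# Minimal sketch of the common classifier-free guidance (CFG) blend.
# Assumption: guided = uncond + scale * (cond - uncond); this is the widely
# used form, not a confirmed detail of the MGE-LDM implementation.

def cfg_combine(eps_uncond, eps_cond, scale=10.0):
    """Blend unconditional and conditional noise predictions element-wise."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Hypothetical toy predictions for a 3-element latent.
eps_uncond = [0.0, 1.0, -1.0]
eps_cond = [0.5, 1.0, -2.0]

# With scale 10.0, the difference (cond - uncond) is amplified tenfold.
guided = cfg_combine(eps_uncond, eps_cond, scale=10.0)
print(guided)
```

In this formulation, a scale of 1.0 returns the conditional prediction unchanged and a scale of 0.0 returns the unconditional one; larger values push the sample more strongly toward the condition.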

BRUNO MARS - 24k Magic

CHARLIE PUTH - Attention

ATLXS - PASSO BEM SOLTO

BIGBANG - HARUHARU

NELL - 1:03

SEVENTEEN - Don’t Wanna Cry

SAKANACTION - Music

VAUNDY - Kaiju no Hanauta

SAWANO HIROYUKI - ətˈæk 0N tάɪtn