WaveLLDM

A research model for efficient speech denoising and restoration that runs diffusion in a compressed latent space, making it faster and less resource-intensive than many waveform-based methods.


About WaveLLDM

WaveLLDM combines two components. The first is a neural audio codec (called FireflyGAN) that encodes audio into a compressed latent space and decodes it back. The second is a latent diffusion model (LDM) that works in that latent space for tasks like denoising, audio inpainting (i.e. filling in missing segments), and restoring degraded speech. The model achieves low Log-Spectral Distance (LSD) scores (≈ 0.48-0.60), indicating good spectral reconstruction. However, perceptual quality as measured by WB-PESQ is modest (≈ 1.62-1.71), and speech intelligibility as measured by STOI is also moderate (≈ 0.76-0.78), so the model is not yet on par with some state-of-the-art methods on those two metrics.
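To give a sense of what an LSD score measures, here is a minimal sketch of one common formulation: the root-mean-square difference between log power spectra, averaged over frames. The exact definition, frame length, hop size, and log base used in the WaveLLDM evaluation are not stated here, so the values below (2048-sample Hann frames, hop of 512, base-10 log) and the function names are assumptions for illustration only.

```python
import numpy as np


def power_spectrogram(signal, frame_len=2048, hop=512):
    """Return a (frames x bins) power spectrogram using a Hann window."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def log_spectral_distance(reference, estimate, eps=1e-10):
    """LSD under the assumed definition: per-frame RMS of the log-power
    difference across frequency bins, then averaged over frames."""
    ref = np.log10(power_spectrogram(reference) + eps)
    est = np.log10(power_spectrogram(estimate) + eps)
    per_frame = np.sqrt(np.mean((ref - est) ** 2, axis=1))
    return float(np.mean(per_frame))


# Synthetic example: a 440 Hz tone and a lightly noised copy (not real speech).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.05 * rng.standard_normal(clean.shape)

print(log_spectral_distance(clean, clean))  # identical signals give 0.0
print(log_spectral_distance(clean, noisy))  # positive; lower means closer spectra
```

Lower values mean the restored spectrum is closer to the reference. Because LSD compares spectra only, a low score does not guarantee high perceptual quality or intelligibility, which is why it is reported alongside WB-PESQ and STOI.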