An advanced AI model that generates speech, song, or general audio synchronized with video or text input. It supports multimodal conditioning and efficient generation, with strong lip-sync and temporal alignment.
AudioGen-Omni is a unified multimodal diffusion transformer (MMDiT) model developed by researchers at China University of Mining and Technology and Kuaishou Technology. Its goal is to generate high-fidelity audio, speech, and song synchronized with input video, text, or both.