CosyAudio

A research framework for text-to-audio generation that uses synthetic captions and confidence scoring to filter noisy data, improving the quality and faithfulness of generated audio.

About CosyAudio

CosyAudio is a framework designed to improve text-to-audio (TTA) generation by using synthetic captions and confidence scores. The motivation is that many large audio datasets are weakly labeled or carry noisy or inaccurate captions, which degrades the performance of audio generation models.
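
To make the filtering idea concrete, here is a minimal Python sketch of dropping low-confidence caption pairs before training. It is an illustration under stated assumptions, not CosyAudio's actual implementation: the `CaptionedClip` type, the `filter_by_confidence` helper, the score range of 0 to 1, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CaptionedClip:
    """One audio clip with a synthetic caption (hypothetical structure)."""
    audio_path: str
    caption: str
    confidence: float  # assumed to lie in [0, 1]; higher means a better audio-caption match


def filter_by_confidence(clips, threshold=0.5):
    """Keep only clips whose caption confidence meets the threshold.

    The threshold value is an assumption chosen for illustration.
    """
    return [clip for clip in clips if clip.confidence >= threshold]


# Toy data: file names, captions, and scores are made up for this example.
clips = [
    CaptionedClip("dog.wav", "a dog barks twice", 0.91),
    CaptionedClip("rain.wav", "a car engine starts", 0.12),
    CaptionedClip("door.wav", "a door slams shut", 0.67),
]

kept = filter_by_confidence(clips, threshold=0.5)
kept_paths = [clip.audio_path for clip in kept]
```

A hard threshold is only one way to use such scores; confidence could also serve as a per-sample weight during training rather than a cut-off.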