An open-source audio-language model by NVIDIA for in-depth audio understanding and reasoning (speech, sound, music), supporting long audio, multi-turn chat, and voice-to-voice interaction.
Audio Flamingo 3 (AF3) is an open-source Large Audio-Language Model (LALM) developed by NVIDIA ADLR. It is designed for advanced understanding and reasoning over speech, sound effects, and music, including long audio contexts, and it supports interactive use such as multi-turn chat and voice-to-voice interaction.