Papers - Audio
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Paper
• 2310.00704
• Published • 21
Structural Similarities Between Language Models and Neural Response Measurements
Paper
• 2306.01930
• Published • 2
Streaming Transformer ASR with Blockwise Synchronous Beam Search
Paper
• 2006.14941
• Published • 2
NU-GAN: High resolution neural upsampling with GAN
Paper
• 2010.11362
• Published • 2
MusicHiFi: Fast High-Fidelity Stereo Vocoding
Paper
• 2403.10493
• Published • 18
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models
Paper
• 2403.14438
• Published • 2
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Paper
• 1712.05884
• Published • 3
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Paper
• 2403.16973
• Published • 3
Masked Audio Generation using a Single Non-Autoregressive Transformer
Paper
• 2401.04577
• Published • 45
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Paper
• 2404.00656
• Published • 11
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Paper
• 2404.03204
• Published • 10
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Paper
• 2311.07919
• Published • 10
Custom Data Augmentation for low resource ASR using Bark and Retrieval-Based Voice Conversion
Paper
• 2311.14836
• Published • 2
MuPT: A Generative Symbolic Music Pretrained Transformer
Paper
• 2404.06393
• Published • 16
Audio Dialogues: Dialogues dataset for audio and music understanding
Paper
• 2404.07616
• Published • 16
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Paper
• 2404.09956
• Published • 12
Long-form music generation with latent diffusion
Paper
• 2404.10301
• Published • 27
Paper
• 2404.13358
• Published • 14
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
Paper
• 2405.00233
• Published • 17
LLM-AD: Large Language Model based Audio Description System
Paper
• 2405.00983
• Published • 22
Images that Sound: Composing Images and Sounds on a Single Canvas
Paper
• 2405.12221
• Published • 1
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Paper
• 2406.08407
• Published • 28
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds
Paper
• 2407.01494
• Published • 15
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
Paper
• 2407.02869
• Published • 21
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
Paper
• 2407.04051
• Published • 40
Qwen2-Audio Technical Report
Paper
• 2407.10759
• Published • 64
Audio Conditioning for Music Generation via Discrete Bottleneck Features
Paper
• 2407.12563
• Published • 7
Facing the Music: Tackling Singing Voice Separation in Cinematic Audio Source Separation
Paper
• 2408.03588
• Published • 8
RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands
Paper
• 2408.11048
• Published • 4
Foundation Models for Music: A Survey
Paper
• 2408.14340
• Published • 44
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Paper
• 2408.16532
• Published • 50
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Paper
• 2006.11477
• Published • 9