PAM: Prompting Audio-Language Models for Audio Quality Assessment

Bibliographic Details
Title:	PAM: Prompting Audio-Language Models for Audio Quality Assessment
Authors:	Deshmukh, Soham; Alharthi, Dareen; Elizalde, Benjamin; Gamper, Hannes; Ismail, Mahmoud Al; Singh, Rita; Raj, Bhiksha; Wang, Huaming
Publication Year:	2024
Collection:	Computer Science
Subject Terms:	Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Description:	While audio quality is a key performance metric for various audio processing tasks, including generative modeling, its objective measurement remains a challenge. Audio-Language Models (ALMs) are pre-trained on audio-text pairs that may contain information about audio quality, the presence of artifacts, or noise. Given an audio input and a text prompt related to quality, an ALM can be used to calculate a similarity score between the two. Here, we exploit this capability and introduce PAM, a no-reference metric for assessing audio quality for different audio processing tasks. Contrary to other "reference-free" metrics, PAM does not require computing embeddings on a reference dataset nor training a task-specific model on a costly set of human listening scores. We extensively evaluate the reliability of PAM against established metrics and human listening scores on four tasks: text-to-audio (TTA), text-to-music generation (TTM), text-to-speech (TTS), and deep noise suppression (DNS). We perform multiple ablation studies with controlled distortions, in-the-wild setups, and prompt choices. Our evaluation shows that PAM correlates well with existing metrics and human listening scores. These results demonstrate the potential of ALMs for computing a general-purpose audio quality metric.
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/2402.00282
Accession Number:	edsarx.2402.00282
Database:	arXiv
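
Illustrative note: the description says that an ALM can compute a similarity score between an audio input and a quality-related text prompt. The sketch below shows one way such a score could be formed from embeddings. It is an assumption for illustration, not the authors' implementation: the record does not say how the similarity is turned into a metric, so the use of two contrasting prompts, the softmax step, the function names, and the toy embedding vectors are all hypothetical, and no real audio-language model is called.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def prompt_quality_score(audio_emb, good_prompt_emb, bad_prompt_emb, scale=1.0):
    """Hypothetical no-reference score in [0, 1].

    Compares the audio embedding with the embeddings of a "good quality"
    prompt and a "bad quality" prompt, then applies a two-way softmax.
    The two-prompt design and the `scale` parameter are assumptions.
    """
    s_good = scale * cosine_similarity(audio_emb, good_prompt_emb)
    s_bad = scale * cosine_similarity(audio_emb, bad_prompt_emb)
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(s_good, s_bad)
    e_good = math.exp(s_good - m)
    e_bad = math.exp(s_bad - m)
    return e_good / (e_good + e_bad)


# Toy embeddings standing in for the outputs of an ALM's audio and text encoders.
good_prompt = [1.0, 0.0]
bad_prompt = [0.0, 1.0]
clean_audio = [0.9, 0.1]
noisy_audio = [0.1, 0.9]

clean_score = prompt_quality_score(clean_audio, good_prompt, bad_prompt)
noisy_score = prompt_quality_score(noisy_audio, good_prompt, bad_prompt)
```

Under these assumptions, an audio embedding closer to the "good quality" prompt scores above 0.5 and one closer to the "bad quality" prompt scores below 0.5. Neither a reference recording nor a set of human listening scores is needed to compute the number.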

