Effects of Convolutional Autoencoder Bottleneck Width on StarGAN-based Singing Technique Conversion

Bibliographic Details
Title: Effects of Convolutional Autoencoder Bottleneck Width on StarGAN-based Singing Technique Conversion
Authors: Su, Tung-Cheng; Chang, Yung-Chuan; Liu, Yi-Wen
Publication Year: 2023
Collection: Computer Science
Subject Terms: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Machine Learning, Computer Science - Sound
Description: Singing technique conversion (STC) refers to the task of converting from one voice technique to another while leaving the original singer identity, melody, and linguistic components intact. Previous STC studies, as well as singing voice conversion research in general, have utilized convolutional autoencoders (CAEs) for conversion, but how the bottleneck width of the CAE affects the synthesis quality has not been thoroughly evaluated. To this end, we constructed a GAN-based multi-domain STC system which took advantage of the WORLD vocoder representation and the CAE architecture. We varied the bottleneck width of the CAE, and evaluated the conversion results subjectively. The model was trained on a Mandarin dataset which features four singers and four singing techniques: the chest voice, the falsetto, the raspy voice, and the whistle voice. The results show that a wider bottleneck corresponds to better articulation clarity but does not necessarily lead to higher likeness to the target technique. Among the four techniques, we also found that the whistle voice is the easiest target for conversion, while the other three techniques as a source produce more convincing conversion results than the whistle.
Comment: The original edition of this paper will be published in the CMMR 2023 Proceedings. This arXiv publication is a copy.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2308.10021
Accession Number: edsarx.2308.10021
Database: arXiv
Publication Date: 2023-08-19