Academic Journal

Voice cloning with LSB steganography

Bibliographic Details
Title: Voice cloning with LSB steganography
Authors: Peddarapu Rama Krishna, Kadarla Anusree, Tumu Sai Pranav, Vadthya Deepak, Varakantham Dilip Kumar
Source: The International Journal of Science, Mathematics and Technology Learning (IJSMTL), 31(1), (2024-05-07)
Publisher Information: CGRN
Publication Year: 2024
Collection: Zenodo
Subject Terms: Voice replication, Generative Adversarial Networks (GAN), HuBERT, BARK, Steganography, Gradio
Description: The rapid advancement of computer technology has made voice reproduction a significant problem in deep learning. In this paper, we present an advanced speech synthesis model that produces expressive, natural-sounding speech based on conditioning variables. The method comprises two components: a speech converter and a text-to-voice synthesizer. Voice conversion refers to reproducing the stylistic characteristics of the source voice, such as pitch, prosody, and frequency, in the target voice so that the synthesized speech sounds natural. The text-to-voice synthesizer employs vibra-voice, which works in an unsupervised way by obtaining robust speech representations from audio using the self-supervised model HuBERT. The quantized semantic embeddings are fed into generative adversarial networks (e.g. BARK), which generate speech that sounds natural and culturally appropriate for many languages. In addition, the system employs steganography to tell real and fake samples apart by concealing hidden signatures in the audio output. A Gradio interface makes voice cloning easy for users to run. The system also relies on the GPU, which brings benefits such as efficient voice copying and high processing speed. Text-to-voice synthesizers are useful in a range of applications, such as vocal annotations for virtual assistants or accessibility tools, speech transformation for dubbing and imitation, and a fake analyzer for checking the integrity of audio material. Transfer learning is currently being added to provide multilingual support, and the ethical issues related to deepfakes and synthesized media are examined. The main goal is to enhance communication between humans and machines by making it more natural and engaging.
Document Type: article in journal/newspaper
Language: unknown
Relation: https://doi.org/10.5281/zenodo.11175296; https://doi.org/10.5281/zenodo.11175297; oai:zenodo.org:11175297
DOI: 10.5281/zenodo.11175297
Availability: https://doi.org/10.5281/zenodo.11175297
Rights: info:eu-repo/semantics/openAccess ; Creative Commons Attribution 4.0 International ; https://creativecommons.org/licenses/by/4.0/legalcode
Accession Number: edsbas.F12B81E3
Database: BASE
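
Illustrative note: the description above mentions concealing hidden signatures in the audio output, and the title names LSB (least significant bit) steganography. The record does not give the authors' actual embedding scheme, so the following is only a minimal sketch of the general LSB technique, assuming 16-bit signed PCM samples held in a Python list. The function names embed_lsb and extract_lsb and the example signature are hypothetical, not taken from the paper.

```python
def embed_lsb(samples, payload):
    """Hide payload bytes in the least significant bits of PCM samples.

    Hypothetical helper: `samples` is assumed to be a list of 16-bit
    signed integers, `payload` a bytes object (the hidden signature).
    """
    # Unpack the payload into bits, most significant bit of each byte first.
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("payload needs more samples than are available")
    stego = list(samples)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the payload bit.
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(samples, n_bytes):
    """Read n_bytes back from the least significant bits of the samples."""
    if n_bytes * 8 > len(samples):
        raise ValueError("not enough samples to hold that many bytes")
    bits = [sample & 1 for sample in samples[: n_bytes * 8]]
    # Repack each group of 8 bits into one byte, MSB first.
    return bytes(
        sum(bit << (7 - j) for j, bit in enumerate(bits[k:k + 8]))
        for k in range(0, len(bits), 8)
    )


# Usage with made-up sample values and a made-up 2-byte signature.
audio = [0, -1, 1000, -32768, 32767, 12, -7, 8] * 4
marked = embed_lsb(audio, b"ID")
recovered = extract_lsb(marked, 2)
```

Each carrier sample changes by at most one quantization step, which is why the mark is hard to hear. Plain LSB marks of this kind generally do not survive lossy re-encoding or resampling, so a detector built on them would assume the audio is kept in an uncompressed or lossless format.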