Bibliographic details
Title: |
Voice cloning with LSB steganography |
Authors: |
Peddarapu Rama Krishna, Kadarla Anusree, Tumu Sai Pranav, Vadthya Deepak, Varakantham Dilip Kumar |
Source: |
The International Journal of Science, Mathematics and Technology Learning (IJSMTL), 31(1), (2024-05-07) |
Publisher information: |
CGRN |
Publication year: |
2024 |
Collection: |
Zenodo |
Subject terms: |
Voice replication, Generative Adversarial Networks (GAN), HUBERT, BARK, Steganography, Gradio |
Description: |
The rapid advancement of computer technology has made voice reproduction a significant problem in deep learning. In this paper, we present an advanced speech synthesis model that produces expressive, natural-sounding speech based on conditioning variables. The method comprises two components: a speech converter and a text-to-voice synthesizer. Voice conversion refers to transferring speaker characteristics such as pitch, prosody, and frequency from the source voice to the target voice so that the resynthesized speech sounds natural. The text-to-voice synthesizer employs vibra-voice, which works in an unsupervised way by obtaining robust speech representations from audio using the self-supervised model HuBERT. The quantized semantic embeddings are fed into generative adversarial networks (e.g. BARK), which generate speech that sounds natural and culturally appropriate for many languages. Furthermore, the system uses steganography to tell real and fake samples apart by hiding signatures in the audio output. A Gradio interface makes it easy for users to run voice cloning. In addition, the system relies on the GPU, which brings efficient voice copying and high processing speed. The text-to-voice synthesizer is useful in a range of applications, such as vocal annotations for virtual assistants and accessibility tools, speech transformation for dubbing and imitation, and a fake-audio analyzer for checking the integrity of audio material. Transfer learning is currently being added to provide multilingual support, and the ethical issues related to deepfakes and synthesized media are examined. The main goal is to enhance communication between humans and machines by making it more natural and engaging. |
Document type: |
article in journal/newspaper |
Language: |
unknown |
Relation: |
https://doi.org/10.5281/zenodo.11175296; https://doi.org/10.5281/zenodo.11175297; oai:zenodo.org:11175297 |
DOI: |
10.5281/zenodo.11175297 |
Availability: |
https://doi.org/10.5281/zenodo.11175297 |
Rights: |
info:eu-repo/semantics/openAccess ; Creative Commons Attribution 4.0 International ; https://creativecommons.org/licenses/by/4.0/legalcode |
Accession number: |
edsbas.F12B81E3 |
Database: |
BASE |
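
The description above says that signatures are hidden in the audio output with steganography, and the title names the least significant bit (LSB) variant. The record does not give the authors' embedding scheme, so the following is only a minimal sketch of generic LSB embedding, assuming the audio is available as a sequence of 16-bit signed PCM sample values. The function names `embed_lsb` and `extract_lsb` are hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def embed_lsb(samples: Sequence[int], payload: bytes) -> List[int]:
    """Return a copy of `samples` with `payload` written into the LSBs.

    Assumption: `samples` are 16-bit signed PCM values; one payload bit
    is stored per sample, most significant bit of each byte first.
    """
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("payload does not fit in the carrier audio")
    stego = list(samples)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the payload bit.
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(samples: Sequence[int], n_bytes: int) -> bytes:
    """Read `n_bytes` back from the LSBs of the first 8 * n_bytes samples."""
    if n_bytes * 8 > len(samples):
        raise ValueError("not enough samples for the requested length")
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for sample in samples[b * 8:(b + 1) * 8]:
            value = (value << 1) | (sample & 1)
        out.append(value)
    return bytes(out)


# Usage: hide a short signature in a toy carrier and read it back.
carrier = [1000, -1000, 32767, -32768, 0, 1, 2, 3] * 4
signature = b"sig"
marked = embed_lsb(carrier, signature)
recovered = extract_lsb(marked, len(signature))
```

Each sample changes by at most one quantization step, which is why LSB marking is usually inaudible. It is also fragile: lossy compression or resampling will generally destroy the hidden bits, so a detector built this way can only confirm a signature in an unmodified file.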