The Automated Voicing System with Elements of Artificial Intelligence – Herald of Khmelnytskyi National University (Вісник Хмельницького національного університету)

THE AUTOMATED VOICING SYSTEM WITH ELEMENTS OF ARTIFICIAL INTELLIGENCE

Original title (Ukrainian): СИСТЕМА АВТОМАТИЗОВАНОГО ОЗВУЧУВАННЯ З ЕЛЕМЕНТАМИ ШТУЧНОГО ІНТЕЛЕКТУ

Pages: 115-119. Issue: No. 3, 2023 (321)
Authors:
DUMYN Andrii
Lviv Polytechnic National University
ORCID ID: 0000-0003-2111-2899
e-mail: andrii.r.dumyn@lpnu.ua
DOI: https://www.doi.org/10.31891/2307-5732-2023-321-3-115-119

Abstract (translated from the Ukrainian original)

Automatic text-to-speech has long ceased to be a novelty for users. However, the emotional component is lost when literary texts are voiced automatically or when audio in other languages is automatically re-dubbed. Emotional voice conversion that takes into account the speaker’s gender, speech characteristics, and similar features aims to preserve the linguistic content and the speaker’s identity. This article proposes an architecture for a system for automated re-dubbing of audio and video with built-in classifiers for text sentiment and for the speaker’s emotional coloring, together with a module that processes speaker metadata to preserve the speaker’s identity. The developed architecture will serve as the basis for further research on this topic.
Keywords: ASR, automatic speech recognition, emotion recognition, text-to-speech, speech-to-text, voice analysis.

Extended abstract in English

Automatic voicing of texts has long ceased to be a novelty for users. However, the emotional component is lost during automated voicing of literary texts or automated dubbing of audio from other languages. Emotional voice conversion that takes into account the speaker’s gender, speech features, and similar characteristics aims to preserve the linguistic content and the identity of the speaker. This work reviews recent research on automated voicing and on extracting metadata from audio, and proposes a general architecture for an automated voicing system with elements of artificial intelligence, such as a classifier for the emotional coloring of speech and models for determining gender and speech features. The results will form the basis of further research on a group of classifiers for the emotional coloring of speech and for the gender, age, and speech features of the speaker. The design and development of a corresponding system based on the proposed architecture are planned. Systems of this kind significantly simplify the consumption of foreign-language content for users from different countries, regardless of their proficiency in a given language, which is why automated translation and voiceover systems are widespread. However, the speaker’s emotional component and other features are lost during automated dubbing of texts or audio from other languages. Automated voicing of texts and dubbing of audio or video that takes into account the speaker’s gender, age, emotional coloring, and other speech features is therefore relevant. Such a system will simplify adapting audio and video content for users in a given country and will help make a large share of engaging content available to them.
In education, the system can serve as an auxiliary tool for viewing lectures, or parts of lectures, by foreign lecturers, significantly expanding students’ access to educational materials.
Keywords: Automatic Speech Recognition, emotion recognition, voice, Text-to-Speech, Speech-to-Text, voice analysis.
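
The data flow that the abstracts describe (speech recognition, a text sentiment classifier, a speech emotion classifier, and a speaker metadata module feeding the voicing stage) can be illustrated with a minimal Python sketch. This is not the author's implementation: every class, field, and function name below is hypothetical, the explicit translation step is an assumption inferred from the dubbing use case, and the components are placeholder lambdas standing in for real models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SpeakerMetadata:
    """Hypothetical speaker attributes used to preserve speaker identity."""
    gender: str
    age_group: str


@dataclass(frozen=True)
class SynthesisRequest:
    """Everything a text-to-speech stage would need (hypothetical shape)."""
    text: str
    text_sentiment: str
    speech_emotion: str
    speaker: SpeakerMetadata


@dataclass
class DubbingPipeline:
    """Wires together the components named in the abstract.

    Each field is a pluggable callable; real models would replace them.
    """
    transcribe: Callable[[bytes], str]            # speech-to-text (ASR)
    translate: Callable[[str], str]               # assumed translation step
    classify_sentiment: Callable[[str], str]      # sentiment of the text
    classify_emotion: Callable[[bytes], str]      # emotion in the speech
    extract_speaker: Callable[[bytes], SpeakerMetadata]

    def prepare(self, audio: bytes) -> SynthesisRequest:
        # Text-side analysis runs on the transcript,
        # voice-side analysis runs on the raw audio.
        source_text = self.transcribe(audio)
        return SynthesisRequest(
            text=self.translate(source_text),
            text_sentiment=self.classify_sentiment(source_text),
            speech_emotion=self.classify_emotion(audio),
            speaker=self.extract_speaker(audio),
        )


# Placeholder components with fixed outputs, for illustration only.
pipeline = DubbingPipeline(
    transcribe=lambda audio: "dobryi den",
    translate=lambda text: "good afternoon",
    classify_sentiment=lambda text: "positive",
    classify_emotion=lambda audio: "calm",
    extract_speaker=lambda audio: SpeakerMetadata("female", "adult"),
)

request = pipeline.prepare(b"\x00\x01")
print(request)
```

Keeping each stage behind a plain callable is one possible design choice: it lets the classifiers planned as future work be swapped in independently without changing the pipeline itself.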
