System for Annotating Text in Ukrainian – Herald of Khmelnytskyi National University (Вісник Хмельницького національного університету)

СИСТЕМА ДЛЯ АНОТУВАННЯ ТЕКСТУ УКРАЇНСЬКОЮ МОВОЮ

SYSTEM FOR ANNOTATING TEXT IN UKRAINIAN

Pages: 200-203. Issue: No. 5, Vol. 2, 2023 (327)
Authors:
TSAP VLADYSLAV
Lviv Polytechnic National University
https://orcid.org/0000-0002-8062-0079
e-mail: vladyslav.b.tsap@lpnu.ua
BRUSENTSOV HEORHII
Lviv Polytechnic National University
https://orcid.org/0000-0002-3346-0164
e-mail: heorhii.y.brusentsov@lpnu.ua
DOI: https://www.doi.org/10.31891/2307-5732-2023-327-5-192-199

Abstract (translated from the Ukrainian original)

This paper is devoted to developing an effective system for annotating texts in Ukrainian. The main goal is to create a system that can analyze input Ukrainian texts, extract key concepts, and generate informative annotations. The study focuses on the annotation of Ukrainian-language texts and examines two different approaches: one integrates the Pegasus model with a translator, while the other uses the mT5 model, specifically adapted for annotation tasks in Ukrainian. The study carries out a comprehensive evaluation of these approaches, with particular emphasis on their speed and performance metrics. It highlights how specialized models operate when handling the details of the Ukrainian language effectively. The article also emphasizes the importance of subjective evaluations in determining how well the system conveys the main ideas of the source text. The paper thus contributes to the development of Ukrainian natural language processing by proposing new approaches to text annotation. It highlights the challenges and opportunities in this field, stressing the importance of specialized models and subjective evaluations for achieving accurate and contextually relevant annotation of Ukrainian text.
Keywords: text summarization, Ukrainian text processing, natural language processing

Extended abstract in English

This paper addresses text annotation (summarization) for the Ukrainian language. Its aim is an efficient system that processes Ukrainian texts, extracts salient concepts, and produces informative annotations. The study explores two distinct approaches: one combines the Pegasus model with a translation mechanism, while the other uses the mT5 model, fine-tuned specifically for Ukrainian annotation tasks.
The primary objective of this research is to evaluate these approaches in depth, with a particular emphasis on their speed and performance metrics. Speed is of paramount importance in the era of real-time data processing, especially when dealing with large volumes of text.
In natural language processing (NLP), language-specific models are instrumental in handling linguistic nuances robustly. Ukrainian, with its distinct phonology, grammar, and vocabulary, presents its own set of challenges. Models tailored to the specifics of Ukrainian therefore need to be developed and integrated into annotation systems.
Furthermore, we emphasize the significance of subjective evaluations in this research. While quantitative metrics provide a valuable foundation for measuring system performance, subjective assessments offer a more holistic view. Human evaluation helps in gauging the system’s ability to capture the essence of a text, convey its main ideas, and maintain the contextual relevance of annotations. These subjective evaluations serve as a bridge between machine-driven performance metrics and the actual utility of the system in real-world applications.
In conclusion, this research article not only advances the field of Ukrainian natural language processing but also offers novel methodologies for text annotation. It sheds light on the challenges and opportunities that arise when working with a language as intricate as Ukrainian.
Keywords: text summarization, Ukrainian text processing, natural language processing.
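
The two approaches compared in the abstract can be pictured with a minimal Python sketch. It is not the authors' implementation. It assumes that the Pegasus-based approach translates the Ukrainian input into English, summarizes it, and translates the result back (the abstract only says Pegasus is combined with a translator), and that the mT5-based approach summarizes the Ukrainian text directly. All function names are hypothetical, and the stub functions merely stand in for real models so that the structure and the speed measurement can be run without downloading anything.

```python
import time
from typing import Callable, Tuple

# A text-to-text step: translation or summarization.
TextFn = Callable[[str], str]


def make_translate_pipeline(uk_to_en: TextFn,
                            summarize_en: TextFn,
                            en_to_uk: TextFn) -> TextFn:
    """Approach 1 (assumed shape): translate, summarize in English, translate back."""
    def annotate(text: str) -> str:
        return en_to_uk(summarize_en(uk_to_en(text)))
    return annotate


def timed(annotate: TextFn, text: str) -> Tuple[str, float]:
    """Run an annotator once and return (annotation, elapsed seconds)."""
    start = time.perf_counter()
    annotation = annotate(text)
    return annotation, time.perf_counter() - start


# Hypothetical stand-ins for real models; they only mark or truncate the text.
def stub_uk_to_en(text: str) -> str:
    return "[en] " + text


def stub_en_to_uk(text: str) -> str:
    return "[uk] " + text


def stub_first_sentence(text: str) -> str:
    # Toy "summarizer": keep only the first sentence.
    return text.split(". ")[0].rstrip(".") + "."


# Approach 1: an English summarizer wrapped in translation steps.
pipeline_annotator = make_translate_pipeline(
    stub_uk_to_en, stub_first_sentence, stub_en_to_uk
)
# Approach 2: a single multilingual summarizer applied to Ukrainian directly.
direct_annotator = stub_first_sentence

if __name__ == "__main__":
    sample = "Перше речення. Друге речення."
    for name, annotator in [("translate + summarize", pipeline_annotator),
                            ("direct multilingual", direct_annotator)]:
        annotation, seconds = timed(annotator, sample)
        print(f"{name}: {annotation!r} in {seconds:.6f} s")
```

With real models substituted for the stubs, the same `timed` helper would give a rough per-text latency for each approach. A meaningful comparison would need many texts and repeated runs, and would sit alongside the quantitative metrics and human judgments that the abstract describes.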

Post Author: Horiashchenko Serhii