AUTOMATIC IDENTIFICATION OF SENTENCE COMPONENTS IN THE UZBEK LANGUAGE USING THE HIDDEN MARKOV MODEL
DOI:
https://doi.org/10.37547/
Keywords:
NLP, HMM, Viterbi algorithm, BIO chunking, syntactic analysis, Uzbek language, machine learning.
Abstract
This article investigates the application of the Hidden Markov Model (HMM) and the Viterbi algorithm in the automatic syntactic analysis of the Uzbek language. Uzbek belongs to the agglutinative group of the Turkic language family and has a distinctive syntactic structure. In this study, sentence components are labeled using the BIO (Begin, Inside, Outside) tagging scheme within a syntactic analysis system, and the statistical probabilities of the model are examined. The results demonstrate the effectiveness as well as the limitations of HMM in the syntactic analysis of the Uzbek language.
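As a rough illustration of the approach summarized above, the following minimal sketch runs Viterbi decoding over a first-order HMM whose hidden states are BIO-style sentence-component tags. It is not the authors' implementation. The tag names (B-SUBJ, B-OBJ, I-OBJ, B-PRED), the example sentence, the probability floor for unseen events, and every probability value are hypothetical, chosen only to show how the most probable tag sequence is recovered.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable tag sequence for `words` under a first-order HMM."""
    def lp(p):
        # Log-probability with a small floor so unseen events do not give log(0).
        return math.log(max(p, floor))

    # best[t][tag]: log-probability of the best path that ends in `tag` at position t
    best = [{tag: lp(start_p.get(tag, 0.0)) + lp(emit_p[tag].get(words[0], 0.0))
             for tag in tags}]
    back = [{}]  # back[t][tag]: previous tag on that best path
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for tag in tags:
            prev, score = max(
                ((p, best[t - 1][p] + lp(trans_p[p].get(tag, 0.0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[t][tag] = score + lp(emit_p[tag].get(words[t], 0.0))
            back[t][tag] = prev

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Hypothetical toy model (all values invented for illustration only).
tags = ["B-SUBJ", "B-OBJ", "I-OBJ", "B-PRED"]
start_p = {"B-SUBJ": 0.7, "B-OBJ": 0.2, "B-PRED": 0.1}
trans_p = {
    "B-SUBJ": {"B-OBJ": 0.6, "B-PRED": 0.4},
    "B-OBJ": {"I-OBJ": 0.5, "B-PRED": 0.5},
    "I-OBJ": {"I-OBJ": 0.2, "B-PRED": 0.8},
    "B-PRED": {"B-SUBJ": 0.5, "B-OBJ": 0.5},
}
emit_p = {
    "B-SUBJ": {"men": 0.9},
    "B-OBJ": {"yangi": 0.5, "kitobni": 0.5},
    "I-OBJ": {"kitobni": 0.9},
    "B-PRED": {"o'qidim": 0.9},
}

# "Men yangi kitobni o'qidim" ("I read the new book"), lowercased and tokenized.
words = ["men", "yangi", "kitobni", "o'qidim"]
print(viterbi(words, tags, start_p, trans_p, emit_p))
# → ['B-SUBJ', 'B-OBJ', 'I-OBJ', 'B-PRED']
```

In practice the start, transition, and emission probabilities would be estimated from an annotated corpus rather than written by hand. Working in log space avoids numerical underflow on longer sentences.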
