A NEW COMPUTATIONAL MODEL FOR TURKIC LANGUAGES MORPHOLOGY AND PROCESSING

Abstract

Effective communication between people of different nations has become a pressing problem in the modern globalized world. Artificial intelligence tools, and natural language processing components in particular, can contribute substantially to its solution. In this direction, this article proposes the development and use of a new computational morphology model for Turkic languages based on a complete set of endings (the CSE-model). Building on the CSE-model, a methodology has been developed for creating and using universal, data-driven programs for natural language processing tasks, including word stemming, text segmentation, and morphological analysis. One advantage of the proposed methodology is that it is oriented towards linguists, who only have to prepare (i) a list of the complete set of endings for a new language according to the described method, and (ii) a list of stop words that do not have endings. The universal programs for stemming, segmentation, and morphological analysis are then applied using the prepared lists. Experiments carried out for the Kazakh, Kyrgyz, and Uzbek languages show a high efficiency of the proposed morphology model, algorithms, and tools.
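
The data-driven idea described above, in which the program stays fixed and only the language-specific lists change, can be illustrated with a minimal sketch. This is not the article's implementation: the function names `stem` and `segment`, the `min_stem_len` parameter, the longest-match strategy, and the tiny Kazakh lists are all assumptions chosen for illustration. A real complete set of endings would be far larger and prepared by a linguist.

```python
# Toy, illustrative data only; NOT the article's actual lists.
# A real "complete set of endings" would enumerate every valid ending
# (including combined ones such as plural + case) for the language.
ENDINGS = {"лар", "ларға", "ға", "те"}  # hypothetical sample of Kazakh endings
STOP_WORDS = {"және"}                    # hypothetical sample: words without endings


def stem(word, endings=ENDINGS, stop_words=STOP_WORDS, min_stem_len=2):
    """Strip the longest listed ending from a word (assumed strategy)."""
    w = word.lower()
    # Stop words are returned unchanged because they carry no endings.
    if w in stop_words:
        return w
    # Try longer endings first so a combined ending wins over its parts.
    for ending in sorted(endings, key=len, reverse=True):
        if w.endswith(ending) and len(w) - len(ending) >= min_stem_len:
            return w[: len(w) - len(ending)]
    # No listed ending matched: treat the whole word as the stem.
    return w


def segment(word, endings=ENDINGS, stop_words=STOP_WORDS):
    """Split a word into (stem, ending); ending is '' if none was found."""
    w = word.lower()
    base = stem(w, endings, stop_words)
    return base, w[len(base):]
```

Under these assumptions, supporting another language would mean passing different `endings` and `stop_words` collections while leaving both functions untouched.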

Author Biography

Ualsher Tukeyev, Al-Farabi Kazakh National University, Almaty, Kazakhstan
Published
2023-04-03
How to Cite
TUKEYEV, Ualsher. A NEW COMPUTATIONAL MODEL FOR TURKIC LANGUAGES MORPHOLOGY AND PROCESSING. Journal of problems in computer science and information technologies, [S.l.], v. 1, n. 1, apr. 2023. ISSN 2958-0846. Available at: <https://dslib.kaznu.kz/index.php/kaznu/article/view/JPCSIT.2023.v1.i1.07>. Date accessed: 24 nov. 2024. doi: https://doi.org/10.26577/JPCSIT.2023.v1.i1.07.