KarRC RAS. News

April 18, 2024

VepKar, the Open Corpus of Veps and Karelian, is an essential tool for preserving these ethnic languages

Fundamental problems of linguistics and the tasks of language corpus studies were discussed at a RAS Presidium meeting in Moscow. Irma Mullonen and Irina Novak of the Institute of Linguistics, Literature and History (ILLH) KarRC RAS presented the main results of the work on the Veps and Karelian Open Corpus (VepKar), which has been underway in Karelia for 15 years. At present, its database contains 6,000 texts of varying size.

On April 9, the RAS Presidium met to discuss the fundamental problems of linguistics and the tasks of language corpus studies. Irma Mullonen and Irina Novak, researchers from the Institute of Linguistics, Literature and History (ILLH) KarRC RAS, gave a presentation on the results of the work on the Veps and Karelian Open Corpus, VepKar. Work on the corpus began 15 years ago with the aim of preserving and systematically studying the languages of the Balto-Finnic peoples of Karelia. The programming is done by specialists from the Institute of Applied Mathematical Research (IAMR) KarRC RAS.

VepKar serves several main purposes. In addition to supporting research, it preserves and accumulates written texts and samples of spoken Karelian and Veps. At present, it contains 6,000 texts totaling 2 million word tokens.

"Anyone can use VepKar as a digital library and as a full-fledged electronic dictionary. In addition, applications with a simpler interface, such as the Multimedia Dictionary of Karelian, are being developed from the corpus data for a wide range of users. The corpus is thus a tool for preserving the Karelian and Veps languages, and it offers great opportunities to their learners. One can look up a word and check how it sounds, how it is spelled correctly, and what grammatical characteristics it has," explained Irma Mullonen, RAS Corresponding Member and Chief Researcher at the Linguistics Section of ILLH KarRC RAS.

The keynote lecture at the session dealt with the current stage in the development of corpus linguistics. In his talk, RAS Academician Vladimir Plungian paid special attention to the terminology and methodology of the field. The presentation included a brief overview of the history of corpus linguistics in Russia and worldwide, and outlined the current priorities of this area of research. The speaker also noted the high demand for the flagship project of Russian corpus linguistics, the Russian National Corpus.
Read more about the topics discussed at the session on the RAS website.

"Several ethnic language corpora are currently being created in Russia. However, VepKar, which is being built at KarRC RAS, has advanced further than the others both in the amount of content and in grammatical and semantic annotation. It can now serve as a full-fledged platform for scientific research. That is probably why we were invited to give the presentation. As a takeaway from the meeting, the RAS Presidium emphasized the need to support corpus studies in our country. One of the concrete support measures expected is a thematic competition by the Russian Science Foundation for projects to create corpus resources for the languages of Russia. We hope that this special RSF program will be announced soon and that we will take part in it," Irma Mullonen summarized.
