Komi-Permyak is an endangered Finno-Ugric language spoken in the Perm Region of European Russia. The number of speakers has decreased in recent decades, written sources in the language are few, and even fewer are available online. Enriching the existing digital resources would therefore make Komi-Permyak more visible in cross-linguistic research, simplify research into the language, and give members of the speaker community a tool for better understanding their language.
The aim of the project “PermCorp: A corpus of written Komi-Permyak”, which has been running since December 2022, is to increase the visibility of Komi-Permyak for both native speakers and international scholars by creating a reliable, manually annotated corpus of approximately 300,000 tokens of written Komi-Permyak, with morphological analysis, POS tagging, and English translation. The project is funded by the National Research, Development and Innovation Office under grant number FK 143242 and is being carried out at the Department of Finno-Ugric Studies at Eötvös Loránd University (ELTE) in Budapest.
Principal Investigator, Linguist
Assistant professor at Eötvös Loránd University, where she defended her PhD thesis on impersonal constructions in Finno-Ugric languages. She is a co-author of two typological databases (UTDb, VolgaTyp). She specializes in Permic and Ugric languages, and her research interests are comparative syntax, linguistic typology, and contact linguistics.
Senior researcher, Linguist
Research fellow at the HUN-REN Hungarian Research Centre for Linguistics. She defended her PhD thesis on evidentiality in the Udmurt language at Eötvös Loránd University. Her research focuses on the grammaticalization of evidential, aspectual, and tense categories of the verb in the Permic languages.
Senior researcher, Computational Linguist
Freelance data analyst and linguist, a graduate of Eötvös Loránd University, where she defended her PhD thesis on information flow in Mansi speech. Her recent interests are corpus linguistics and corpus construction. During her career she has also designed and implemented text corpora for research and business purposes.
Language expert
Language expert
Research Assistant
Research Assistant
The Komi-Permyak language, a member of the Permic (also referred to as Permian) branch of the Uralic language family, is closely related to Udmurt, Komi (Zyrian), and Yazva Komi.
Its speakers primarily live in the Komi-Permyak District (Komi-Permyak Okrug), which was the titular Komi-Permyak Autonomous District until 2005, within the Perm Region (Perm Krai). The main speaker area lies along the upper Kama River to the west of the Ural Mountains, with additional speaker communities scattered in neighbouring regions, notably the Kirov Region. In the 2020 All-Russia population census, approximately 55,000 people identified as ethnic Komi-Permyaks, a decline from around 94,500 in 2010. According to the same census, nearly 40,000 people consider Komi-Permyak their mother tongue, approximately 72% of those who identified as ethnic Komi-Permyaks, which is a notably high figure in comparison to other Uralic-speaking communities. Komi-Zyrian and Komi-Permyak are mutually intelligible to some extent; their differences are primarily phonological and lexical. Nevertheless, Russian serves as the common code among speakers.
Komi-Permyak has two principal dialects, northern and southern. Yazva Komi was formerly regarded as a dialect of Komi. The Komi-Permyak literary language is based on the Kudymkar-Ińva dialect.
The language is highly endangered, and its domains of use are quite limited. Ethnologue categorizes Komi-Permyak as "Developing (5)" on the EGIDS scale, which signifies vigorous language use with some standardized literature that is not yet widespread or sustainable. Komi-Permyak formerly enjoyed official status within the Komi-Permyak Autonomous District as the native language of the titular ethnic group, but it lost this formal recognition in 2005. It nevertheless remains prevalent in familial and educational contexts.
Komi-Permyak is a typically agglutinative language, but it also features syncretism in both nominal and verbal categories. The language employs rich derivational and inflectional morphology: nouns can be inflected for person, number, and case, and they have an absolute and a possessive case paradigm. Komi-Permyak has 18 nominal cases, including accusative, genitive, and several locational ones, and it displays differential object marking. Verbs are inflected for person, number, tense, and mood, and can also express additional categories such as evidentiality and aspect. The non-present tenses of the language have several synthetic and analytic forms.
Upcoming conferences