This project is supported by NKFIH under grant no. FK 143242.

Welcome to PermCorp

A gold standard corpus of written Komi-Permyak

Бур лун! (Good day!)

About Us

The PermCorp project

Komi-Permyak is an endangered Finno-Ugric language spoken in the Perm Region of European Russia. The number of speakers has decreased over the last decades, the number of written sources in the language is small, and even fewer of them are available online. Enriching the existing digital resources would therefore give Komi-Permyak greater visibility cross-linguistically, make research into the language easier, and could also help members of the speaker community gain a better understanding of their language.

The aim of the project “PermCorp: A corpus of written Komi-Permyak”, which has been running since December 2022, is to increase the visibility of Komi-Permyak for both native speakers and international scholars by creating a reliable, manually annotated corpus (with morphological analysis, POS tagging, and English translation) of approximately 300,000 tokens representing written Komi-Permyak. The project is funded by the National Research, Development and Innovation Office (NKFIH) under grant number FK 143242 and is being carried out at the Department of Finno-Ugric Studies at Eötvös Loránd University (ELTE) in Budapest.

Research Group

Participants in the research project

Nikolett F. Gulyás, Ph.D.

Principal Investigator, Linguist

Assistant professor at Eötvös Loránd University, where she defended her PhD thesis on impersonal constructions in Finno-Ugric languages. She is a co-author of two typological databases (UTDb, VolgaTyp). She specializes in the Permic and Ugric languages, and her research interests are comparative syntax, linguistic typology, and contact linguistics.

Ditta Szabó, Ph.D.

Senior researcher, Linguist

Research fellow at the HUN-REN Hungarian Research Centre for Linguistics. She defended her PhD thesis on evidentiality in Udmurt at Eötvös Loránd University. Her research focuses on the grammaticalization of evidential, aspectual, and tense categories of the verb in the Permic languages.

Szilvia Németh, Ph.D.

Senior researcher, Computational Linguist

Freelance data analyst and linguist, a graduate of Eötvös Loránd University, where she defended her PhD thesis on information flow in Mansi speech. Her recent research focuses on corpus linguistics and corpus construction; over her career she has designed and implemented text corpora for research and business purposes.

Larisa Ponomareva, Ph.D.

Language expert

Vasilii Epanov, MA

Language expert

Levente Máthé, MA

Research Assistant

Eszter Napsugár Tóbiás, BA

Research Assistant

Research

Talks & Publications

Talks

2024

2023

2022

Publications

2025

  • [submitted] Szabó, Ditta. Diakrón összefüggések az udmurt szintetikus és analitikus múlt idők között. [Diachronic connections between the synthetic and analytic past tenses of Udmurt]
  • [submitted] F. Gulyás, Nikolett. Emerging passives in Permic languages.
  • [accepted] F. Gulyás, Nikolett. Adnominális birtoklás a komi-permjákban. [Adnominal possession in Komi-Permyak] To appear in Nyelvtudományi Közlemények.

2024

  • Szabó, Ditta – F. Gulyás, Nikolett – Németh, Szilvia 2024. Egy komi-permják korpusz létrehozásának kihívásai: Igék és melléknevek. [The challenges of building a Komi-Permyak corpus: Verbs and adjectives] Nyelvtudományi Közlemények 120: 21–48. DOI: 10.15776/NyK.2024.120.2
  • Szeverényi, Sándor – Kubitsch, Rebeka – Sipőcz, Katalin – Szabó, Ditta – Timár, Bogáta – F. Gulyás, Nikolett 2024. Evidentiality in Uralic Languages: Exercise book with selected bibliography. (Working Papers in Corpus Linguistics and Digital Technologies: Analyses and Methodology Vol. 9.) Szeged – Hamburg. DOI: 10.14232/wpcl.2024.9

2023

  • Németh, Szilvia – Szabó, Ditta – F. Gulyás, Nikolett 2023. PermCorp: Egy komi-permják korpusz létrehozása. [PermCorp: Towards the implementation of a Komi-Permyak corpus] Folia Uralica Debreceniensia 30: 181–202. DOI: 10.52401/fud/2023/11
  • Brdar, Mario – Brdar-Szabó, Rita – F. Gulyás, Nikolett – Horváth, Laura 2023. Hypocoristic reduplications and embellished clippings in Hungarian (and elsewhere). In Jeffrey Williams (ed.): Expressivity in the European Linguistic Sphere. Cambridge: Cambridge University Press, 13–53. DOI: 10.1017/9781108989084.003

2022

  • Szabó, Ditta 2022. A komi-permják evidencialitás vizsgálata elicitált adatokon keresztül. [Examining Komi-Permyak evidentiality through elicited data] Nyelvtudományi Közlemények 118: 137–162. DOI: 10.15776/NyK.2022.118.4

Other

2024

The Language

About Komi-Permyak

The Komi-Permyak language, a member of the Permic (also referred to as Permian) branch of the Uralic language family, is closely related to Udmurt, Komi (Zyrian), and Yazva Komi.

Its speakers live primarily in the Komi-Permyak District (Komi-Permyak Okrug) of the Perm Region (Perm Krai), which was the titular Komi-Permyak Autonomous District until 2005. The core speaker area lies along the upper reaches of the Kama River, west of the Ural Mountains, with smaller speaker communities scattered across neighbouring regions, notably the Kirov Region. In the 2020 All-Russia population census, approximately 55,000 people identified as ethnic Komi-Permyaks, down from around 94,500 in 2010. According to the same census, nearly 40,000 people consider Komi-Permyak their mother tongue, approximately 72% of the ethnic population, a notably high figure compared to other Uralic-speaking communities. Komi-Zyrian and Komi-Permyak are mutually intelligible to some extent; their differences lie primarily in phonology and the lexicon. Nevertheless, Russian serves as the common code between speakers of the two languages.

Komi-Permyak has two principal dialects: northern and southern. Yazva Komi was formerly regarded as a dialect of Komi. The Komi-Permyak literary language is based on the Kudymkar-Ińva dialect.

The language is highly endangered, and its domains of use are quite limited. Ethnologue, however, categorizes Komi-Permyak as "Developing (5)" on the EGIDS scale, signifying vigorous use with some standardized literature, although this is not yet widespread or sustainable. Komi-Permyak had official status in the Komi-Permyak Autonomous District as the native language of the titular ethnic group, but it lost this formal recognition in 2005. Despite lacking official status since then, it remains prevalent in family and educational contexts.

Komi-Permyak is predominantly agglutinative, but it also features syncretism in both nominal and verbal categories. The language has rich derivational and inflectional morphology: nouns inflect for number, case, and (possessive) person, and they have separate absolute and possessive case paradigms. Komi-Permyak has 18 nominal cases, including the accusative, the genitive, and several local cases, and it displays differential object marking. Verbs inflect for person, number, tense, and mood, and can also express additional categories such as evidentiality and aspect. The non-present tenses have several synthetic and analytic forms.

Meet Us

Contact

Do you have any questions or comments?

Would you like to be notified when a beta version of the corpus becomes available?

Upcoming conferences

Ми перем коми кыв

We, the Komi-Permyak language