How lemmatisation and derivational annotation affect productivity measures: the case of deverbal agent nouns in the Joint Corpus of Lithuanian

Collection:
Scientific publications
Document Type:
Article
Language:
English
Title:
How lemmatisation and derivational annotation affect productivity measures: the case of deverbal agent nouns in the Joint Corpus of Lithuanian
Alternative Title:
Kā lematizēšana un derivatīvā anotēšana ietekmē produktivitātes vērtēšanu: darītājvārdi Vienotajā lietuviešu valodas korpusā
In the Journal:
Valoda: nozīme un forma / Language: Meaning and Form, 2024, 15, 138-151
Summary / Abstract:

EN: We discuss the automatic and manual stages of lemmatising and annotating the Joint Corpus of Lithuanian (1.3 billion words), which we use to measure derivational productivity. As a case study, we present data on three productive deverbal agent noun suffixes in Lithuanian, -toj-, -ėj- and -ik-, and measure their realized, expanding, and potential productivity. We show that additional semi-automatic lemmatisation and manual derivational annotation significantly increase type and hapax counts. We also note that lemma counts are artificially inflated by homographic forms that the lemmatiser leaves unresolved. After manual disambiguation of hapaxes, the counts of feminine formations in -toj-(a) and -ėj-(a) were reduced the most.

Keywords: word formation, derivational productivity, agent nouns, Lithuanian.
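The abstract names three measures but does not define them. As a rough sketch, assuming the commonly used corpus-based definitions (realized productivity as the type count of a suffix category, expanding productivity as the category's hapaxes divided by all hapaxes in the corpus, and potential productivity as the category's hapaxes divided by the category's tokens), they could be computed as follows. The function name, the placeholder lemma labels, and all frequencies are hypothetical illustrations, not data or code from the article.

```python
def productivity_measures(category_freqs, corpus_hapax_count):
    """Compute three productivity measures for one suffix category.

    category_freqs: dict mapping each lemma of the category to its token
        frequency in the corpus (hypothetical input format).
    corpus_hapax_count: number of hapax legomena in the whole corpus.
    """
    tokens = sum(category_freqs.values())
    if tokens == 0 or corpus_hapax_count <= 0:
        raise ValueError("need a non-empty category and a positive hapax count")

    types = len(category_freqs)
    hapaxes = sum(1 for freq in category_freqs.values() if freq == 1)

    return {
        "realized": types,                          # type count
        "expanding": hapaxes / corpus_hapax_count,  # share of all corpus hapaxes
        "potential": hapaxes / tokens,              # hapaxes per category token
    }


# Invented toy frequencies for a single suffix category (placeholders only).
toy_freqs = {
    "lemma_1": 40,
    "lemma_2": 25,
    "lemma_3": 12,
    "lemma_4": 1,
    "lemma_5": 1,
}
measures = productivity_measures(toy_freqs, corpus_hapax_count=50)
```

Because all three values depend directly on type and hapax counts, a homographic form wrongly lemmatised as a separate lemma would add a spurious type (and often a spurious hapax), which is consistent with the inflation effect the abstract describes.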

DOI:
10.22364/vnf.15.09
ISSN:
2256-0602; 2255-9256
Permalink:
https://www.lituanistika.lt/content/90215
Updated:
2026-03-04 14:22:17