Experimental Polish-Lithuanian corpus with the semantic annotation elements

Collection:
Mokslo publikacijos / Scientific publications
Document Type:
Straipsnis / Article
Language:
Anglų kalba / English
Title:
Experimental Polish-Lithuanian corpus with the semantic annotation elements
In the Journal:
Cognitive Studies [Études cognitives]. 2013, Vol. 13, p. 97-111
Keywords:
Lenkų kalba / Polish language; Semantika / Semantics.
Summary / Abstract:

Keywords: Annotation; Polish; Lithuanian; Corpora; Corpus; Polish language; Parallel corpus; Comparable corpus; Parallel corpora; Comparable corpora; Parallel and comparable corpora; Semantic annotation.

The experimental Polish-Lithuanian corpus is the first extended bilingual Polish-Lithuanian corpus whose resources have been divided into two subcorpora: a parallel one and a comparable one.

The parallel subcorpus (A) is widely used in contrastive studies carried out by the Corpus Linguistics and Semantics Team at the Institute of Slavic Studies of the Polish Academy of Sciences. A Polish-Lithuanian electronic dictionary and a Polish-Lithuanian terminological dictionary are also being developed on the basis of subcorpus A. Subcorpus A is to be made available online in the near future. It is intended not only for linguists but also for IT specialists, literary scholars, librarians, teachers and translators. Its audience further includes specialists in the machine processing of linguistic information and programmers involved in building machine translation systems. It is also aimed at Poles learning Lithuanian (e.g. students) and Lithuanians learning Polish, regardless of their education or occupation. The semantic annotation planned for subcorpus A brings new value to corpus linguistics. It represents the content plane independently of the formal side of both languages. Semantic annotation is expected to contribute significantly to the development of machine translation.

The resources of the comparable subcorpus (B) are considerably more modest than those of subcorpus A. However, the materials in subcorpus B reflect mutual Polish-Lithuanian relations and the somewhat differing views of the world, history, nature, etc. held by Poles and Lithuanians. Once made available online, subcorpus B is therefore expected to interest a wide range of users. These include historians, ethnographers, folklorists, political scientists, sociologists, anthropologists, cultural studies scholars and researchers of the linguistic picture of the world.
Several issues can be examined from both the Polish and the Lithuanian perspective. These include the long shared history of Lithuania and Poland, the common border, and the situation of the Polish minority in Lithuania and of Lithuanians living in Poland. They also include the situation of Polish schools in Lithuania and of Lithuanian schools in Poland. For this reason, those who shape Poland's foreign policy and its domestic policy on national minorities may also take an interest in the resources of subcorpus B. The Polish-Lithuanian comparable subcorpus (B) can undoubtedly be a valuable source of reliable information for linguists, history teachers, translators and students of various branches of the humanities and social sciences. It can serve anyone seeking knowledge about the world, art, etc. as well. [From the publication]

DOI:
10.11649/cs.2013.006
ISSN:
2392-2397; 2080-7147
Permalink:
https://www.lituanistika.lt/content/57268
Updated:
2018-04-18 14:12:31