Estimating the amount of Lithuanian text indexed by global search engines

Collection:
Scientific publications
Document Type:
Article
Language:
English
In the Journal:
Baltic journal of modern computing [BJMC]. 2022, vol. 10, iss. 3, p. 326-336
Keywords:
Lithuanian language; Internet; Lexicography.
Summary / Abstract:

The aim of the paper is to estimate the number of words in Lithuanian texts indexed by selected global search engines (GSE), namely Google (by Alphabet Inc.), Bing (by Microsoft Corporation), and Yandex (by Яндекс, Russia). For this purpose, a special list of 100 rare Lithuanian words (pivot words) with specific characteristics was compiled. The low frequency of the pivot words is crucial, because it allows the count of document matches reported by a GSE to be treated as an indicator of the word count. Statistical analysis yielded the following amounts of Lithuanian text as of April 2022: 56 billion words indexed by Google, 29 billion by Bing, and 41 billion by Yandex. Comparative estimates were also made for the neighbouring Belarusian (∼0.31×LT), Estonian (∼1.45×LT), Finnish (∼2.4×LT), Latvian (∼0.95×LT), Polish (∼11×LT), and Russian (∼49×LT) languages. Keywords: global search engines, Google, Bing, Yandex, Lithuanian language, webometrics, corpus, pivot words. [From the publication]
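The abstract does not spell out the estimator itself. The Python sketch below shows one common way the pivot-word idea can be turned into a number, under stated assumptions. It assumes that each pivot word's relative frequency is known from some reference corpus, and that for such rare words the number of matching documents reported by a search engine roughly equals the number of occurrences. Under those assumptions, the total indexed word count is approximately hits divided by relative frequency. The per-word estimates are then combined with a median. The function name, its parameters, and the toy counts in the usage example are hypothetical and are not taken from the paper. This is not the authors' published procedure.

```python
from statistics import median


def estimate_indexed_words(hits, corpus_counts, corpus_size):
    """Hypothetical pivot-word estimator (an illustration, not the paper's method).

    hits: pivot word -> number of matching documents reported by the engine
    corpus_counts: pivot word -> occurrences of that word in a reference corpus
    corpus_size: total number of words in the reference corpus
    """
    estimates = []
    for word, n_hits in hits.items():
        count = corpus_counts.get(word, 0)
        if count == 0:
            continue  # no frequency information for this pivot word; skip it
        # Assumption: for a rare word, one matching document ~ one occurrence,
        # so hits / (count / corpus_size) approximates the indexed word total.
        # Written as n_hits * corpus_size / count to keep the arithmetic exact.
        estimates.append(n_hits * corpus_size / count)
    if not estimates:
        raise ValueError("no usable pivot words")
    # The median limits the influence of pivot words with noisy hit counts.
    return median(estimates)


# Toy, made-up inputs for three hypothetical pivot words.
toy_hits = {"pivot_a": 100, "pivot_b": 300, "pivot_c": 40}
toy_counts = {"pivot_a": 2, "pivot_b": 5, "pivot_c": 1}
print(estimate_indexed_words(toy_hits, toy_counts, 1_000_000))  # 50000000.0
```

Using a median rather than a mean is a design choice made for this sketch. A single pivot word whose hit count is inflated, for example by a homograph in another language, then shifts the result less.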

DOI:
10.22364/bjmc.2022.10.3.06
ISSN:
2255-8950; 2255-8942
Related Publications:
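Dabartinės lietuvių kalbos tekstynas (Corpus of the Contemporary Lithuanian Language) / Vytauto Didžiojo universitetas, Kompiuterinės lingvistikos centras (Vytautas Magnus University, Centre of Computational Linguistics). Kaunas: VDU Kompiuterinės lingvistikos centras, 1998-2016. 1 electronic resource (online).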
Permalink:
https://www.lituanistika.lt/content/105548
Updated:
2023-11-24 16:06:26