Predicting party group from the Lithuanian parliamentary speeches

Kapočiūtė-Dzikienė, Jurgita; Krupavičius, Algis

doi:https://doi.org/10.5755/j01.itc.43.3.5871

Collection:

Scientific publications

Document Type:

Journal articles

Language:

English

Title:

Predicting party group from the Lithuanian parliamentary speeches

Authors:

Kapočiūtė-Dzikienė, Jurgita; Krupavičius, Algis

In the Journal:

Informacinės technologijos ir valdymas / Information Technology and Control, 2014, vol. 43, no. 3, pp. 321-332

Subject terms:

Word formation. Parts of a word.

Summary / Abstract:

A number of recent research works have used supervised machine learning approaches with a bag-of-words representation to classify political texts (in particular, speeches and debates) by their ideological position, expressed as party membership. However, our classification task is more complex for several reasons. First, we deal with the Lithuanian language, which is highly inflective and has rich morphology, vocabulary, and word derivation system, as well as relatively free word order in a sentence. Second, we have more classes, as the Lithuanian Parliament consists of more party groups than, e.g., the European Parliament or the US Senate. Moreover, the classes are not stable, because a considerable number of Lithuanian parliamentarians migrate from one party group to another even within the same parliamentary term. In this research we experimentally investigated the influence of different pre-processing techniques and feature types on two datasets composed of texts taken from two parliamentary terms. A classifier based on the interpolation of bag-of-words and token bigrams gave the best results: it outperformed the random and majority baselines by more than 0.13 points and achieved 0.54 and 0.49 accuracy on the 1st and the 2nd dataset, respectively. The error analysis revealed that the same confusion patterns hold for both datasets; moreover, the majority of these confusions can be explained by ideological or pragmatic similarities between the party groups.
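
For readers unfamiliar with the terms used in the abstract, the following is a minimal sketch of two of its ingredients: counting bag-of-words and token-bigram features, and computing a majority baseline. It is not the authors' implementation. The abstract does not say how the two feature types are interpolated, so the sketch simply merges both counts into one feature set; the function names, the whitespace tokenizer, and the toy data are assumptions made for illustration.

```python
from collections import Counter


def unigram_bigram_features(text):
    """Count lowercase tokens and adjacent token pairs (bigrams).

    Assumption: whitespace tokenization and a plain merge of both
    count sets; the paper's interpolation scheme is not described
    in the abstract and may differ.
    """
    tokens = text.lower().split()
    features = Counter(tokens)
    # Each bigram is stored as the two tokens joined by a space.
    features.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return features


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Toy data, invented for illustration only.
speech = "the budget the budget debate"
party_groups = ["A", "A", "B", "C"]

print(unigram_bigram_features(speech))
print(majority_baseline_accuracy(party_groups))
```

A classifier is worth reporting only if its accuracy exceeds such a baseline, which is why the abstract states the 0.13-point margin alongside the raw accuracies.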

DOI:

10.5755/j01.itc.43.3.5871

ISSN:

1392-124X; 2335-884X

Subject area:

Linguistics

Permalink:

https://www.lituanistika.lt/content/85440

Updated:

2026-02-25 13:39:57
