Predicting party group from the Lithuanian parliamentary speeches

Collection:
Mokslo publikacijos / Scientific publications
Document Type:
Straipsnis / Article
Language:
Anglų kalba / English
Title:
Predicting party group from the Lithuanian parliamentary speeches
In the Journal:
Informacinės technologijos ir valdymas [Information technology and control]. 2014, vol. 43, no. 3, pp. 321-332
Keywords:
LT
Žodžių daryba. Žodžio dalys / Word formation. Parts of a word.
Summary / Abstract:

LT
Reikšminiai žodžiai: Klasifikavimas pagal grupes; Kompiuterinė lingvistika; Mašininis mokymas su instruktoriumi; Prižiūrimas mašininis mokymasis; Tekst; Teksto klasifikavimas į partijų grupes; Computational linguistics; Supervised machine learning; Text classification into party groups.

EN
A number of recent studies have used supervised machine learning with a bag-of-words representation to classify political texts (in particular, speeches and debates) by ideological position, expressed as party membership. However, our classification task is more complex for several reasons. First, we deal with the Lithuanian language, which is highly inflective and has a rich morphology, vocabulary, and word derivation system, and relatively free word order in a sentence. Second, we have more classes, as the Lithuanian Parliament consists of more party groups than, e.g., the European Parliament or the US Senate. Moreover, the classes are not stable, because a considerable number of Lithuanian parliamentarians migrate from one party group to another even within the same parliamentary term. In this research we experimentally investigated the influence of different pre-processing techniques and feature types on two datasets composed of texts taken from two parliamentary terms. A classifier based on the interpolation of bag-of-words and token bigrams gave the best results: it outperformed the random and majority baselines by more than 0.13 points and achieved 0.54 and 0.49 accuracy on the 1st and the 2nd dataset, respectively. The error analysis revealed that the same confusion patterns hold for both datasets, and that the majority of these confusions can be explained by ideological or pragmatic similarities between the party groups involved. [From the publication]

DOI:
10.5755/j01.itc.43.3.5871
ISSN:
1392-124X; 2335-884X
Permalink:
https://www.lituanistika.lt/content/85440
Updated:
2020-12-17 20:21:57