Strange news yet.


SPCL Conference

2019•03•05 I presented a pilot study on language use in the Surinamese Parliament, together with Margot van den Berg (Utrecht University), at the SPCL summer meeting in Lisbon. The title of our talk was Language contact and language choice in parliamentary speech in Suriname. Here is our abstract:

In this paper, we will present preliminary results from a pilot study of language use in the Surinamese Parliament “De Nationale Assemblée” (DNA). As Suriname's main legislative body, the DNA meets frequently (multiple times per week) to conduct its business in public meetings. Public DNA meetings have been audio- and video-recorded and published on the body's YouTube channel since 2011. To date, the DNA has published over one thousand recordings of its public meetings. Meetings follow a rather rigid set of rules of conduct (DNA 1990), and although there is no mention of language in the Reglement van Orde, Dutch is the default language of communication. The Dutch is of a regional character, and despite the formality of the meetings, members regularly switch to Sranan, the country's lingua franca. Occasionally, other Surinamese languages, such as Ndyuka, are used.

Parliamentary debates contain impactful information and special, formalized, and often persuasive and emotional language. They are therefore considered an important resource for many disciplines in digital humanities and social sciences. Corpora have been constructed from parliamentary debates, for example, within the EU (Fiser and Lenardic 2018) and used in, e.g., discourse analysis and sociolinguistics (Hirst et al. 2014; Rheault et al. 2016; Bayley 2004). In order to study language use in the DNA, we rely on a newly constructed corpus of spoken language data that has been extracted from recordings of the DNA’s public meetings. Corpus construction has been automated using an innovative combination of ELAN (Sloetjes & Wittenburg 2008) and its built-in recognizers, elan2split (Cavar 2016), Python, and Google's speech recognition API. Our corpus currently consists of approximately 7 hours of recorded DNA meetings. The uncorrected transcripts yield ca. 36,000 words / 5,700 utterances from 29 participants.
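
As a rough illustration of how summary figures of this kind (words, utterances, participants) can be derived from utterance-level transcripts, here is a minimal Python sketch. It assumes each utterance is available as a (speaker, text) pair and that words are whitespace-separated tokens; the function name summarize and the placeholder data are hypothetical and are not taken from the DNA corpus.

```python
from collections import Counter


def summarize(utterances):
    """Count words, utterances, and participants.

    `utterances` is an iterable of (speaker, text) pairs (an assumed
    format); words are counted as whitespace-separated tokens.
    """
    words_per_speaker = Counter()
    n_utterances = 0
    for speaker, text in utterances:
        n_utterances += 1
        words_per_speaker[speaker] += len(text.split())
    return {
        "words": sum(words_per_speaker.values()),
        "utterances": n_utterances,
        "participants": len(words_per_speaker),
    }


# Placeholder data for illustration only.
SAMPLE_UTTERANCES = [
    ("speaker_a", "one two three"),
    ("speaker_b", "four five six"),
    ("speaker_a", "seven"),
]
```

Whitespace tokenization is the simplest possible choice; on uncorrected automatic transcripts, a different tokenizer would give somewhat different word counts.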

We will provide a brief overview of the corpus building methodology and discuss preliminary results of our investigation into:

  • feature variation, focusing on auxiliaries, verb-preposition combinations, and the Dutch pronominal er;
  • pragmatic aspects of language use such as persuasion and negotiation;
  • language choice among Surinamese parliamentarians.

Our findings show that (a) the influence of Sranan on Dutch morphosyntactic and discourse structure is pervasive in parliamentarians' speech, despite the formality of the setting, and that (b) language choice is agentive: the use of Sranan, as the non-default language, carries added meaning when used in the DNA context. Thus, our findings contribute to a better understanding of the impact of language contact on Surinamese parliamentary language, an understudied language style, and on Surinamese society as a whole. They further showcase the utility of parliamentary resources in linguistic research.


Bayley, Paul, ed. 2004. Cross-Cultural Perspectives on Parliamentary Discourse. Discourse Approaches to Politics, Society, and Culture. Amsterdam: Benjamins.

Cavar, Damir. 2017. Elan2Split.

DNA, De Nationale Assemblée. 1990. Reglement Van Orde Voor De Nationale Assemblée (S.B. 1990 no. 43).

Fiser, Darja and Jakob Lenardic. 2018. ‘Parliamentary Corpora in the CLARIN Infrastructure’. In: Selected Papers from the CLARIN Annual Conference 2017, Budapest, 18-20 September 2017. 147. Linköping University Electronic Press, Linköping Universitet / Department of Translation, Faculty of Arts, University of Ljubljana, Slovenia, pp. 75-85.

Hirst, Graeme, Vanessa Wei Feng, Christopher Cochrane, and Nona Naderi. 2014. ‘Argumentation, Ideology, and Issue Framing in Parliamentary Discourse’. In: Proceedings of the Workshop on Frontiers and Connections between Argumentation Theory and Natural Language Processing in Forlì-Cesena, Italy, July 21-25, 2014, edited by Elena Cabrio, Serena Villata, and Adam Wyner, pp. 50-56.

Rheault, Ludovic, K. Beelen, C. Cochrane, and G. Hirst. 2016. ‘Measuring Emotion in Parliamentary Debates with Automated Textual Analysis’. In: PLOS ONE 11.12.

Sloetjes, Han and Peter Wittenburg. 2008. ‘Annotation by Category: ELAN and ISO DCR’. In: LREC 2008.

Vetenskap på kvällen: Minoritetsspråk i Sverige och världen

2019•05•11 I gave a talk about my research at the UPPLADOC Science Evening. We even made it into the national news.

RAN test paper accepted for publication

2019•03•14 My paper Rapid Automatized Picture Naming as a Proficiency Assessment for Endangered Language Contexts: Results from Wilamowice has been accepted for publication in the Journal of Communication and Cultural Trends. Here is a link to the accepted version in the DiVA repository. The paper should appear on the journal website soon.

Moving to Git.

2019•03•05 I'm putting the site on GitHub for more streamlined deploys and better tracking of version history.

Talk – Automated spoken-language corpus building: code mixing in De Nationale Assemblée

2019•01•25 I gave a talk at the Empirical Linguistics Working Group (WoGEL) at Uppsala University's Department of Linguistics and Philology, entitled Automated spoken-language corpus building: code mixing in De Nationale Assemblée. Here's the abstract:

In search of large-ish data to investigate sociolinguistic questions, we often run into a bottleneck problem: spoken language data that is interesting is often too time-consuming to transcribe and annotate. At the same time, there are lots of written-language corpora available, but these often aren't suitable for addressing the questions sociolinguists like to answer. In the talk, I will discuss steps I have taken to eliminate the bottleneck in order to study language choice / code mixing in the Surinamese Parliament, De Nationale Assemblée.

Suriname is a former colony of the Netherlands, and Dutch is its de facto national language, but the majority of the population speaks Sranan (an anglo-creole) in many everyday interactions. Despite the formality of Parliament, parliamentarians are known to code mix regularly, if not frequently. I would like to investigate the contexts in which this code mixing occurs and the pragmatic motivations behind it, but I hate to transcribe data, so I sought a way to create a robust empirical dataset for my investigation without transcribing endless hours of "really interesting" legislative discussion.

To create this dataset, I used a combination of ELAN and its built-in utterance recognizers, Python, and Google speech recognition to transcribe recordings of the Nationale Assemblée meetings without much human intervention. In the pilot, I prepared about 10 hours of recording in an afternoon, then let the computer "do its thing" overnight. In the morning, I woke up to a transcribed corpus of ca. 36,000 words. In the talk, I will walk you through the process in some detail and discuss some open (for me anyhow) questions about the method. My general question for the group is: "Is there a paper in the method / process itself?" If so, what journal would you propose as a suitable outlet?
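
The segment-then-transcribe idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the scripts used for the pilot: it assumes the utterance boundaries found in ELAN are stored as time-aligned annotations in an .eaf file (ELAN's XML format), and it leaves the speech recognition step as a caller-supplied function. The names Segment, read_segments, transcribe_segments, and the transcribe callback are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, List, NamedTuple, Tuple


class Segment(NamedTuple):
    tier: str      # ELAN tier ID (assumed here: one tier per speaker)
    start_ms: int  # segment start, in milliseconds
    end_ms: int    # segment end, in milliseconds


def read_segments(eaf_xml: str) -> Iterator[Segment]:
    """Yield the time-aligned segments found in the text of an .eaf file."""
    root = ET.fromstring(eaf_xml)
    # Map time-slot IDs to millisecond offsets; unaligned slots are skipped.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is not None and end is not None:
                yield Segment(tier.get("TIER_ID"), start, end)


def transcribe_segments(
    eaf_xml: str,
    transcribe: Callable[[int, int], str],
) -> List[Tuple[str, int, int, str]]:
    """Run `transcribe(start_ms, end_ms)` on every segment.

    `transcribe` is a placeholder for whatever cuts the audio clip and
    sends it to a speech recognition service; it is an assumption here.
    """
    return [
        (seg.tier, seg.start_ms, seg.end_ms,
         transcribe(seg.start_ms, seg.end_ms))
        for seg in read_segments(eaf_xml)
    ]


# Tiny hand-written .eaf fragment, for illustration only.
SAMPLE_EAF = """
<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="2000"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="4200"/>
  </TIME_ORDER>
  <TIER TIER_ID="speaker1">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE/>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2"
          TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE/>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""
```

Keeping the recognizer behind a callback means the segmentation logic can be checked with a stand-in function before any audio is sent to an external service, for example transcribe_segments(SAMPLE_EAF, lambda start, end: "").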

December 2018

Happy Holidays!!!

2018•12•24 Best holiday wishes for all, and a happy, productive, and successful new year.

Version 01.000 of the site has been launched

2018•12•19 Version 01.000 of this website has been deployed at

Beta website launch

2018•12•18 A beta version of this website has been deployed at The full version should be up soon!