Combining corpus linguistics with other methods – demonstrating the value of collaboration

Written by Monika Bednarek

In recent years I have been involved in several collaborative projects that use corpus linguistics as the underpinning methodology, combined with methods or theories from other fields (or sometimes both). Most recently, I have collaborated with international researchers on two very different projects:

My ongoing collaboration with Prof Barbra Meek (University of Michigan) analyses television dialogue in Australia and the United States from a linguistic perspective. We compiled cross-cultural sets of specialised television corpora from the two countries and are using classic corpus methods to identify empirical patterns and frequency trends in these datasets. We combine this with qualitative discourse analysis which draws on theories from linguistic anthropology and sociocultural linguistics (for example, rhematisation, erasure). Our combined quantitative-qualitative methodological approach underscores the usefulness of mixed methods for a more capacious view of the sociolinguistics of media. It allows the identification of patterns which we can then contextualise to reveal social and cultural nuances. It has also resulted in new theoretical concepts that can be applied to other datasets (such as our proposed semiotic processes of semiotic overlay, erasure marking, icon marking). Our most recent publication appeared in the Journal of Linguistic Anthropology 35/1 (open access; see References), showing the potential openness of this field to corpus linguistics. We continue to examine our corpora quantitatively and qualitatively, as they are rich sources of interesting language data, and we continue to explore how corpus linguistics and linguistic anthropology can mutually enrich each other.

Another collaboration focusses on combining corpus linguistics and natural language processing (NLP) techniques, in particular sentiment analysis, to study large datasets. This collaboration – with Prof Maite Taboada (Simon Fraser University) – uses corpus linguistics both to enhance and to interrogate the analysis of sentiment in corpora. This is a very new collaboration, and our first joint study was published in Corpus Pragmatics 9 in 2025 (open access; see References). This study examines a large Canadian English-language news corpus with respect to quotation and positive/negative sentiment. Specifically, we analyse sentiment in reported/quoted speech in comparison to non-quoted speech, testing the hypothesis that quoted speech contains negative sentiment and is more subjective. Crucially, our study explores whether NLP tools that simplify pragmatically complex concepts (such as attitude/evaluation/stance) can be used to test hypotheses that derive from qualitative linguistic analyses of news discourse.
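For readers who want a concrete picture of what comparing sentiment in quoted and non-quoted text can look like, here is a minimal Python sketch. It is an illustration only and not the procedure used in the study: the tiny lexicon, the function names and the reliance on straight double quotation marks to detect quotation are all assumptions made for this example. It also shows the kind of simplification of attitude/evaluation/stance mentioned above, since each word is reduced to a single positive or negative score.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only);
# real analyses rely on validated sentiment resources or trained models.
TOY_LEXICON = {"good": 1, "great": 1, "welcome": 1,
               "bad": -1, "terrible": -1, "fail": -1}

# Assumption: quotations are marked with straight double quotes.
QUOTE_RE = re.compile(r'"([^"]*)"')


def split_quoted(text):
    """Return (quoted_text, non_quoted_text) for one document."""
    quoted = " ".join(QUOTE_RE.findall(text))
    non_quoted = QUOTE_RE.sub(" ", text)
    return quoted, non_quoted


def polarity(text):
    """Sum of lexicon scores divided by the token count (0.0 if empty)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(TOY_LEXICON.get(tok, 0) for tok in tokens) / len(tokens)


def compare(text):
    """Average polarity of quoted versus non-quoted material."""
    quoted, non_quoted = split_quoted(text)
    return {"quoted": polarity(quoted), "non_quoted": polarity(non_quoted)}


if __name__ == "__main__":
    sample = 'The minister said "this is a terrible plan" and the budget was good.'
    print(compare(sample))
```

A word-list approach like this ignores negation, irony and who is evaluating what, which is one reason why the output of automated sentiment tools benefits from being checked against close reading of the corpus.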

While these two collaborative projects differ significantly in their use of methods and theories, what they have in common is that they clearly show the value of collaboration in linguistics and adjacent fields.

References

Bednarek, M. & B. Meek (2025) ‘Are you Navajo or Inuit?’: Identity, television dialogue, and Indigenizing semiotics. Journal of Linguistic Anthropology 35/1: e12449. https://doi.org/10.1111/jola.12449 (open access)

Bednarek, M. & M. Taboada (2025) Attitude in reported and non-reported news: A critique of sentiment analysis in corpus pragmatics. Corpus Pragmatics 9: 111-133. https://doi.org/10.1007/s41701-025-00185-6 (open access)