Sydney Corpus Lab

Decisions, decisions… Combining corpus linguistic software with qualitative analysis through NVivo

Written by Rodrigo Arellano
One research practice that has puzzled me over the years is the combination of software packages and the complementary analyses they afford to develop a better understanding of the phenomenon under investigation. Corpus linguists have long understood that numerical data can provide a general picture, which can be complemented by in-depth…

Human rights in British parliamentary debates: a computational approach

Written by Marco Duranti
Introduction: The symbolic force of human rights
The last half century has witnessed the ascendancy of human rights as a language of legal, moral, and political claim-making across much of the globe. The idiom of human rights has gained particular symbolic force (Neves 2007) and is regarded by some as ‘the…

Representations of obesity in the news

Written by Monika Bednarek and Gavin Brookes
Note: This post was simultaneously published by the Sydney Corpus Lab and by the ESRC Centre for Corpus Approaches to Social Science at Lancaster University. It is published under a Creative Commons Attribution-NonCommercial license. If you want to republish it, please follow the relevant licensing guidelines….

Large language models (LLMs) in corpus linguistics – Using GenAI with corpora

Written by Monika Bednarek
The Sydney Corpus Lab recently published a post containing a synthesis of how large language models (LLMs) and generative artificial intelligence (GenAI) tools have been incorporated into corpus linguistic research. This blog post is intended as a companion to that much longer post. It presents the main take-aways that researchers may…

2025: The year in review for the Sydney Corpus Lab

Written by Monika Bednarek
2025 was a slightly less active year for the Sydney Corpus Lab, as I was on long service leave during semester 1. Nevertheless, we continued working on various projects before and after my leave, including our national collaboration on the Language Data Commons of Australia (LDaCA), which is a project led…

Generative AI in corpus linguistics: A synthesis

Written by Kelvin Lee
The advent of large language models (LLMs) and generative artificial intelligence (GenAI) tools such as ChatGPT has led to AI being used in many facets of everyday life as well as in education and research. In this blog post, I will explore how AI has been incorporated into (primarily English) corpus…

Copyrighted data: Options and considerations for working with newspapers and other texts

Written by Monika Bednarek
Many corpus linguists analyse language in the media, especially in newspaper corpora. These corpora often contain hundreds, if not thousands, of published articles from newspapers, which are protected by copyright and typically cannot easily be shared with others outside the research team. At the same time, the sharing of data is…

Combining corpus linguistics with other methods – demonstrating the value of collaboration

Written by Monika Bednarek
In recent years I have been involved in several collaborative projects that have included corpus linguistics as the methodology underpinning the research in combination with either methods or theories from other fields (or sometimes both). Most recently, I have collaborated with international researchers on two very different projects: My ongoing collaboration…

Triangulating semantic tagging and affect analysis to investigate gender-stereotypical emotion in sports news discourse

Written by Melissa Kemble
I recently completed my doctoral thesis analysing the representation and evaluation of elite athletes in the Australian print media, focussing on women’s and men’s Australian Rules (AFL) and Rugby League (NRL). As part of this, I explored patterns of emotion across my newspaper corpus (the OzFooty corpus). To undertake this analysis,…

Triangulating transitivity analysis: A small-scale trial of the ATAP Semantic Tagger

Written by Helen Caple
In a recently published study, I examined the processes (verbs) associated with group-based identity labels (like we, they, Australians, citizens) for self-representation in historical newspaper texts. The study corpus was small and exhaustive of one Australian newspaper, which allowed for detailed, qualitative analysis of transitivity. Transitivity ‘is concerned with a coding…

Discover the Power of Computer-based Text Analysis