Written by Monika Bednarek and Gavin Brookes. Note: This post was simultaneously published by the Sydney Corpus Lab and by the ESRC Centre for Corpus Approaches to Social Science at Lancaster University. It is published under a Creative Commons Attribution-NonCommercial license. If you want to republish it, please follow the relevant licensing guidelines….
Large language models (LLMs) in corpus linguistics – Using GenAI with corpora
Written by Monika Bednarek. The Sydney Corpus Lab recently published a post containing a synthesis of how large language models (LLMs) and generative artificial intelligence (GenAI) tools have been incorporated into corpus linguistic research. This blog post is intended as a companion to that much longer post. It presents the main take-aways that researchers may…
2025: The year in review for the Sydney Corpus Lab
Written by Monika Bednarek. 2025 was a slightly less active year for the Sydney Corpus Lab, as I was on long service leave during semester 1. Nevertheless, we continued working on various projects before and after my leave, including our national collaboration on the Language Data Commons of Australia (LDaCA), which is a project led…
Generative AI in corpus linguistics: A synthesis
Written by Kelvin Lee. The advent of large language models (LLMs) and generative artificial intelligence (GenAI) tools such as ChatGPT has led to AI being used in many facets of everyday life as well as in education and research. In this blog post, I will explore how AI has been incorporated into (primarily English) corpus…
Copyrighted data: Options and considerations for working with newspapers and other texts
Written by Monika Bednarek. Many corpus linguists analyse language in the media, especially in newspaper corpora. These corpora often contain hundreds, if not thousands, of published articles from newspapers, which are protected by copyright and typically cannot easily be shared with others outside the research team. At the same time, the sharing of data is…
Combining corpus linguistics with other methods – demonstrating the value of collaboration
Written by Monika Bednarek. In recent years I have been involved in several collaborative projects that combine corpus linguistics, as the methodology underpinning the research, with methods or theories from other fields (or sometimes both). Most recently, I have collaborated with international researchers on two very different projects: My ongoing collaboration…
Triangulating semantic tagging and affect analysis to investigate gender-stereotypical emotion in sports news discourse
Written by Melissa Kemble. I recently completed my doctoral thesis analysing the representation and evaluation of elite athletes in the Australian print media, focussing on women’s and men’s Australian Rules (AFL) and Rugby League (NRL). As part of this, I explored patterns of emotion across my newspaper corpus (the OzFooty corpus). To undertake this analysis,…
Triangulating transitivity analysis: A small-scale trial of the ATAP Semantic Tagger
Written by Helen Caple. In a recently published study, I examined the processes (verbs) associated with group-based identity labels (like we, they, Australians, citizens) for self-representation in historical newspaper texts. The study corpus was small and exhaustive of one Australian newspaper, which allowed for detailed, qualitative analysis of transitivity. Transitivity ‘is concerned with a coding…
Constructions of weight loss in British and Australian newspapers
Written by Tara Coltman-Patel, Carly Bray, Paul Baker and Monika Bednarek. Note: This post was simultaneously published by the Sydney Corpus Lab and by the Centre for Corpus Approaches to Social Science. It is published under a Creative Commons Attribution-NonCommercial license. If you want to republish it, please follow the relevant licensing guidelines….
2024: The year in review for the Sydney Corpus Lab
Written by Monika Bednarek. 2024 was yet another busy year for the Sydney Corpus Lab, as we continued working on various projects, including our collaboration on the Language Data Commons of Australia (LDaCA, ldaca.edu.au). You can find the text analytics resources that we have been developing for this project (together with the Sydney Informatics…