Exploring meanings and patterns in discourse using the UAM Corpus Tool, by Matteo Fuoli

Corpus analysis tools such as AntConc or WordSmith Tools have revolutionized the way we view and study language. By exploring frequency lists, concordances, collocations and keywords, we can learn a great deal about the way language patterns across texts, registers and genres. There are, however, linguistic phenomena that do not lend themselves easily to analysis with traditional corpus tools. One such phenomenon is evaluation. Evaluation is a broad term that encompasses all the ways in which we communicate our opinions, attitudes and feelings in discourse. The expressions in bold in the following sentences are examples of evaluative language.

  1. Dubai’s Miracle garden is an absolute masterpiece.
  2. The final scenes, which could have redeemed the film, are poorly handled by director Bruce Beresford.

When we try to quantify evaluation using traditional corpus tools, we run into a number of difficulties. First of all, evaluation may be expressed through an open-ended range of expressions of varying length and complexity, belonging to any word class. This means that we cannot build a complete list of evaluative forms to search for with corpus software. Another problem is that evaluation is highly context dependent. That is, some words carry an evaluative meaning in certain contexts but not in others. Take the adjective dedicated, for example. This word can be used to praise people, as in (3), or it can be used descriptively to refer to the purpose of something, as in (4).

  3. Laura is a very dedicated person.
  4. Is there a website dedicated to customers reviewing good wines (like Goodreads for books)?

The upshot of all this is that it is virtually impossible to accurately quantify evaluative language in a corpus based solely on a pre-defined set of linguistic forms or by inspecting a frequency word list. Note that these challenges do not just apply to the analysis of evaluation; they also affect a range of other discursive phenomena, including metaphor, the representation of social actors, speech acts and politeness strategies.
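To make the problem concrete, here is a minimal Python sketch (not from the original post) of the naive word-list approach; the word list and example sentences are invented for illustration. Both uses of dedicated are counted as hits, even though only the first one expresses an opinion.

```python
# Illustrative sketch only: a naive word-list search cannot distinguish
# evaluative from descriptive uses of the same word.
evaluative_wordlist = {"dedicated", "masterpiece", "poorly"}

sentences = [
    "Laura is a very dedicated person.",                                # evaluative use
    "Is there a website dedicated to customers reviewing good wines?",  # descriptive use
]

for sentence in sentences:
    tokens = {token.strip(".,?()").lower() for token in sentence.split()}
    hits = tokens & evaluative_wordlist
    print(f"{sorted(hits)} -> {sentence}")

# Both sentences match 'dedicated', so a frequency count based on forms
# alone conflates the two uses.
```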

The good news is that there is a great corpus analysis program that can help us produce accurate and comprehensive quantitative analyses of evaluation and similar phenomena. It is called the UAM Corpus Tool and can be accessed, free of charge, from http://www.corpustool.com/. The tool was developed by Mick O’Donnell of the Universidad Autónoma de Madrid. It has a wide range of useful features, but what sets it apart from traditional concordancers is that it enables users to annotate their texts with a custom set of categories (Figure 1) and to generate statistics from those annotations.

Figure 1 Annotating a text

This means that we can annotate instances of evaluation, or whatever linguistic phenomenon we are interested in, as we read the texts included in the corpus. In this way, we don’t need to guess in advance which words to look for, and we can interpret the meaning of linguistic expressions more accurately within the context in which they are used. Once we have coded our texts, we can use the tool to explore quantitative patterns across texts or groups of texts (Figure 2).

Figure 2 Exploring quantitative patterns
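The tool generates these statistics through its own interface, but the basic idea behind them is easy to see. As a minimal sketch (the data and the tabular format below are invented, not the tool’s actual output), suppose each annotation is recorded as a (text, category) pair; per-text frequencies then fall out of a simple count:

```python
# Illustrative sketch only: counting hand-coded annotations per text.
# The (text, category) pairs below are invented for this example.
from collections import Counter

annotations = [
    ("review_01.txt", "positive-evaluation"),
    ("review_01.txt", "negative-evaluation"),
    ("review_02.txt", "positive-evaluation"),
    ("review_02.txt", "positive-evaluation"),
]

counts = Counter(annotations)
for (text, category), freq in sorted(counts.items()):
    print(f"{text}\t{category}\t{freq}")
```

In the UAM Corpus Tool itself, such tables are produced directly from the annotated corpus through the interface, so no scripting is needed.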

As part of my research visit to the Department of Linguistics at the University of Sydney and the Sydney Corpus Lab in August 2019, I had the great pleasure of giving a workshop on corpus annotation and analysis using the UAM Corpus Tool. The session covered all the basic functionalities of the tool, including how to design a coding scheme, annotate texts and explore the results of the analysis.

Figure 3 UAM Corpus Tool workshop

During the workshop, I also discussed general principles of manual corpus annotation and illustrated a stepwise procedure I have developed for this task, which is designed to improve the reliability and replicability of analyses (Figure 4).

Figure 4 Stepwise procedure for annotating a corpus
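One standard way of checking the reliability of manual annotation, generally speaking, is to have two coders annotate the same sample of the corpus and measure how far they agree beyond chance. The sketch below computes Cohen’s kappa on a small invented set of labels; it is a generic illustration of this kind of reliability check, not the procedure summarized in Figure 4.

```python
# Generic illustration: inter-coder agreement (Cohen's kappa) on an
# invented sample of annotation labels from two coders.
from collections import Counter

coder_a = ["pos", "pos", "neg", "none", "pos", "neg"]
coder_b = ["pos", "neg", "neg", "none", "pos", "pos"]

n = len(coder_a)
observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

# Expected agreement by chance, from each coder's label distribution
freq_a, freq_b = Counter(coder_a), Counter(coder_b)
expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

kappa = (observed - expected) / (1 - expected)
print(f"Observed agreement: {observed:.2f}, Cohen's kappa: {kappa:.2f}")
```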

I thoroughly enjoyed sharing what I have learned about the UAM Corpus Tool and manual corpus annotation with the workshop participants, and the discussion at the end was incredibly stimulating. All the materials I used during the workshop, including the slides and a hands-on guide to the UAM Corpus Tool, are available here: https://osf.io/5hq8g/. The stepwise procedure mentioned above is described here.

I am very grateful to Monika Bednarek, Director of the Sydney Corpus Lab, for inviting me to visit the Department and to the University of Sydney for funding this amazing opportunity. This has been an incredibly enriching and productive experience and I look forward to continued fruitful collaboration with Monika and the Lab.