written by Martin Schweinberger
For Semester 1 of 2021, I developed a course called “SLAT7829 Corpus Linguistics and Text Analysis” in the “Language and Discourse Analysis” field of study, which is part of the postgraduate Applied Linguistics program at the University of Queensland. The program has a majority of international students, predominantly from China and, to a lesser degree, from Japan, Korea, and Saudi Arabia, with very few domestic students. The course is obligatory for students enrolled in the “Language and Discourse Analysis” field of study and an elective for students without a specific field of study or students enrolled in our TESOL field of study. It consists of a one-hour recorded online lecture (available as a playlist on the YouTube channel CorpusLingMS) and two-hour face-to-face tutorials. It has three assessments: (1) a continuous assessment consisting of multiple-choice questions about the readings and lecture content, (2) a presentation in which students propose a corpus-based project (the students are put into groups and provide feedback and ideas about the feasibility of the project and how to improve it), and (3) a write-up of a corpus project that the students have conducted. The corpus-based project can have any topic as long as it is language- or culture-focused. The course has very good uptake, between 35 and 50 enrolments, showing that Applied Linguistics students are interested in and aware of corpus linguistics. The course evaluations were also quite good (with SeCATs of 4.36 out of 5 the first time it was offered and 4.64 the second time).
After two iterations of teaching this course, and substantive redevelopment between the two iterations, I would like to share some thoughts and experiences. By way of background, I am a quantitative corpus linguist and have taught several corpus linguistics courses in university English linguistics programs as well as various corpus-focused workshops in Europe. I therefore thought that I could simply take the resources and course designs I had used previously and implement them at my current university. This turned out to be more difficult than I had anticipated, for various reasons that I elaborate on in the following.
First of all, the fact that I am now teaching in an Applied Linguistics program (rather than a typical European English linguistics program) caused me to re-focus the course from using corpora to analyse language use and change to applications of corpus linguistics in language learning and teaching as well as in analysing learner language. This was not a problem in itself, but it meant that the course design had to change substantively, and it meant that texts and course books such as Danielle Barth and Stefan Schnell’s Understanding Corpus Linguistics, The Fundamental Principles of Corpus Linguistics by Tony McEnery and Vaclav Brezina, Anatol Stefanowitsch’s Corpus Linguistics: A Guide to the Methodology, and many other books I had used in the past were no longer a great fit, as they are more aligned with the outlook and cohorts of English linguistics or general linguistics programs. I used standard CL literature only to introduce concepts such as concordancing, co-occurrence, collocation, and annotation, while using handbook chapters to provide information on applications of CL in different domains, such as analyses of lexis and grammar or CL in EFL settings.
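To give a sense of what a concept like concordancing amounts to computationally, here is a minimal sketch of a key word in context (KWIC) display. It is written in Python purely for illustration and is not part of the course materials; in the course itself, students produced concordances with AntConc rather than by writing code.

```python
# Minimal KWIC (key word in context) concordancer. Illustrative
# sketch only -- tools like AntConc implement this kind of display
# with sorting, regex search, and corpus management on top.

def kwic(text, keyword, window=3):
    """Return concordance lines with `window` words of context per side."""
    tokens = text.lower().split()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Example: concordance of "cat" in a toy text
for line in kwic("the cat sat on the mat and the dog sat on the cat", "cat"):
    print(line)
```

Seeing the keyword aligned with its left and right context like this is, in essence, what students learn to read and sort when they first open a concordancer.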
Secondly, while I introduced students to AntConc in my previous courses, I then moved on to introduce them to R as a tool for corpus querying, analysis, data processing, and visualisation. I tried this approach in the first iteration of the course, working mostly with pre-written notebooks, but I soon realised that students felt overwhelmed, as the learning curve was too steep. In addition, I received feedback questioning the use of R and notebooks in the context of classroom interactions, EFL, and language learning and teaching more generally. Based on these criticisms and my experience during the course, in the second iteration I decided to focus exclusively on user-friendly, off-the-shelf point-and-click corpus linguistics tools such as AntConc, TagAnt, CorpusMate, Voyant Tools, SkeLL, and Excel, as well as Praat for analysing speech in multimodal corpora and online corpora such as the BYU corpora available via English-corpora.org. In terms of corpus analysis, I focused on a workflow in which students queried offline corpora, such as components of the International Corpus of English (ICE) or standard corpora such as BROWN, LOB, and ACE, using AntConc (or Praat when dealing with speech), and then exported the results into Excel for analysis and visualisation. We only used R-based notebooks when we introduced and performed basic text analytics methods such as topic modelling and sentiment analysis, for which no appropriate off-the-shelf applications are available.
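For readers unfamiliar with these text analytics methods, the following is a deliberately simplified, dictionary-based sentiment scorer. It is written in Python for illustration only (the course used pre-written R notebooks with proper sentiment lexicons), and the tiny polarity lexicon below is invented for the example rather than taken from any real resource.

```python
# Toy dictionary-based sentiment analysis. Purely illustrative:
# real analyses rely on validated lexicons shipped with dedicated
# sentiment packages, not a hand-written word list like this one.

POLARITY = {  # invented mini-lexicon for demonstration only
    "good": 1, "great": 1, "enjoyable": 1,
    "bad": -1, "boring": -1, "confusing": -1,
}

def sentiment_score(text):
    """Sum the polarity values of all lexicon words found in `text`."""
    return sum(POLARITY.get(tok, 0) for tok in text.lower().split())

print(sentiment_score("the lectures were great and enjoyable"))  # 2
print(sentiment_score("the readings were boring"))               # -1
```

Even this crude version conveys the core idea students meet in the notebooks: texts are tokenised, tokens are matched against a lexicon, and the matches are aggregated into a score.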
The course evaluations improved significantly after implementing the changes and building more heavily on off-the-shelf, point-and-click applications, which confirms that introducing R as a tool for CL overwhelmed the students and that they benefit more from learning about tools they can use directly in language teaching contexts. If I were to offer a CL course in a European English linguistics program, I would likely revert to my old materials and definitely introduce students to R for corpus analysis, which is less appropriate in the current context.
Overall, I think that the redesigned course works well in the program it is designed for and that it introduces students to CL without overwhelming them. The uptake and evaluations show that students enjoy the course and, with its focus on off-the-shelf tools and without relying on R, it is also easier to hand the course over and to find tutors.