In 2023, the Sydney Corpus Lab is pleased to be featuring edited extracts from Dr Robbie Love’s CorpusCast podcast about corpus linguistics. In each blog post published throughout the year, we present the answers of leading corpus linguists to three questions. All blog posts present answers to the following two questions:
- What are the biggest changes you’ve noticed in corpus research throughout your career?
- How will corpus linguistics make an impact on the world in the future?
Posts from episodes 1-4 additionally present answers to this question:
- What has surprised you the most about your work in corpus linguistics?
Posts from episodes 5 onwards instead present answers to this question:
- What is the biggest misconception of corpus linguistics you have encountered?
This blog post features Ute Römer. We have transcribed the relevant part of the interview but have edited answers for readability (taking out hesitation marks, discourse markers, etc.). Interview answers were transcribed by Kelvin Lee from the Sydney Corpus Lab. The full interview (also featuring Clark D. Cunningham) can be found here. We are grateful to Robbie Love and Sam Cook for their assistance in creating these posts.
ROBBIE LOVE: First of all, what is the biggest change that you’ve noticed in corpus research throughout your career?
UTE RÖMER: Some key words that come to mind, I think, are ‘growth’ and ‘diversification’. Everything’s getting more sophisticated and bigger, including the data sources we have access to. When I was working on my PhD, having 10 million words of transcribed speech was amazing. The 100-million-word British National Corpus – wow! Now we have billions of words that we can extract patterns from. So, that’s different. There’s also increasing interdisciplinarity, methodological sophistication and diversification, and growth in the areas of application as well. I did a plenary at last year’s AAAL – the American Association for Applied Linguistics conference. As I was preparing for that, I wanted to look at applied corpus linguistics, so I looked through all the papers in all of the, I think, 22 strands at the conference. Twenty of these strands had papers that mentioned corpora in their abstracts. So, it’s in so many areas in our field now, and beyond our field, as this episode has shown, I think. Those are some of the things I would mention as changes.
ROBBIE LOVE: What about the biggest misconception of corpus linguistics that you’ve encountered?
UTE RÖMER: Yeah, I think we all have a funny story, like something a student associated with it, but we also have a more serious answer to that. Once, when I asked a group of undergraduate students what they thought corpus linguistics was, one student guessed that it was the study of dead languages because, I think, he thought of a corpse – a dead body. That clearly wasn’t what I was looking at. But more seriously, I think a common misconception of corpus linguistics is that it is just about using the tools and the data sets, and that it’s devoid of any linguistic or other theory. I find that really problematic, because just being able to retrieve a concordance from COCA or a list of collocations from any online corpus doesn’t make you a corpus linguist or corpus researcher. Relating this to our interdisciplinary work in legal studies, I think that’s one danger of putting corpora in the hands of legal scholars, lawyers and judges who figure out how to use one of these online tools within an hour-long workshop. Also, as Clark mentioned, they work with language all day long, but they’re not trained linguists. As one of my teachers, John Sinclair, warned us, a corpus is not a simple object, and it’s just as easy to derive nonsensical conclusions from the evidence you’re looking at as it is to derive insightful ones. So, if you don’t have that background in linguistic terminology and theories, and your searches are not motivated and linguistically interpretable, you’re in trouble. So, that is a major misconception, I think: that all you need to do is learn how to use AntConc or learn how to go to COCA. I think what Clark said illustrated that really well. With the ‘cases’ study, we looked at the data together and the linguists on the team realised, “Well, ‘cases’ on its own doesn’t mean anything; it inherits its meaning from the words it co-occurs with. It’s a general noun. It’s a shell noun.” Until we applied our knowledge of linguistic theory, we couldn’t really make sense of the data. So, I think that’s an important thing to remember.
ROBBIE LOVE: Absolutely. Now, […] I’m not going to ask the usual final quick question that I do. I’m going to make up one just for you. What on earth does corpus linguistics have to do with jazz music?
UTE RÖMER: I already hinted that it’s about looking at patterns. The data you look for patterns in doesn’t always have to be language. Let me rephrase that. It’s language, but it doesn’t necessarily consist of words. The language of interest in this study was music, and more specifically jazz improvisation by experts like Charlie Parker and that generation of jazz musicians. What this project aimed to uncover were central aspects of jazz improvisation. I worked with another GSU professor in the School of Music, Martin Norgaard. He had a corpus of jazz solos in which there were no words; instead, the pitches were converted into intervals, represented as positive or negative numbers of half-step changes. Imagine you have a sequence. The text is a sequence of numbers, some preceded by a minus sign if the interval went down. You’d have one, one, one, one, minus two, minus two, three, minus one, plus seven if somebody jumps up. Using basic analytic techniques in a corpus analysis tool (we used AntConc) on a corpus of, I think, about 450 solos from a database called the Weimar Jazz Database, we showed that patterns are really ubiquitous in jazz improvisation, and we highlighted the central role of those patterns in real-time music creation. When you improvise, you don’t plan; you fall back on memorised patterns, and some jazz musicians are better at that than others. This paper just came out this year in a jazz education journal called, I think, ‘Jazz Education in Theory and Practice’. So, pattern extraction from jazz music is possible. AntConc does not care if you’re throwing sequences of letters or sequences of minuses and numbers at it, as long as there are patterns to recognise and you can make sense of them. Of course, that’s when I need the music professor, because I don’t know what they mean. But he was impressed by how long these patterns were: in improvisation, with no sheet music in front of them, expert jazz musicians produced long patterns that recurred multiple times.
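For readers curious about the mechanics, the interval representation described above can be sketched in a few lines of Python. This is a minimal illustration, not the project’s actual pipeline (which used AntConc on data that had already been converted): it assumes each solo is available as a list of MIDI pitch numbers, turns each pair of adjacent notes into a signed half-step interval, and counts interval sequences (n-grams) that recur across solos, roughly the kind of list a corpus tool’s n-gram or cluster function would produce. The function names, the toy solos, the n-gram length and the frequency threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def to_intervals(pitches):
    """Convert MIDI pitch numbers into signed half-step intervals (negative = going down)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def interval_ngrams(intervals, n):
    """Return every contiguous run of n intervals as a tuple."""
    return [tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)]


def recurring_patterns(solos, n, min_freq=2):
    """Count interval n-grams across all solos; keep those occurring at least min_freq times."""
    counts = Counter()
    for pitches in solos:
        counts.update(interval_ngrams(to_intervals(pitches), n))
    # most_common() orders patterns from most to least frequent
    return {gram: freq for gram, freq in counts.most_common() if freq >= min_freq}


# Invented toy solos (MIDI pitch numbers), not data from the Weimar Jazz Database.
solo_a = [60, 61, 62, 63, 64, 62, 60, 63, 62, 69]
solo_b = [65, 66, 67, 68, 69, 67, 65]

print(to_intervals(solo_a))  # [1, 1, 1, 1, -2, -2, 3, -1, 7]
print(recurring_patterns([solo_a, solo_b], n=3))
# {(1, 1, 1): 4, (1, 1, -2): 2, (1, -2, -2): 2}
```

The first toy solo reproduces the interval sequence mentioned in the interview (one, one, one, one, minus two, minus two, three, minus one, plus seven). Working with intervals rather than absolute pitches means the same melodic shape is counted as one pattern whatever key it is played in, which is why the second toy solo, starting five half steps higher, still matches. Real solo data would likely raise questions this sketch ignores, such as how to handle rests and phrase boundaries.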