In 2023, the Sydney Corpus Lab is pleased to be featuring edited extracts from Dr Robbie Love’s CorpusCast podcast about corpus linguistics. In each blog post published throughout the year, we present the answers of leading corpus linguists to three questions. Specifically, all blog posts present answers to the following two questions:
- What are the biggest changes you’ve noticed in corpus research throughout your career?
- How will corpus linguistics make an impact on the world in the future?
Posts from episodes 1-4 additionally present answers to this question:
- What has surprised you the most about your work in corpus linguistics?
Posts from episodes 5 onwards instead present answers to this question:
- What is the biggest misconception of corpus linguistics you have encountered?
This blog post features Mark McGlashan. We have transcribed the relevant part of the interview but have edited answers for readability (taking out hesitation marks, discourse markers, etc.). Interview answers were transcribed by Kelvin Lee from the Sydney Corpus Lab. The full interview can be found here. We are grateful to Robbie Love and Sam Cook for their assistance in creating these posts.
ROBBIE LOVE: What are the biggest changes you’ve noticed in corpus research since you began working in this area?
MARK MCGLASHAN: How long have we got? Immediately in response to that – I went to an inaugural plenary for Jack Grieve at Birmingham University when he first became a prof. One of the things that I really took from that, and this was as I was starting to do a bit more in this area, was that he said linguistics is becoming a data science, or becoming more like a data science. So, with data science approaches, coding is kind of a much more fundamental skill. When I did my undergraduate and masters, there was no provision for coding, and half my job now is probably programming in Python and R to do things like concordances. Well, AntConc is fabulous, and I did my PhD using AntConc. But you are restricted to AntConc and what AntConc can do. So, data science – we’re not just looking at words in a lot of ways now. We’re not just looking at text. We’re not just looking at a corpus as a big lump. We’re also looking at a corpus as lots of texts with lots of authors with lots of variables, with lots of things that we can look at by triangulating those variables. So, looking at the data and language use as a much more complicated thing. So, it’s not a quick answer. For example, two ways we applied this: when we were looking at rape threats, it was looking at language and how language correlated with social networks. How do we plot, for example, concordances and frequency lists and collocations onto things like a social network? When we have texts produced by users that contain certain linguistic things, how do we map those in a network? We’ve done that using Twitter data. But also, using, well, Twitter data again to look at vaccine misinformation. I would be remiss to not mention another project that I had which was with Professor Andrew Kehoe, Associate Professor Robert Lawson, and – I think she’s a Senior Lecturer – Tatiana Grieshofer at BCU, Matt Gee, and also, Dr Selena Schmidt. So, it’s a pretty big team – traccovid.com, if you’re interested. We used the data derived from that to look at social networks. So, biggest change: corpus and data science.
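As a rough illustration of the kind of scripting described in this answer, here is a minimal Python sketch of a keyword-in-context (KWIC) concordance, together with one simple way of tying word use to a mention network. It is not the code used in any of the projects mentioned: the toy posts, the crude tokenizer, and the function names (`tokenize`, `kwic`, `term_by_user`, `mention_edges`) are all invented for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word, @mention and #hashtag tokens.

    A deliberately crude tokenizer, assumed for illustration only.
    """
    return re.findall(r"[@#]?\w+", text.lower())


def kwic(tokens, node, window=3):
    """Return (left context, node, right context) for every hit of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


def term_by_user(posts, node):
    """Count how often each author uses `node` (a per-user variable)."""
    counts = Counter()
    for author, text in posts:
        counts[author] += tokenize(text).count(node)
    return counts


def mention_edges(posts):
    """Build directed (author, mentioned user) edges from @mentions."""
    edges = Counter()
    for author, text in posts:
        for mentioned in re.findall(r"@(\w+)", text.lower()):
            edges[(author, mentioned)] += 1
    return edges


# Invented toy data: (author, text) pairs standing in for social media posts.
posts = [
    ("user_a", "The vaccine is safe, @user_b said so"),
    ("user_b", "Thanks @user_a, the vaccine rollout starts today"),
    ("user_c", "@user_a nothing to report"),
]

if __name__ == "__main__":
    for author, text in posts:
        for left, node, right in kwic(tokenize(text), "vaccine", window=2):
            print(f"{author}: {left:>20} [{node}] {right}")
    print(term_by_user(posts, "vaccine"))
    print(mention_edges(posts))
```

The per-user counts can then be attached to the nodes of the mention network, which is one simple way of placing a linguistic feature "onto" a social network. A real study would need a proper tokenizer and far more careful handling of platform data.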
ROBBIE LOVE: Very good. Question two: what is the biggest misconception of corpus linguistics that you’ve encountered?
MARK MCGLASHAN: Word lists and dictionaries. Next question.
ROBBIE LOVE: Brilliant. And finally […] – how will corpus linguistics make an impact or, I suppose, continue to make an impact on the world in the future?
MARK MCGLASHAN: I think that as it continues to proliferate, it’s going to have academic impacts for a long time. So again, this has been talked about at quite some length in corpus linguistics, having this random problem. It’s not quite NLP. It’s not quite topic modelling – well, it’s definitely not topic modelling. It’s not computational linguistics… or is it? So, we operate in this quite small, very friendly, very collegiate, and also quite dynamic bubble. So, what do we do? The kind of stuff that I’m interested in is emancipation. How do we look at things like discrimination? How do we look at those social problems by using corpus linguistics as a method? I feel like that is, in and of itself, this proliferation of the method. It is a method that allows us to do really interesting things that get us to an end point. The tools, the methods, and the approaches sometimes aren’t the end point in and of themselves. There is a place for that, and it’s really important that we interrogate, critique, and question the science and the approaches. If you look at anything to do with statistics in the last 20 years in corpus linguistics, that’s going on, that’s happening. Is it logDice? Is it Log Ratio? Is it log-log? What is it? Impact on the world… I think even looking at the KTP where we’re going to apply corpus linguistics. That real-world application of not just throwing an NLP toolchain at a problem, because that’s where we get into problems – where you don’t effectively detect safeguarding issues. It’s understanding that there is a place between the snazzy techy stuff and the linguistic – the grammatical, the functional grammatical, the contextual, the critical approaches to linguistics. So, yeah, proliferation.
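For readers unfamiliar with the statistics named in this answer, the following Python sketch shows two of them as they are commonly defined: logDice as 14 plus the binary log of the Dice coefficient, and Log Ratio as the binary log of the ratio of two relative frequencies. The function names and the example frequencies are invented for illustration, and the sketch makes no attempt to handle zero frequencies, which real tools have to deal with.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """logDice: 14 + log2 of the Dice coefficient 2*f_xy / (f_x + f_y).

    f_xy is the co-occurrence frequency of node and collocate;
    f_x and f_y are their individual frequencies in the corpus.
    """
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def log_ratio(freq_a, size_a, freq_b, size_b):
    """Log Ratio: log2 of (freq_a / size_a) divided by (freq_b / size_b).

    Written as a single fraction. Raises ZeroDivisionError (or a math
    domain error) when a frequency is zero, which this sketch does not handle.
    """
    return math.log2((freq_a * size_b) / (freq_b * size_a))


if __name__ == "__main__":
    # Invented frequencies: two words that only ever occur together.
    print(log_dice(f_xy=10, f_x=10, f_y=10))
    # Invented frequencies: a word four times as frequent (relatively) in corpus A.
    print(log_ratio(freq_a=200, size_a=1_000_000, freq_b=50, size_b=1_000_000))
```

On these definitions, logDice has a theoretical maximum of 14 and does not depend on corpus size, while each additional point of Log Ratio corresponds to a doubling of the relative frequency difference. Those differences in behaviour are part of why the choice between measures remains a live debate.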