Promoting corpus linguistics and text analytics in the Arts and Social Sciences

Written by Kelvin Lee

To help promote corpus linguistics and text analytics in the Arts and Social Sciences, the Sydney Corpus Lab recently participated in the HASS RDC and IRC Computational Skills Summer School organised by the Australian Research Data Commons (February 2023) and in the University of Sydney’s Digital Humanities Day jointly hosted by the Library and Sydney Informatics Hub (March 2023).

For the Summer School, I presented a talk on behalf of the Sydney Corpus Lab in which I introduced the most important concepts and techniques in corpus linguistics (i.e., frequency lists, keywords, collocation, and concordance). In addition, Chao Sun (Sydney Informatics Hub) showcased our recently developed quotation tool. There was a great turnout, with well over 30 attendees at all of our sessions on the day. In total, close to 100 researchers, librarians, and students participated in the Summer School.

Photo: Kelvin Lee and Chao Sun chatting at the event. *Chao Sun (left) and Kelvin Lee (right) at the Summer School. Image credit: Renee Nowytarger / ARDC*

For the Digital Humanities Day, I gave a talk in which I introduced participants to free corpus linguistic tools, including software (e.g., AntConc, LancsBox) and web interfaces (e.g., CQPweb). Using AntConc, I explained and demonstrated key corpus linguistic analyses (i.e., frequency lists, keywords, collocation, and concordance). PhD student and lab member Melissa Kemble also gave a well-received ‘lightning talk’ on her corpus linguistic project on female and male athletes in the Australian media.

The attendees at my talk were PhD students and research staff from different disciplines. Participants told me that they were either unfamiliar with linguistic software or had very limited experience with it. Because of this, many of their questions were about the capabilities of AntConc and other corpus linguistic tools (e.g., whether they work with data that is not in English, or whether the data can be coded as it can in software like NVivo). Some of the attendees appreciated the explanation of concepts and terms such as concordance and collocation, which are becoming more common outside of corpus linguistics. The Sydney Informatics Hub also conducted workshops and talks on the day, including a combined workshop to demonstrate several recently released or soon-to-be-released tools developed for the Australian Text Analytics Platform, co-created with the Sydney Corpus Lab.

Overall, I had a great time giving these talks to engaged audiences! I hope that the introductory information from my talks will enable attendees to explore corpus linguistic analyses and software further.
