Introducing the Language Technology and Data Analysis Laboratory (LADAL)

by Martin Schweinberger (Sydney Corpus Lab affiliate)

In this blog post I want to introduce a new resource of interest to corpus linguists in Australia and beyond: the Language Technology and Data Analysis Laboratory (LADAL). LADAL is a new support infrastructure for computational humanities established and maintained by the School of Languages and Cultures at the University of Queensland. The lab is a complementary virtual platform that collaborates with the Sydney Corpus Lab through mutual support and information sharing. LADAL provides materials on language data science and aims to help humanities researchers learn how to code. The Sydney Corpus Lab, by contrast, specifically aims to promote corpus linguistics in Australia and does not focus on humanities programming. We share a joint interest in linguistic data and text analytics.

Figure 1 LADAL website

The main goal of LADAL is to help develop computational and digital skills by providing information and practical, hands-on tutorials on data and text analytics as well as on statistical methods relevant to research in the language sciences. To be attractive to both beginners and people with advanced skills, the LADAL website covers a wide range of topics and introduces methods relevant to people with different degrees of prior knowledge and experience – ranging from introductions to concepts of quantitative reasoning to step-by-step guides on advanced statistical methods or sophisticated text mining. This includes:

  • introductions to quantitative reasoning and basic concepts in empirical language studies.
  • introductions and procedures that enable and support reproducibility in the language sciences.
  • introductions to R as a programming environment for handling natural language data.
  • tutorials on data visualization and data analytics (statistics and machine learning).
  • tutorials on text analysis, text mining, distant reading, and corpus linguistics.

LADAL as well as the self-guided study materials primarily use R and Markdown – a way to combine R code with text – with plans to expand resources to other tools and environments, including Python-based tutorials. As computation is becoming ever more prevalent across disciplines as well as in both the social and economic domains, LADAL offers a resource space for R that makes it accessible to lay users as well as expert programmers.

Figure 2 Data visualisation examples

The LADAL resources are aimed at researchers in HASS (Humanities, Arts, and the Social Sciences), and we aspire to attract complete novices as well as expert users. While the focus of LADAL is on handling data that represents natural language, anyone who has an interest in quantitative methods, data visualization, statistics, or R is welcome to explore the LADAL website. As of now, the website only contains practical, hands-on tutorials. In the coming months, we hope to enhance these tutorials by adding interactive exercises, offering online and face-to-face workshops, extending the content to encompass tutorials on analysing speech rather than text, and creating screencasts that will be published on a LADAL YouTube channel. In this way, we aim to make LADAL more interactive and the tutorials more engaging.

You can find out more about the people who are engaged in developing LADAL here. If you would like to become a contributor to LADAL, please do not hesitate to get in touch with the LADAL staff via slcladal@uq.edu.au. You can also follow LADAL on Twitter via @slcladal where we announce new workshops and resources.