Written by Rodrigo Arellano
One research practice that has puzzled me over the years is the combination of software packages and the complementary analyses they afford to develop a better understanding of the phenomenon under investigation. Corpus linguists have long understood that numerical data can provide a general picture, which can be complemented by in-depth analyses. Thus, the combination of corpus linguistics with discourse analysis is now quite common, including in Critical Discourse Analysis. However, such mixed-methods research typically relies on corpus linguistic software rather than integrating it with qualitative analysis software programs that are also available and commonly used in disciplines outside linguistics.
This is why my PhD thesis (Arellano, 2022) combined the use of the well-known corpus linguistic software program AntConc (Anthony, 2022) with NVivo (QSR International, 2020), which relies on a more qualitative/interpretative tradition. The aim of this study was to identify ideologies from policy and interview data in relation to the learning and teaching of linguistics in Chilean second language teacher education programs (TESOL). My data consisted of EFL training standards, training program descriptions, and course outlines in an EFL teacher training program, as well as the voices of lecturers and teacher trainees.
In using corpus linguistic frequency analysis with these data, I followed the procedure suggested by Díaz-Maggioli (2017). In this approach, he selected the 20 most frequent linguistic items in the given dataset as the starting point to thematise data qualitatively (this number could be 30 or 50, depending on the data characteristics and the researcher’s objectives). In the second, qualitative stage, only those of the 20 identified words that were repeated across datasets were included for thematic examination. For instance, in Table 1, the lexical items education and learning are in bold font as they are repeated across subsets, to be analysed qualitatively at a later stage (Rf = raw frequency).
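For readers who prefer to see the logic spelled out, the two steps above (rank the most frequent items in each subset, then keep only the items that recur across subsets) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: it is not how AntConc or NVivo compute their word lists, the function names `top_items` and `shared_items` are hypothetical, the tokenisation is a simple lowercase letter match, and the two tiny "datasets" are invented so the example is easy to follow.

```python
import re
from collections import Counter


def top_items(text, n=20, stopwords=frozenset()):
    """Return the n most frequent word forms in text with their raw frequencies (Rf)."""
    # Assumption: a token is any run of letters, compared in lowercase.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)


def shared_items(datasets, n=20, stopwords=frozenset()):
    """Return the items that appear in the top-n list of every dataset."""
    top_sets = [
        {word for word, _ in top_items(text, n, stopwords)}
        for text in datasets.values()
    ]
    return set.intersection(*top_sets) if top_sets else set()


# Invented toy data, for illustration only.
datasets = {
    "standards": "education learning education teaching learning standards",
    "interviews": "learning students education learning grammar education",
}

# With n=2, both subsets share the same two top items.
print(sorted(shared_items(datasets, n=2)))  # ['education', 'learning']
```

In a real project, `n` would be 20 (or 30 or 50), a stopword list would normally be supplied, and the shared items would then be taken into the qualitative stage rather than treated as findings in themselves.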

Importantly, I used NVivo for this subsequent qualitative analysis. NVivo is not commonly used in corpus linguistics, but its word query option is useful if the corpus results need to be thematised in the following qualitative stage. In this way, we can follow Selvi’s suggestion of undertaking research by “analyzing with counting and comparisons, mostly through keywords or content, followed by the interpretation of the underlying context” (2020, p. 443). Hence, this project employed a sequential explanatory strategy, that is, a quantitative stage followed by a qualitative one. The first phase is the traditional quantification of lexical choices using AntConc, while the second phase is the in-depth exploration of these results to create categories through qualitative analysis with NVivo.
Notably, the complementarity offered by NVivo and AntConc reflects the need for “moving from a purely statistical approach to corpus linguistics to a more blended intuitive-empirical approach” (Graham, 2014, p. 1). In this approach, the results provided by the software can be complemented by the researcher’s expertise, especially considering that the use of content analysis is still an unresolved issue in the corpus linguistics literature (Rayson, 2015). Without a doubt, this poses a challenge, as many researchers follow a single research paradigm and are familiar only with the software programs used in that paradigm. Qualitative researchers may not feel comfortable employing methodologies outside their expertise, and corpus linguists might feel that thematic analysis lacks the statistical rigour that corpus analytical software provides. Nevertheless, combining the two is an opportunity not only in terms of professional development, but also for working with other professionals whose analytical skills can complement ours, particularly in interdisciplinary projects.
Indeed, there is a growing body of literature that supports the idea of mixed-methods research to explore context, something that corpus studies may not emphasise due to massive dataset sizes (Pérez-Paredes & Curry, 2024). All in all, these are decisions the corpus linguist must make, and they reflect the challenges, but also the uniqueness and opportunities, of each research project.
References
Anthony, L. (2022). AntConc (Version 4.1.4) [Computer software]. Waseda University.
Arellano, R. (2022). A discursive study of ideologies in an EFL teacher training program in Chile: The case of applied linguistics instruction [Doctoral dissertation, The University of New South Wales].
Díaz-Maggioli, G. (2017). Ideologies and Discourses in the Standards for Language Teachers in South America: A Corpus-based Analysis. In L. D. Kamhi-Stein, G. Díaz-Maggioli & L. C. de Oliveira (Eds.), English Language Teaching in South America: Policy, Preparation, and Practices (pp. 31-53). Multilingual Matters.
Graham, D. (2014). How I learned to stop empiricising and love my Intuitions. In Proceedings of the international conference: DRAL2/ILA (pp. 1-10). King Mongkut’s University of Technology Thonburi.
Pérez-Paredes, P. & Curry, N. (2024). Epistemologies of corpus linguistics across disciplines. Research Methods in Applied Linguistics, 3(3), 100141.
QSR International Pty Ltd. (2020). NVivo (Version 12) [Computer software].
Rayson, P. (2015). Tools and methods for corpus compilation and analysis. In D. Biber & R. Reppen (Eds.), The Cambridge handbook of English corpus linguistics (pp. 32-49). Cambridge University Press.
Selvi, A. F. (2020). Qualitative content analysis. In J. McKinley & H. Rose (Eds.), The Routledge Handbook of Research Methods in Applied Linguistics (pp. 440-452). Routledge.