Triangulating computational and corpus linguistic methods to investigate climate discourse on Twitter

Written by Darcy McCarthy

The first few years of the 2020s have pushed climate change to the centre of Australian discourse. Since the 2019-20 bushfires, climate change and its impacts have become fixtures in the national consciousness. However, linguistic research on the topic has not kept pace with climate change’s growing importance to Australians, despite the worldwide growth of ‘ecolinguistics’ (see Poole 2022).

To address this, I recently completed an Honours thesis investigating the link between climate discourses on Twitter and political ideologies, focusing on right- and left-leaning Australian users/accounts. I was provided with a dataset of tweets by Australian Twitter users mentioning the word ‘climate’ over a period of just over two years. The tweets are separated by the political leaning of the user/account (i.e. into left-leaning and right-leaning groups). To understand how these two groups use language in tweets that mention climate, I triangulated three methods: sentiment analysis (in which short texts are assigned a numerical sentiment score on a scale from -1 to 1; see Hutto & Gilbert 2014 for the method used in this thesis), multiple correspondence analysis (in which tagged texts are subjected to dimension reduction to reveal axes of variation; see Clarke & Grieve 2019 for a similar example), and keyword analysis (in which word frequencies are compared to determine which words are characteristic of a given corpus).
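To give a feel for how a short text ends up with a score between -1 and 1, here is a minimal sketch in the spirit of VADER (Hutto & Gilbert 2014). It is not the thesis pipeline: the tiny lexicon and its valence values are invented for illustration, and the real tool also applies rules for negation, intensifiers, punctuation and emoji. The only part taken from VADER itself is the final squashing step, which divides the summed valence by the square root of its square plus a constant (15 by default).

```python
import math

# Hypothetical toy lexicon (word -> valence). These values are made up for
# illustration and are NOT taken from the real VADER lexicon.
TOY_LEXICON = {"great": 3.0, "hope": 2.0, "disaster": -3.0, "fraud": -2.5}


def compound_score(text, lexicon=TOY_LEXICON, alpha=15.0):
    """Sum the valence of known words, then squash the total into (-1, 1)."""
    total = 0.0
    for token in text.split():
        word = token.strip(".,!?").lower()  # crude tokenisation, for the sketch only
        total += lexicon.get(word, 0.0)     # unknown words contribute nothing
    # Normalisation used by VADER for its compound score.
    return total / math.sqrt(total * total + alpha)


if __name__ == "__main__":
    examples = [
        "Great news on climate, there is hope!",
        "This climate policy is a disaster.",
        "Parliament sits on Tuesday.",
    ]
    for tweet in examples:
        print(round(compound_score(tweet), 3), tweet)
```

Because of the squashing step, adding more emotive words pushes the score towards -1 or 1 without ever reaching either, which is what makes scores comparable across tweets of different lengths.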

The sentiment analysis revealed that left-leaning users speak on average more positively than right-leaning users, as shown in Figure 1 below. This difference, however, was small. Additionally, this sentiment fluctuates over the period of the dataset, and appears to respond to external events (for example, the dip on the left of the graph corresponds to the 2019-20 bushfire season).

A graph showing the sentiment polarity of left- and right-leaning accounts with peaks and troughs over time
Figure 1: Polarity of left-leaning & right-leaning accounts over time

The multiple correspondence analysis revealed three main dimensions of variation within the dataset. The first relates to tweet length, which is expected since longer tweets contain more language features. The remaining two dimensions correspond to two clear grammatical styles: an interactive style and a persuasive style. The analysis thus produced two important findings. First, at least grammatically, the groups use language similarly when it comes to climate issues, since the dimensions do not differentiate them. Second, neither group (i.e. left-leaning or right-leaning) is a monolith in terms of grammatical style: each contains tweets that are ‘interactive’ and tweets that are ‘persuasive’ in style.
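The point about length is easier to see from the kind of table a multiple correspondence analysis starts from. The sketch below only builds that input: a presence/absence matrix with one row per tweet and one column per grammatical feature. It does not perform the dimension reduction itself, and the feature names are hypothetical stand-ins rather than the MDATT tag set used in the thesis.

```python
# Hypothetical tagged tweets: each tweet is reduced to the set of grammatical
# features it contains. The feature names are illustrative only.
tagged_tweets = [
    {"question", "second_person_pronoun"},
    {"modal_verb", "imperative", "second_person_pronoun", "hashtag"},
    {"hashtag"},
]


def indicator_matrix(tagged):
    """Return (feature_names, rows), where rows[i][j] is 1 if tweet i has feature j."""
    features = sorted(set().union(*tagged))
    rows = [[1 if feature in tags else 0 for feature in features] for tags in tagged]
    return features, rows


if __name__ == "__main__":
    features, rows = indicator_matrix(tagged_tweets)
    print(features)
    for row in rows:
        # The row total is the number of features present in that tweet.
        print(row, "features present:", sum(row))
```

Longer tweets have more room for features, so their rows contain more 1s. That difference in row totals is a plausible reason for the strongest axis of variation tracking length rather than style.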

The keyword analysis revealed the largest differences between the groups. Most keywords within the right-leaning group were related either to adversarial framing (how a given party frames their adversaries in discourse, e.g. ‘mockeries’, ‘hystericals’, ‘climatefraud’) or to scientific legitimation (how groups use various strategies to make their scientific position appear legitimate, e.g. ‘cagw’, ‘carbondioxide’, ‘minoan’). This finding lines up with what we understand about the nature of the climate movement. Since climate change deniers are opposed to the scientific consensus, they have to expend more effort delegitimizing their opponents. Climate change activists, on the other hand, do not have this burden and are free to spend more time talking about other issues. This is reflected in my keyword analysis, with most keywords in the left-leaning group being related to other political and environmental issues (e.g. ‘indue’, ‘icac’, ‘robodebt’).
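For readers unfamiliar with keyword analysis, the comparison of word frequencies can be sketched as follows. The post does not say which keyness statistic the thesis used, so the log-likelihood measure here is an assumption (it is a common choice in corpus linguistics), and the two token lists are invented toy data, not the thesis corpus.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Keyness of a word occurring a times in c target tokens and b times in d reference tokens."""
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_target)
    if b > 0:
        score += b * math.log(b / expected_reference)
    return 2 * score


def keywords(target_tokens, reference_tokens, top_n=5):
    """Rank words that are relatively more frequent in the target corpus."""
    target, reference = Counter(target_tokens), Counter(reference_tokens)
    c, d = sum(target.values()), sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference[word]  # Counter returns 0 for unseen words
        if a / c > b / d:    # keep only words over-represented in the target
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


if __name__ == "__main__":
    # Invented toy corpora, for illustration only.
    right = "climate climatefraud climate hystericals cagw".split()
    left = "climate robodebt climate icac indue".split()
    print(keywords(right, left))
    print(keywords(left, right))
```

A word such as ‘climate’, which is equally frequent in both toy corpora, scores zero and is filtered out, while words concentrated in one corpus rise to the top of that corpus’s list.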

Overall, my analysis revealed two main points. First, data collected on a left-leaning/right-leaning basis appear to be an apt proxy for studying the language of climate change advocates/activists and climate change deniers/skeptics. Although stance on climate change was not explicit in the original dataset, all the findings in this study lined up with the existing literature on the language of such groups, for example that climate change deniers are more prone to adversarial behaviour (see Medimorec & Pennycook 2015), or that scientific de-legitimation is a key component of climate denialist speech (Peters 2008). Second, the language of these two groups differs greatly on a lexical/keyword basis, but not much on other fronts (i.e. sentiment and grammatical style).

This blog post derives from my Honours thesis completed at the University of Sydney, which can be accessed here.

Acknowledgments

I am grateful to Dr. Tristram Alexander for providing the dataset and to Dr. Isobelle Clarke for access to MDATT, software used for detailed POS tagging of Twitter posts.

References

Clarke, I., & Grieve, J. (2019). Stylistic variation on the Donald Trump Twitter account: A linguistic analysis of tweets posted between 2009 and 2018. PLOS ONE, 14(9), e0222062. https://doi.org/10.1371/journal.pone.0222062

Hutto, C., & Gilbert, E. (2014). VADER: A parsimonious rule-based model for sentiment analysis of social media text. Proceedings of the International AAAI Conference on Web and Social Media, 8(1), 216–225. https://doi.org/10.1609/icwsm.v8i1.14550

Medimorec, S., & Pennycook, G. (2015). The language of denial: Text analysis reveals differences in language use between climate change proponents and skeptics. Climatic Change, 133(4), 597–605. https://doi.org/10.1007/s10584-015-1475-2

Peters, H. P. (2008). Scientists as public experts. In M. Bucchi & B. Trench (Eds.), Handbook of Public Communication of Science and Technology. Routledge.

Poole, R. (2022). Corpus-assisted Ecolinguistics. Bloomsbury Academic.