‘Power in numbers’: Using corpus linguistics to demonstrate media bias, by Alex Garcia Marrugo


“Corpus Linguists are the only ones that count”. The pun, included in the lecture slides of a Corpus Linguistics course for PhD students, resonated with me. Although not to be taken literally, it does highlight the need for quantitative evidence in discourse studies, especially those aiming to uncover the ideologies underlying dominant forms of (mis)representation. In my case, I was examining how newspapers portrayed the violence committed by the different illegal actors in the Colombian conflict. But in reviewing the literature, I was left with the impression that many of the relevant studies were based on a very limited number of texts, or that the analysis was rather subjective and relied on cherry-picked examples. Corpus-based analysis, on the other hand, offered the possibility of identifying patterns whose significance could be statistically determined. The use of corpora in Critical Discourse Studies has since become common practice. For example, in the current issue of Discourse & Society, one of the leading journals in the field, three of the five studies use corpus-assisted methods, whereas in the last issue of 2009, when I was just starting my PhD, only one of the five did.

My interest in this topic stems from the widespread misunderstanding of the five-decade-long conflict. Although the illegal actors involved – Marxist guerrillas and right-wing paramilitaries – are both responsible for atrocious human rights violations including massacres, kidnappings and selective murders, the death toll from the latter exceeded that from the guerrillas by 3 to 1. Yet only one in twenty Colombians would point to the paramilitaries as the main agents of violence. Even worse, up to one in four would justify their actions, citing self-defense and the need to fight the guerrillas. At this point, I usually need to clarify to a Colombian audience that my motivation for doing this research is not to defend the guerrillas, but to highlight the injustice against paramilitary victims, who, on top of the irreversible damage done to them, carry the burden of being labeled as ‘guerrilla aides’.

Despite its length and level of degradation, the fact is that the vast majority of Colombians have only experienced the conflict indirectly, mostly through the media. It was therefore logical to ask whether the representation of the conflict in the media was somehow related to the popular perception of paramilitaries as minor agents in the conflict. For my PhD project, I compiled a corpus of over 500 reports (over 300,000 words) of violent acts committed by either guerrillas or paramilitaries from 1998 to 2006, taken from the most-read newspaper in each of the four largest cities in the country. I compared the representation of the perpetrators, of the act of killing and of the victims. The results were staggering. Every analysis conducted led to the same conclusion: the responsibility of the paramilitaries was systematically minimised, the scale of the violence diminished, and the humanity of the victims rendered invisible.

A concordancer allowed me to identify, for example, that direct references to guerrillas are 3.5 times more frequent than those to paramilitaries, who are more frequently referred to as ‘armed groups’, ‘hooded men’ or simply ‘hitmen’ (read more here). Moreover, while ‘murder’ is more frequent in the paramilitaries subcorpus, it takes ‘FARC’ (Revolutionary Armed Forces of Colombia – Spanish acronym – the largest guerrilla group) as a subject six times more frequently than ‘AUC’ (United Self-Defence Groups of Colombia – paramilitary group).

Concordance lines for ‘AUC murder*’ – paramilitaries
Concordance lines for ‘Farc murder*’ – guerrillas
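
For readers unfamiliar with concordancers, the kind of search behind these concordance lines can be sketched in a few lines of Python. This is only an illustration, not the tool used in the study: the function name `kwic`, the `width` parameter and the sample sentence are all invented here, and a trailing `*` is assumed to work as a wildcard for word endings, as in the queries above.

```python
import re

def kwic(text, query, width=30):
    """Return keyword-in-context lines for a query such as 'FARC murder*'.

    Hypothetical helper: a trailing '*' on a query token is treated as a
    wildcard matching any word ending.
    """
    parts = []
    for token in query.split():
        if token.endswith("*"):
            parts.append(re.escape(token[:-1]) + r"\w*")
        else:
            parts.append(re.escape(token))
    pattern = re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

    lines = []
    for match in pattern.finditer(text):
        # Keep a fixed window of characters on each side of the hit.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Invented sample text, not taken from the study's corpus.
sample = ("Authorities said the FARC murdered three farmers on Sunday. "
          "Later the FARC murders were condemned by the government.")

for line in kwic(sample, "FARC murder*"):
    print(line)
```

Running the same query over each subcorpus and counting the hits is what makes comparisons such as "six times more frequently" possible.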

Paramilitaries are also more frequently portrayed as moving from one place to another than actually killing people. People are also more likely to just ‘die’ in news reports about paramilitaries, whereas they are more frequently ‘beheaded’, ‘tortured’ or ‘mutilated’ in news reports of guerrilla actions (read more here).

Even worse is the difference in the representation of the victims of each group. In the guerrilla subcorpus, references to kinship (e.g. mother, sister, husband), emotion (e.g. love, cried, pain), and the victims’ actual words are significantly overrepresented. Meanwhile, in the paramilitaries’ subcorpus, victims are more frequently referred to in generic terms such as ‘the dead’, ‘people’ or simply a number: “5 killed in ____”. Likewise, national authorities are conspicuously brief when commenting on paramilitary actions, while they vociferously condemn guerrilla crimes.
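
Over-representation of this kind is usually checked with a significance statistic. The sketch below uses the log-likelihood (G2) measure, a common choice in corpus linguistics; the post does not say which test was actually used, so treat it as one plausible option. The function name and the counts in the example are hypothetical, and the thresholds are the standard chi-square critical values for one degree of freedom.

```python
import math

# Chi-square critical values for 1 degree of freedom.
CRITICAL_95 = 3.84      # p < 0.05
CRITICAL_9999 = 15.13   # p < 0.0001

def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood (G2) for one item's frequency in two corpora.

    freq_a, freq_b: occurrences of the item in each corpus.
    size_a, size_b: total number of words in each corpus.
    """
    total = size_a + size_b
    expected_a = size_a * (freq_a + freq_b) / total
    expected_b = size_b * (freq_a + freq_b) / total
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

# Hypothetical counts, NOT figures from the study: a kinship term occurring
# 120 times in one 150,000-word subcorpus and 40 times in another.
g2 = log_likelihood(120, 150_000, 40, 150_000)
print(f"G2 = {g2:.2f}, significant at p < 0.0001: {g2 > CRITICAL_9999}")
```

A G2 value above 3.84 corresponds to the 95% level, and a value above 15.13 to the 99.99% level.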

Each of these results is statistically significant. That is, the probability that differences this large arose by chance is below 5% (in most cases, below 0.01%). The impact of this data on Colombian audiences is – as one person described it – ‘mind-blowing’. Even when ‘preaching to the choir’, an awareness of the level of systematicity in the concealment of paramilitary responsibility and the dehumanisation of the victims leaves people profoundly shocked. A blog entry summarising these results went viral and was reproduced on about a dozen independent online news sites. It was even cited in ‘Le Monde Diplomatique’ to explain the ‘NO’ vote on the peace agreement with the FARC in 2016. Unsurprisingly, one of the newspapers in the study cancelled an interview after reviewing the results of my research in more detail.

In my opinion, any potential social impact of studying dominant discourses can only be maximised by strong, compelling evidence. And that is precisely what corpus linguistics offers.