About this time last year, when I was a visiting researcher at the Centre for Corpus Approaches to Social Science (CASS) at Lancaster University, I attended the Encyclopedia of Shakespeare’s Language Symposium (28 June 2019). Among many interesting talks, Jonathan Culpeper presented some first results from his Shakespeare project, comparing the ten most frequent trigrams (3-grams) in Shakespeare with those in Early Modern English and Present-day Plays. I thought it would be interesting to compare these in turn with the most frequent trigrams in the Sydney Corpus of Television Dialogue (SydTV), which is made up of recent US television series. After all, theatre plays share quite a few features of their communicative context with fictional TV series. Characters (on stage, on the screen) simultaneously address each other and the overhearing audience (in the theatre, at home…), and the dialogue functions both to tell a story and to entertain and guide the audience. (If you are interested in these functions, they are illustrated here.) So I extracted the ten most frequent trigrams from SydTV and compared them with the data presented by Jonathan Culpeper in his talk (Table 1).
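For readers who would like to try something similar on their own texts, here is a minimal Python sketch of how the most frequent trigrams can be counted. It is not the tool used for SydTV or for the Shakespeare project. The function names `tokenize` and `top_ngrams` are my own illustrative choices, and the tokenisation rule (lowercasing, keeping apostrophes inside a word so that contractions count as one token) is an assumption. The sketch also lets n-grams run across sentence and speaker boundaries, which a real corpus tool may or may not do.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return word tokens.

    Assumption: an apostrophe (straight or curly) inside a word keeps the
    word together, so a contraction such as "don't" is a single token.
    """
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def top_ngrams(text, n=3, k=10):
    """Return the k most frequent n-grams as (ngram, count) pairs."""
    tokens = tokenize(text)
    # Slide a window of n tokens over the token list.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams).most_common(k)


if __name__ == "__main__":
    sample = "I don't know. I don't know what you mean."
    for ngram, count in top_ngrams(sample, n=3, k=3):
        print(count, ngram)
```

Different tokenisation choices (for example, splitting contractions into two tokens) will change which sequences count as trigrams, so frequency lists from different tools are only comparable if they tokenise in the same way.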
At first glance, Table 1 suggests that there isn’t much overlap between fictional US television series and the fictional plays: no trigram is shared with Shakespeare, only one is shared with both Early Modern and Present-day Plays (in green: what do you), and only two trigrams are shared with Present-day Plays alone (in blue: I don’t know; going to be). In addition, it is possible that I don’t know (in columns 3 and 4) is the modern equivalent of I know not (in columns 1 and 2). Some of the differences between the plays from earlier periods and recent TV series reflect general changes in English society and language. For example, we don’t normally address each other as my good lord anymore, and it’s well known that the use of be going to as a semi-modal to express future intention has increased in frequency. Further, it is possible that the availability of ye, thou, and thee (all ‘you’ in modern English) affects the frequency of n-grams.
If we take a closer look, we can also observe that some of the n-grams in columns 1 and 2 might have contracted ‘alternatives’ which would be extracted as bigrams (2-grams), not trigrams (3-grams): I will not – I won’t; I am a – I’m a; I am not – I’m not; there is no – there’s no; I would not – I wouldn’t; it is a – it’s a; it is not – it’s not; and I will – and I’ll. If we incorporate these contracted bigrams in the table according to their raw frequencies in SydTV, a slightly different picture begins to emerge (Table 2). (Data from the present-day plays [column 3 in Table 1] are not included here, since I was not able to check whether they contain these bigrams – which means the data would not be comparable.)
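The pairing of full and contracted forms can be written down as a simple lookup table. The Python sketch below only restates the pairs listed above. The names `CONTRACTED`, `ngram_length` and `contracted_alternative` are hypothetical, and counting tokens by splitting on whitespace assumes that a contraction is treated as one token, which is why the contracted forms come out as bigrams.

```python
# Full trigram -> contracted bigram, taken from the pairs listed in the text.
CONTRACTED = {
    "i will not": "i won't",
    "i am a": "i'm a",
    "i am not": "i'm not",
    "there is no": "there's no",
    "i would not": "i wouldn't",
    "it is a": "it's a",
    "it is not": "it's not",
    "and i will": "and i'll",
}


def ngram_length(ngram):
    """Number of whitespace-separated tokens (assumption: "won't" is one token)."""
    return len(ngram.split())


def contracted_alternative(trigram):
    """Return the contracted bigram for a full trigram, or None if none is listed."""
    return CONTRACTED.get(trigram.lower())


if __name__ == "__main__":
    for full, short in CONTRACTED.items():
        print(f"{full} ({ngram_length(full)}-gram) -> {short} ({ngram_length(short)}-gram)")
```

With a table like this, a trigram from the older plays can be looked up and its contracted counterpart searched for in a bigram frequency list, rather than being treated as absent from the modern data.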
Now, half of the n-grams in the third column, from recent US TV series, are shared with either the Shakespeare (blue) or Early Modern English (green) data or even both (purple). (The other relevant bigrams are less frequent in SydTV: I’m a [77]; there’s no [54]; I won’t [40]; I wouldn’t [35]; and I’ll [33].) This indicates a greater degree of continuity in the use of frequent trigrams (or their contracted equivalents) in fictional, dramatic texts from the Early Modern English period to the 21st century than Table 1 initially suggested. Of course, this is just a (quantitative) snapshot, and more work is necessary to explore these findings further.
References
Culpeper, J. & Kytö, M. (2010). Early Modern English Dialogues: Spoken Interaction as Writing. Cambridge: Cambridge University Press.