the language of politics

(updated May 2011)
1. What is Corpus Linguistics?
Corpus Linguistics is the study of language based on examples of real-life language use.
2. What is a corpus?
A corpus is a computerized collection of texts amenable to automatic or semi-automatic analysis. The texts are selected according to explicit criteria in order to capture the regularities of a language, a language variety or a sub-language. The data may be spoken, written or intermediate (written and spoken merged), and can be used as a starting point for linguistic description or as a means of verifying hypotheses about a language.
Corpus is a Latin word. The plural is corpora.
3. What is the difference between a parallel corpus and a comparable corpus?
A parallel corpus consists of a set of texts in one language and their translations into another language: the texts stand in a translational relationship to each other. Examples of parallel corpora are European Union documents, for instance the Lisbon Treaty (2007), written in 23 original languages (although we do not know in which language the texts originated). Article I-8 below (in English and Italian), for example, is taken from the European Constitution (2004), written in 21 original languages:
Article I-8 (English)
The symbols of the Union
The flag of the Union shall be a circle of twelve golden stars on a blue background.
The anthem of the Union shall be based on the “Ode to Joy” from the Ninth Symphony by Ludwig van Beethoven.
The motto of the Union shall be “United in Diversity”.
The currency of the Union shall be the euro.
Europe day shall be celebrated on 9 May throughout the Union.

Articolo I-8 (Italian)
I simboli dell’Unione
La bandiera dell’Unione rappresenta un cerchio di dodici stelle dorate su sfondo blu.
L'inno dell'Unione è tratto dall'«Inno alla gioia» della Nona sinfonia di Ludwig van Beethoven.
Il motto dell'Unione è: «Unita nella diversità».
La moneta dell'Unione è l'euro.
La giornata dell'Europa è celebrata il 9 maggio in tutta l'Unione.
As we read in Article 448 of the failed European Constitution, all the language versions are original. Of course, this is hard to believe: it is more likely that one language was the original (very likely French or English, or both) and all the others are translations.
Article IV-448 (English)
Authentic texts and translations
1. This Treaty, drawn up in a single original in the Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Slovak, Slovenian, Spanish and Swedish languages, the texts in each of these languages being equally authentic, shall be deposited in the archives of the Government of the Italian Republic, which will transmit a certified copy to each of the governments of the other signatory States. […]

Articolo IV-448 (Italian)
Testi autentici e traduzioni
1. Il presente trattato, redatto in unico esemplare in lingua ceca, danese, estone, finlandese, francese, greca, inglese, irlandese, italiana, lettone, lituana, maltese, olandese, polacca, portoghese, slovacca, slovena, spagnola, svedese, tedesca e ungherese, il testo in ciascuna di queste lingue facente ugualmente fede, sarà depositato negli archivi del governo della Repubblica italiana, che provvederà a trasmetterne copia certificata conforme a ciascuno dei governi degli altri Stati firmatari. […]
Other examples of parallel texts are the in-flight magazines found on airplanes. If we fly from Rome to New York, for example, the magazine will almost certainly be written in English and in Italian, one of the two being the original and the other its translation. A further example is the articles from The Economist translated weekly by the Italian magazine Economy (published with Panorama).
The Economist
The world’s leading banks decided some years ago that lending is a mug’s game. They began to get rid of their loans, repacking them and selling them off as securities, or getting others to re-insure their risk. And the policy has been bearing fruit. The glut of corporate bankruptcies in 2001 and 2002 – including the two biggest of all time, Enron and World Com – have not had the devastating effect on the big banks’ balance sheets that might have been expected. The two biggest banks in America, for instance, have hardly registered a tremor. Citigroup’s profits for the second quarter of this year were $4.3 billion (12% up on a year earlier), and those of J.P. Morgan Chase were $1.8 billion for the same …

Economy (Italian translation)
Alcuni anni fa, le principali banche internazionali hanno capito la scarsa redditività dei prestiti. Così, hanno cominciato a disfarsi dei propri mutui, riconfezionandoli e rivendendoli sotto forma di titoli, oppure ottenendo che il rischio collegato ai prestiti venisse preso in carico da altri. Questa politica ha dato buoni frutti. I numerosi fallimenti societari avvenuti nel 2001 e 2002 – compresi quelli di Enron e WorldCom, i maggiori di tutti i tempi in USA – infatti non hanno avuto l’effetto devastante sui bilanci delle grandi banche che molti prevedevano. I due principali istituti americani ne hanno risentito appena. Nel secondo trimestre di quest’anno, Citigroup ha realizzato 4,3 miliardi di dollari di utili (+12% rispetto all’anno precedente)…
Parallel corpora are not easy to find; the easiest source is probably literary works in translation. Think, for instance, of all the translations of Shakespeare’s works into many foreign languages.
A comparable corpus includes texts which are all original (that is, not translated), for example the speeches of George W. Bush, Tony Blair, Gordon Brown, Barack Obama, David Cameron, Nick Clegg, Silvio Berlusconi and Romano Prodi. Such texts are comparable in terms of topic, communicative function, size and time span (although not always in all of these respects).
4. What is the difference between a spoken corpus and a written corpus?
A spoken corpus is a collection of spoken data, namely transcriptions of recorded speech, which may include speeches, interviews, statements and press conferences delivered by politicians or anyone else. The speeches are transcribed by expert transcribers. An example of a spoken corpus is ABC (American British Corpus), which includes speeches by American and British politicians from 1997 to the present.
A written corpus is a collection of written texts, such as ECCO (Economic Comparable Corpus), a corpus assembled by students of the Faculty of Economics and including articles on finance, economics and marketing.
In general corpora, written texts usually far outnumber spoken texts. For instance, the British National Corpus (BNC) consists of 90% written texts and only 10% spoken texts.
5. What are corpora used for?
Corpora are used for translation, teaching and study purposes. Studying the language of a corpus assembled in 2011, for example, allows researchers and students alike to analyse fresh and authentic language, namely the language which is really written and spoken, rather than the language sometimes found in textbooks, which native speakers do not actually use.
It is very important for learners and other users to examine only real instances of language. There is no justification for inventing examples, although many people seem doomed to work with invented material. To illustrate a simple subject-verb clause, something like Birds sing is not good enough, not least because students will very rarely find themselves using this sentence. Similarly, the example often used to explain the passive voice, The apple is eaten by me (the active being I eat the apple), although grammatically correct, is hardly ever used, so there is little point in learning this type of sentence, which is very unlikely to appear in a corpus.
By contrast, sentences such as On January 20th Barack Obama was sworn in as the 44th President of the United States, or British Prime Minister Gordon Brown was criticized for not taking part in the main ceremony, are surely more interesting and effective examples for students. It is essential to learn from actual examples, which can be trusted because they have been used in real communication.
6. How many general English corpora are you aware of? Are they freely available?
The British National Corpus (BNC), the Bank of English (BoE), MICASE (Michigan Corpus of
Academic Spoken English), the Brown Corpus, the LOB (Lancaster-Oslo-Bergen) Corpus,
among many others.
The Bank of English (BoE) was launched in 1991 by COBUILD (a division of HarperCollins Publishers) and The University of Birmingham. This huge collection is composed of a wide range of different types of writing and speech, with samples of the English language from hundreds of different sources. Written texts come from newspapers, magazines, fiction and non-fiction books, brochures, leaflets, reports, letters, and so on. Spoken texts are represented by transcriptions of everyday casual conversation, radio broadcasts, meetings, interviews, discussions, etc. The material is up to date, with the majority of texts originating after 1990. Taken together, the Bank of English provides objective evidence about the English which most people read, write, speak and hear every day of their lives. This corpus currently stands at 450 million words.
The BNC and the BoE are not freely available. A subscription to the BoE, for example, costs GBP 500 a year.
7. How many general Italian corpora are you aware of? Are they freely available?
CORIS/CODIS has been available since September 2001. It consists of 100 million words and is updated every two years. It is a collection of authentic texts in electronic format, designed to be representative of a wide cross-section of current Italian.
CORIS/CODIS is free of charge: all you need to do is write to the University of Bologna, tell them who you are and why you want to use the corpus, and they will provide you with a password within 48 hours at the latest.
The first spoken Italian corpus is called CLIPS (Corpora e Lessici dell’Italiano Parlato e Scritto, i.e. Corpora and Lexicons of Spoken and Written Italian); it was completed in 2004 and presented to the Italian community in May 2007 at the University of Naples. This corpus is also free of charge: access is granted by contacting the University of Naples, which will provide a password. CORPS (Corpus of Political Speeches) was released in January 2011 and is also freely available for research purposes: it is a corpus of political speeches tagged with specific audience reactions, such as applause and laughter. We found this very interesting because, whereas the American part of ABC includes both applause and laughter, in the British part (and in the Italian one) transcribers chose not to keep applause and laughter in their transcriptions, even though they occur. The new release of CORPS contains more than 3,600 speeches, about 7.9 million words and more than 67,000 tags marking audience reactions. We believe that laughter plays an important role in interaction, even more than applause, hence deleting such signals entails a great loss of information. In ABC, other markers typical of spontaneous speech, such as hesitations (erm) and backchannelling (mhm), have also been removed.
8. What are the advantages of studying through corpora? And the disadvantages?
The main advantage is that of studying authentic language without learning old and obsolete expressions. The disadvantage is that, when speaking, people may make mistakes, which transcribers almost always decide to keep in the transcription. Sometimes transcribers write [sic] to indicate that a form was the speaker’s mistake. A student who does not realize that a certain form is wrong might learn it thinking it is correct.
For example, in ABC we found a few instances of I am looking forward to continue working with you and this is an historical moment, rather than I am looking forward to continuing working with you and this is a historical moment.
These forms are so frequently used today that they are almost accepted as correct.
9. What is the most frequent word in all the speeches George Bush delivered from 2001 to 2008? And in the speeches Tony Blair delivered from 1997 to 2007? And in the speeches delivered by Barack Obama in the first two years of his presidency? And in the speeches David Cameron and Nick Clegg have delivered since May 2010? Is this first word a function word or a content word?
The most frequent word in the speeches of all these politicians is THE. It is a function (or grammar) word, and it is also the most frequent word in the English language in general, usually amounting to about 6% of all the words in a corpus.
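To make the WordList idea concrete, here is a minimal Python sketch of a frequency word list with relative frequencies (the kind of measure behind the "about 6%" figure for THE). It is not how WordSmith Tools works internally: the tokeniser (lowercase runs of letters and apostrophes) and the function name `word_list` are assumptions made for illustration, and the sample sentence is invented.

```python
import re
from collections import Counter


def word_list(text: str) -> list[tuple[str, int, float]]:
    """Return (word, frequency, percentage of all tokens), most frequent first.

    Assumed tokeniser: lowercase runs of letters and apostrophes.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    counts = Counter(tokens)
    # Descending frequency; ties broken alphabetically, as in a printed list.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(word, freq, 100 * freq / total) for word, freq in ranked]


sample = "The people of the United States and the people of the world"
for word, freq, pct in word_list(sample)[:3]:
    print(f"{word:<8}{freq:>3}{pct:>7.1f}%")
```

On a real corpus of political speeches the same function would simply be fed the concatenated transcripts; the percentage column is what allows figures such as "about 6%" to be compared across corpora of different sizes.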
10. What is the first content word in the speeches of ALL politicians in “normal” times (i.e. not in election times)?
In all the politicians we have looked at (Tony Blair, Gordon Brown, David Cameron, Nick Clegg, George W. Bush, Condoleezza Rice, Bill Clinton, Hillary Clinton, Barack Obama and Joe Biden), the first content word is always PEOPLE. In George W. Bush’s word list PEOPLE ranks 23rd, in Tony Blair’s it ranks 25th, and in Barack Obama’s it ranks 33rd (see Figure 2). The words ranking above PEOPLE are all grammar (or function) words.
Yet the most frequent content word in election times is very often different. For example, in the speech Barack Obama delivered in Denver, Colorado, on 28 August 2008, when he accepted the nomination, the most frequent content word was not PEOPLE (ranking 88th) but PROMISE (ranking 23rd), as Figure 3 shows. The second most important content word was, not surprisingly, CHANGE, ranking 49th.
Figure 1. WordList in Barack Obama’s speeches (first 30 words)
Figure 2. WordList in Barack Obama’s speeches (first 60 words)
Figure 3. WordList in Barack Obama’s speech delivered in Denver in August 2008
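The notion of the "first content word" used in question 10 can be applied mechanically by walking down a frequency-ranked word list and skipping function words. The sketch below assumes a hypothetical, very short stop list (`FUNCTION_WORDS`) and an invented ranked list; with real data the stop list would need to be far longer, and the rank returned depends entirely on that list.

```python
# Hypothetical, deliberately tiny stop list of function words (an assumption;
# real studies use much longer lists).
FUNCTION_WORDS = {
    "the", "and", "of", "to", "a", "in", "that", "we", "i", "is", "it",
    "for", "our", "you", "this", "be", "will", "are", "have", "not", "on",
}


def first_content_word(ranked_words):
    """Return (rank, word) for the first item of a frequency-ranked word list
    that is not a function word; ranks are 1-based. None if there is none."""
    for rank, word in enumerate(ranked_words, start=1):
        if word.lower() not in FUNCTION_WORDS:
            return rank, word
    return None


# Invented ranked list, not real ABC data.
ranked = ["the", "and", "to", "of", "we", "people", "that", "america"]
print(first_content_word(ranked))  # → (6, 'people')
```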
11. How do most American politicians end their speeches?
Most of them end their speeches by saying: “God bless you”, or “God bless America”, or “God bless you, and may God bless the United States of America”. Bush sometimes ended simply with “God bless”. He also used to say “May God bless America and protect our troops”. Barack Obama’s speeches also end with “God bless you, and may God bless the United States of America”.
12. Can you give the definition of phrase? And of cluster?
A phrase is a multi-word unit. Different labels have been given to phrases: clusters, n-grams, concgrams, lexical bundles, prefabs (prefabricated language).
Mike Scott, in WordSmith Tools, speaks of clusters rather than phrases. In many phrases, the meaning of the whole does not correspond to the sum of the meanings of the single words.
Whatever designation is preferred, the common thread is that words are not chosen freely but are placed on a cline between the open choice principle and the idiom principle. The latter governs ‘prefabs’, whose meaning is carried not by the individual items but by the unit as a whole, in accordance with phraseological conventions.
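Clusters of this kind can be approximated by counting contiguous n-word sequences (n-grams). The following sketch is only an illustration, not WordSmith’s cluster algorithm: the function `clusters`, its simple tokeniser and the `min_freq` threshold are assumptions, and the sample text is invented.

```python
import re
from collections import Counter


def clusters(text: str, n: int, min_freq: int = 2) -> list[tuple[str, int]]:
    """Count contiguous n-word sequences (n-grams), most frequent first.

    Simplifications (assumptions): a naive tokeniser, and no stopping at
    sentence boundaries, so some n-grams span two sentences.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    grams = Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return [(gram, freq) for gram, freq in grams.most_common() if freq >= min_freq]


speech = ("I want to make sure that we act. We have to make sure that "
          "every child learns, and we will make sure that it happens.")
print(clusters(speech, 3))
```

Raising `n` to 4 or 5 yields the longer units listed under question 13; the frequency threshold keeps one-off word sequences out of the list.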
13. Can you give an example of a 2-word cluster? And of a 3-word cluster? And of a 4-word
cluster?
The list below includes only some examples. You can mention any examples you like.
2-word units:
as well
in that
out there
in hindsight
so far (= as yet)
how come
at least
at stake
by far
for good
right away
to date
go broke
let alone
3-word units:
as well as
a great deal
by the way
come into force
connect the dots
cut and run
deliver the goods
food for thought
foot the bill
give the floor
in order to
in spite of
ins and outs
into harm’s way
just like that
make ends meet
on behalf of
on my watch
on this ticket
time and again
to my mind
cast one’s ballot
take for granted
pay lip service
you name it
4-word units:
all of a sudden
as soon as possible
for the time being
go into the red
in the long run
on the brink of
see eye to eye
so far so good
when it comes to
5-word units:
see eye to eye on
from all walks of life
in a matter of months
stand shoulder to shoulder with
turn a blind eye to
6-word units:
at the end of the day
and all the rest of it
Some expressions, like at the end of the day, and all the rest of it, just like that and so far, can be interpreted according to either the open choice principle (literally) or the idiom principle.
The expression just like that did not appear in dictionaries until a few years ago, although it is a frequent cluster, mainly in spoken language, as Figure 4 shows. It has now finally been included in dictionaries.
Figure 4 also shows that the semantic prosody (the positive or negative connotation that a word or phrase carries) of just like that in a political context is negative (this does not mean that just like that has a negative semantic prosody in the English language in general): verbs like kill, behead, chop somebody’s head off and cut your right hand off occur in the proximity of the node, both on its left and on its right.
Figure 4. Just like that in George W. Bush
Figure 5 below shows that the phrase just like that can also have a positive semantic prosody in English. The data are taken from a general corpus of written and spoken English (the BNC).
Figure 5. Just like that in PIE (Phrases In English)
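Semantic prosody is usually inferred by inspecting the collocates that occur within a window around the node. The sketch below counts the words within `span` tokens on either side of a node that may be multi-word, such as just like that. The function name and the invented example sentences are assumptions; deciding whether the resulting collocates are positive or negative remains the analyst’s job.

```python
import re
from collections import Counter


def collocates(text: str, node: str, span: int = 5) -> Counter:
    """Count words within `span` tokens to the left and right of each
    occurrence of a (possibly multi-word) node such as 'just like that'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    node_tokens = node.lower().split()
    n = len(node_tokens)
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == node_tokens:
            left = tokens[max(0, i - span):i]       # up to `span` words before
            right = tokens[i + n:i + n + span]      # up to `span` words after
            counts.update(left + right)
    return counts


# Invented sentences for illustration, not concordance lines from ABC or the BNC.
text = ("They would kill innocent people just like that. "
        "He can behead a hostage just like that and walk away.")
print(collocates(text, "just like that", span=3).most_common(4))
```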
14. Give the definition of phrasal verb and provide some examples in context.
A phrasal verb is a verb followed by one or more particles (adverbs or prepositions), and, just as with phrases, its meaning is not given by the sum of the individual words in it. Some phrasal verbs are very opaque, others are more transparent. Give up, for example, has nothing to do with give, and call off has nothing to do with call. Thus, we should never translate them word for word.
Phrasal verbs are very common in English, and when choosing between a phrasal verb and a non-phrasal verb, a native English speaker will very likely prefer the phrasal verb: give up smoking rather than stop smoking.
The list below includes only some examples. You can mention any examples you like.
CALL AFTER
LOOK AFTER
NAME AFTER
TAKE AFTER
PASS AWAY
GET AWAY WITH
PUT ASIDE
BE BACK
COME BACK
GIVE BACK
LOOK BACK
GET BY
BREAK DOWN
CALM DOWN
CLOSE DOWN
HAND DOWN
SHOOT DOWN
STEP DOWN
TURN DOWN
VOTE DOWN
CUT DOWN ON
CALL FOR
LOOK FOR
RUN FOR
STAND FOR
LOOK FORWARD
BREAK IN
CHIP IN
FILL IN
GIVE IN
HAND IN
OPT IN
SWEAR IN
LOOK LIKE
CALL OFF
PUT OFF
SHOW OFF
SWITCH OFF
TAKE OFF
TURN OFF
COME ON
GET ON
GO ON
HANG ON
HOLD ON
PASS ON
HOLD ON TO
BAIL OUT
BREAK OUT
CARRY OUT
DROP OUT
FALL OUT
FIGURE OUT
FILL OUT
LIVE OUT
LOOK OUT
PASS OUT
POINT OUT
PUT OUT
RUN OUT
SPEAK OUT
SPELL OUT
SORT OUT
TURN OUT
RULE OUT
WATCH OUT
WORK OUT
BREAK OUT OF
OPT OUT OF
RUN OUT OF
TAKE OVER
CARRY THROUGH
BREAK UP
CALL UP
GET UP
GIVE UP
GIVE UP ON
GROW UP
HOLD UP
LOOK UP
MAKE UP
PICK UP
RUN UP
SIGN UP
SPEAK UP
SPEED UP
SUM UP
TURN UP
WAKE UP
STAND UP FOR
CATCH UP WITH
DEAL WITH
KEEP UP WITH
PUT UP WITH
15. What is the most frequent verb in the politicians under study in ABC? And the most frequent
2-word phrasal verb? And the most frequent 3-word phrasal verb?
The most frequent verb in ABC is want, followed by know, make, think, get, work, thank, like, need
and say among the first ten.
Bearing in mind that “some words are lonelier than others”, it soon becomes evident that, with the exception of a few verbs which also make meaning on their own and which are typical of spoken language, most of the others owe their high ranking to a particle or some other word they combine with. The lexical verb make, for example, the third most common verb in ABC, almost certainly does not rank so high because of its meaning “create or produce something by working”: it ranks so high because it lends itself to forming several phrases, as in I want to make sure. Make collocates with many words, and in ABC we found several instances of make progress, make great strides, make sacrifices, make a mistake, make a judgment, make a decision, make a choice, make sense, make up one’s mind and make your voice heard.
Relying on the clusters facility provided by WordSmith Tools, we found that the most frequent 2-word phrasal verb in ABC is deal with, followed by provide with, set up, go back, look for, look forward, move forward, figure out, end up, go ahead and stand up.
The most frequent 3-word phrasal verb is look forward to, followed by get out of, come up with, live up to and stand up for. The verb that most frequently accompanies look forward to is work: I look forward to working with you.
Figure 6. Concordance lines of look forward to working in ABC
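Counting multi-word verbs automatically is harder than counting clusters, because the verb must be recognised in all its forms. As a rough sketch, assuming hypothetical hand-made tables of verb forms (`VERB_FORMS`) and particles (`PARTICLES`) in place of a lemmatised, part-of-speech-tagged corpus, one can count lemma + particle (+ second particle) sequences. Such a naive approach also picks up noun uses (a good deal with) and free combinations, so its output would need manual checking.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny tables; a real study would rely on a
# lemmatised, part-of-speech-tagged corpus rather than hand-made lists.
VERB_FORMS = {
    "look": "look", "looks": "look", "looking": "look", "looked": "look",
    "deal": "deal", "deals": "deal", "dealing": "deal", "dealt": "deal",
    "come": "come", "comes": "come", "coming": "come", "came": "come",
}
PARTICLES = {"forward", "up", "out", "back", "with", "for", "to", "of"}


def multiword_verbs(text: str) -> Counter:
    """Count lemma + particle (+ optional second particle) sequences,
    e.g. 'looking forward to' -> 'look forward to'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        lemma = VERB_FORMS.get(token)
        # Skip tokens that are not known verb forms or lack a following particle.
        if lemma is None or i + 1 >= len(tokens) or tokens[i + 1] not in PARTICLES:
            continue
        phrase = [lemma, tokens[i + 1]]
        # Greedily attach a second particle, as in 'come up with'.
        if i + 2 < len(tokens) and tokens[i + 2] in PARTICLES:
            phrase.append(tokens[i + 2])
        counts[" ".join(phrase)] += 1
    return counts


sample = ("I look forward to working with you. We came up with a plan "
          "to deal with the crisis, and I am looking forward to it.")
print(multiword_verbs(sample).most_common())
```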
Strictly speaking, not all the multi-word verbs listed above are phrasal verbs. Just as it has various multi-word units, English has at its disposal various kinds of multi-word verbs: phrasal verbs, prepositional verbs and phrasal-prepositional verbs. For the sake of convenience, however, we have listed them all together in order of frequency, without distinguishing among the three, so that we find phrasal verbs like find out, give up, figure out and take off; prepositional verbs like look for, talk about, look at, think about and depend on; and phrasal-prepositional verbs like look forward to, come up with, hold on to, put up with, reach out to and stand up for.
It is interesting to note that, among these multi-word verbs (of two, three, four and even five words), some verbs completely lose their original meaning when combined with a particle: it turns out that, for example, has no semantic relationship with turn, nor do give up and give up on with give. Other verbs are more transparent, and the particle extends the usual meaning of the verb, as in go away, come up or sit down. Others, finally, make meaning only with the particle and are rarely found independently as verbs, for example fend for, sum up, zero in on and tamper with.
Phrasal verbs are said to be extremely difficult to learn, yet native speakers not only handle them with aplomb but seem to prefer them to single-word alternatives. Learners, by contrast, tend to avoid them and rely instead on longer, rarer and clumsier words which make their language sound stilted and awkward. Thus, if learners have to choose between carry out and perform or undertake, or between put up with and tolerate, they will almost certainly opt for the latter. Yet the whole drift of the historical development of English has been towards the replacement of words by phrases: in ABC, for example, we found 32 instances of turned down the Treaty versus 8 of rejected the Treaty.
Once we learn that phrases are handled as single units, we will not expect, for instance, turn and down to have a meaning of their own, because the two words occurring together are stored in the mind as a holistic unit.
We can conclude by quoting Searle, who argues that there is a conversational maxim which reads as follows: “Speak idiomatically unless there is some special reason not to”.
16. Is language idiomatic? And to what extent? Justify your answer.
Yes, language is idiomatic. John Sinclair’s work suggests that about 80% of language is governed by the idiom principle and 20% by the open choice principle. Words are attracted to other words, and they tend to co-occur more often than chance would predict, for no apparent reason other than convention and habit.
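"More often than chance would predict" can be quantified by comparing the observed frequency of a word pair with the frequency expected if the two words were independent, E = f(a) × f(b) / N, for instance through pointwise mutual information, MI = log2(O / E). This is one common association measure, swapped in here for illustration; the text does not say which measure underlies the claim. Read loosely, a clearly positive score suggests attraction, a score near zero indifference, and a negative score (or no co-occurrence at all) repulsion. The counts in the example are hypothetical.

```python
import math


def mutual_information(pair_freq: int, freq_a: int, freq_b: int, corpus_size: int) -> float:
    """Pointwise mutual information of an adjacent word pair (a, b).

    expected = f(a) * f(b) / N is how often the pair would occur by chance
    if a and b were independent; MI = log2(observed / expected).
    """
    if pair_freq == 0:
        return float("-inf")  # the pair never co-occurs
    expected = freq_a * freq_b / corpus_size
    return math.log2(pair_freq / expected)


# Hypothetical counts, not taken from any real corpus.
print(round(mutual_information(40, 50, 200, 1_000_000), 2))  # → 11.97
```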
17. What is the meaning of “Words cluster just as people do”?
Words attract each other even beyond idioms (when we speak of the idiom principle, we should not think only of idiomatic expressions like to rain cats and dogs). Words are like people: just as some people enjoy each other’s company and tend to spend time and go out together, so some words enjoy the company of other words. Thus, we speak of attraction, indifference and repulsion.
A clear example is the phrase Merry Christmas: native English speakers routinely say Merry Christmas, Happy Christmas and Happy Birthday, but not Merry Birthday. Christmas very frequently occurs with Merry, so it is attracted to Merry; it is indifferent to Happy (Happy Christmas is not grammatically wrong, and some people say it, but less often than Merry Christmas). By contrast, Merry Birthday is not used: Birthday is not attracted to Merry, and the two repel each other. That is when we talk of repulsion.
18. How many main tools does WordSmith Tools have?
WordSmith Tools has three main tools: Concord, Keywords and WordList.
Figure 7. WordSmith Tools 5.0
19. What’s the function of Concord? What’s the function of WordList? What’s the function of Keywords?
As its name indicates, WordList creates word lists, ordering words by frequency (Figures 1 and 2) and alphabetically. Word frequency information is very useful in identifying the characteristics of a text or of a genre.
Concord is a tool which locates all occurrences of any given word or phrase in a corpus and shows them in standard concordance lines, with the search word (also called the node word) centred and a variable amount of context on either side (usually N-5 to N+5: five words to the left and five words to the right). This tool allows the company a word keeps (its collocates) to be studied further. The figures below show the concordance lines of the word TIME in Barack Obama’s corpus:
Figure 8. Concordance to time without any sort in ABC
Figure 9. Concordance to time sorted to the left (L1-L2-L3) in ABC
Figure 10. Concordance to time sorted to the right (R1-R2-R3) in ABC
Figure 11. Concordance to time sorted to the right and to the left (R1-R2-L1) in ABC
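The concordance view and the L1/R1 sorting shown in Figures 8 to 11 can be sketched as a simple key-word-in-context (KWIC) routine. The function names (`concordance`, `sort_left`, `sort_right`, `show`) and the tokeniser are assumptions, and the example text is invented; WordSmith offers many more sorting and display options.

```python
import re


def concordance(text: str, node: str, span: int = 5):
    """Return KWIC lines as (left words, node, right words) tuples,
    with `span` words of context on each side (N-5 ... N+5 by default)."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == node.lower():
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            lines.append((left, token, right))
    return lines


def sort_right(lines, depth=3):
    """Sort by R1, then R2, then R3; missing context words sort first."""
    def key(line):
        right = line[2]
        return [right[k].lower() if k < len(right) else "" for k in range(depth)]
    return sorted(lines, key=key)


def sort_left(lines, depth=3):
    """Sort by L1, then L2, then L3, where L1 is the word just before the node."""
    def key(line):
        left = line[0]
        return [left[-k].lower() if k <= len(left) else "" for k in range(1, depth + 1)]
    return sorted(lines, key=key)


def show(lines, width=30):
    """Print the lines with the node word centred between its contexts."""
    for left, node, right in lines:
        print(f"{' '.join(left):>{width}}  {node}  {' '.join(right)}")


text = ("It is time to act. This time we will not wait. "
        "For the first time in history, it is time for change.")
show(sort_right(concordance(text, "time", span=3)))
```

Sorting on the first word to the right groups patterns such as time to and time for together, which is exactly what makes recurrent clusters visible at a glance.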
The Keyword list uses the word lists described above and compares them. The idea is quite simple: if a word is found to be much more frequent in one corpus than in another, it is a “keyword”. The notion underlying this is therefore “outstandingness” based on comparison.
We create a Keyword list by referencing a study corpus (usually smaller) against a reference
corpus (usually larger, ideally five times larger) (Figure 12).
Figure 12. Word lists in Barack Obama and George W. Bush (30-60)
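Deciding when a word is "much more frequent" requires a statistic. A common choice in corpus linguistics is Dunning’s log-likelihood (G2), computed from a word’s frequency in the study and reference corpora and the two corpus sizes. The sketch below assumes that measure and plain `Counter` word lists; the function names and the toy counts are hypothetical, and WordSmith’s own settings (significance thresholds, minimum frequencies) are not reproduced.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood (G2) for a word occurring a times in the study corpus
    (c tokens) and b times in the reference corpus (d tokens)."""
    e1 = c * (a + b) / (c + d)  # frequency expected in the study corpus
    e2 = d * (a + b) / (c + d)  # frequency expected in the reference corpus
    ll = 0.0
    if a > 0:  # a zero frequency contributes nothing (0 * log 0 taken as 0)
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study: Counter, reference: Counter, top: int = 10):
    """Words relatively more frequent in `study` than in `reference`,
    ranked by log-likelihood (positive keywords only)."""
    c, d = sum(study.values()), sum(reference.values())
    scored = []
    for word, a in study.items():
        b = reference.get(word, 0)
        if a / c > b / d:  # keep only words over-used in the study corpus
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]


# Toy word lists with hypothetical counts.
study = Counter({"the": 100, "iraq": 30, "people": 20})
reference = Counter({"the": 700, "iraq": 2, "people": 150, "insurance": 50})
print(keywords(study, reference))
```

Swapping the two arguments gives the opposite comparison, which is how the two keyword lists discussed under question 20 are obtained conceptually.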
20. What keywords have emerged by referencing George Bush’s speeches against Barack Obama’s
speeches? And by referencing Barack Obama’s speeches against George Bush’s speeches?
By comparing Barack Obama’s word list with George W. Bush’s, we aim to unveil the words and phrases that the current President tends to use, and in particular those that can be regarded, as it were, as Barack Obama’s “signature”, distinguishing him from the former president and showing what makes him so different. This might also help us understand what persuaded people, even those who disagreed with him, to vote for him, and hence what made the swing states1 turn blue after the elections (Figures 13 and 14).
1 In United States presidential politics, a swing state, also known as a purple state (purple being the combination of red and blue, the colors used to represent Republican- and Democratic-majority states respectively), is a state where no single candidate or party has overwhelming support in securing that state’s electoral college votes.
Figure 13. Electoral presidential landscape of the United States before the 2008 elections
Figure 14. Electoral presidential landscape of the United States after the 2008 elections
Figure 15 shows the keywords that emerged by referencing Bush against Obama: Iraq, freedom, terror, terrorists and war top the list.
Figure 15. Keywords emerged by referencing Bush vs Obama
One word in Bush’s language that may arouse interest is appreciate (ranking 9th in Figure 15), which the former president used with the function of Thank you, as Figure 16 illustrates:
Figure 16. Appreciate in George W. Bush
Sometimes the verb is also used without a subject, as in appreciate it (lines 1-2 and 5 below) or even appreciate you. A few instances of appreciate on its own were also found.
Figure 17. Appreciate it and Appreciate you in George W. Bush
Figure 18 shows the keywords that emerged by comparing Obama with Bush:
Figure 18. Keywords emerged by referencing Obama vs Bush
It is not surprising that words like insurance, recovery, crisis, health and clean top the list. It is interesting to note that in both lists the wives of the former and the current Presidents figure among the first words2: Laura ranking 27th and Michelle ranking 31st.
When the keywords that emerged by referencing Obama against Bush are grouped by semantic field, three main areas surface:
1. economic crisis and recovery
2. clean energy and climate change
3. health care reform
As Scott points out, “a lot depends on the corpus”: our results would certainly be different if
we referenced the British Prime Minister’s speeches against the President of the United States’
speeches.
21. What phrases have emerged in Barack Obama’s speeches?
The tables below show the 3-word and 4-word clusters that emerged in Barack Obama’s speeches.
2 The software also shows that, unlike George W. Bush, Barack Obama mentions his two daughters’ names, Sasha and Malia, very frequently.
Among the 3-word clusters, the following are worth mentioning: to make sure, as well as, men and women, around the world, across the country, in terms of, first of all, the recovery act, on behalf of, health care system, health care reform, in order to.
Figure 19a. three-word clusters in Obama’s speeches
Figure 19b. three-word clusters in Obama’s speeches
Figure 19c. three-word clusters in Obama’s speeches
Since language is phraseological, learning a mere list of words does not help much, because most words have meaning only when embedded in phrases, or have a different meaning when embedded in phrases. Knowing a language means not only learning a list of words but also, and above all, learning how words combine with one another. For example, the three words just, like and that, combined, create a meaning different from that of the individual words. Likewise, learning the word behalf on its own does not help much, since behalf is almost always found in the company of on and of, in the phrase on behalf of, as shown in Figure 19b (ranking 67th).
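The claim about behalf can be checked directly by counting how many occurrences of a node word fall inside a given phrase. The helper `share_in_phrase` below is hypothetical (not a WordSmith feature), and the example sentences are invented.

```python
import re


def share_in_phrase(text: str, node: str, phrase: str) -> float:
    """Fraction of the occurrences of `node` that fall inside `phrase`
    (e.g. how often 'behalf' appears as part of 'on behalf of').
    Raises ValueError if `node` is not one of the words of `phrase`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    node = node.lower()
    target = phrase.lower().split()
    offset = target.index(node)  # position of the node inside the phrase
    total = matched = 0
    for i, token in enumerate(tokens):
        if token != node:
            continue
        total += 1
        start = i - offset
        # Guard against negative starts, which would wrap around in Python slices.
        if start >= 0 and tokens[start:start + len(target)] == target:
            matched += 1
    return matched / total if total else 0.0


sample = ("On behalf of the American people, thank you. I speak on behalf "
          "of my family, and on my behalf.")
print(share_in_phrase(sample, "behalf", "on behalf of"))
```

In this invented sample two of the three occurrences belong to on behalf of, while on my behalf shows why "almost always" is a safer description than "always".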
Among the 4-word clusters, the following are worth mentioning: to make sure that, thank you very much, United States of America, thank you so much, when it comes to, I just want to, all across the country, a lot of people.
Figure 20. four-word clusters in Obama’s speeches
22. What key-phrases have emerged by referencing Barack Obama’s speeches against George
Bush’s speeches?
WordSmith Tools allows us to obtain not only lists of words and keywords but also lists of phrases and key-phrases (or key-clusters).
The key-clusters in Figure 21 emerged from comparing Barack Obama’s phrases with George Bush’s, and indicate the phrases that Obama utters much more frequently than Bush. They reflect Obama’s main concerns, which were not prioritized by Bush’s government: the recovery act (ranking 7th) and health care reform (ranking 21st). Have a seat (ranking 33rd), usually uttered by Obama with the word please (Please have a seat, which in fact appears among the 4-word clusters), is hardly ever used by Bush, who preferred to say Please be seated.
Figure 21. three-word key-clusters obtained referencing Obama’s speeches vs Bush’s speeches
With the opposite procedure, comparing Bush’s speeches with Obama’s, the following clusters emerged:
Figure 22. four-word key-clusters obtained referencing Bush’s speeches vs Obama’s speeches
Bearing in mind that meaning is an unstable entity created not by single words but by their interaction, the clusters that emerged show, even better than individual words, the main concerns of the former American President compared with the current one: the war on terror, weapons of mass destruction, for the sake of, September the 11th, in the Middle East, as a matter of fact, no child left behind act.
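Key-clusters can be sketched by building n-gram lists for both corpora and ranking the n-grams that are over-represented in the study corpus, here by Dunning’s log-likelihood. This is a minimal sketch under those assumptions: the functions, the `min_freq` threshold and the two toy texts (echoing Please have a seat versus Please be seated) are hypothetical, and the procedure is not claimed to match WordSmith’s key-cluster computation.

```python
import math
import re
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count contiguous n-word sequences (clusters) in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 for frequencies a (study corpus, size c) and b (reference, size d)."""
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    return 2 * sum(o * math.log(o / e) for o, e in ((a, e1), (b, e2)) if o > 0)


def key_clusters(study_text: str, reference_text: str, n: int = 3,
                 min_freq: int = 2, top: int = 10):
    """n-grams over-represented in the study text, ranked by G2."""
    study = ngram_counts(study_text, n)
    reference = ngram_counts(reference_text, n)
    c, d = sum(study.values()), sum(reference.values())
    scored = [
        (gram, log_likelihood(a, reference.get(gram, 0), c, d))
        for gram, a in study.items()
        if a >= min_freq and a / c > reference.get(gram, 0) / d
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]


# Invented mini-texts, not real speech data.
obama_like = "Please have a seat. Thank you. Please have a seat."
bush_like = "Please be seated. Thank you. Please be seated. Thank you."
print(key_clusters(obama_like, bush_like))
```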
To conclude, phraseology plays a prominent role in discourse; hence clusters are much better than individual words at revealing the ‘aboutness’ of a text (and its context). Furthermore, on the assumption that frequency is a guide to importance, key-clusters are even better at unveiling the main concerns of one politician compared with another.