
A study of quote similarity among politicians


In this project, we investigate similarities among American politicians from several angles. First, we compare politicians by the content of the quotes attributed to them. Here we find interesting clusters of politicians who make similar quotes.

Second, we extend the study with a time correlation analysis to find politicians who tend to be quoted at the same time. This could suggest, for example, that these politicians focus on the same subject.

Lastly, we study how both the time correlation and quote similarity have varied over the last three years in the QuoteBank dataset for an interesting pair of politicians, namely the former Secretary of State John Kerry and the current President Joe Biden.
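To make the time correlation idea concrete, here is a minimal sketch of one possible way to measure it: count how often each politician is quoted per day and compute the Pearson correlation of the two daily series. This is an assumption for illustration, not necessarily the measure used in the project; the helper names `daily_counts` and `pearson` and the toy dates are hypothetical.

```python
from collections import Counter
from datetime import date
from math import sqrt


def daily_counts(quote_dates, days):
    """Count quotes per day, returning one count for each day in `days`."""
    counts = Counter(quote_dates)
    return [counts[d] for d in days]


def pearson(xs, ys):
    """Pearson correlation of two equally long series (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: the dates on which two speakers were quoted.
days = [date(2019, 1, d) for d in range(1, 6)]
speaker_a = [date(2019, 1, 1), date(2019, 1, 1), date(2019, 1, 3)]
speaker_b = [date(2019, 1, 1), date(2019, 1, 3), date(2019, 1, 3)]

# Values near 1 mean the two speakers tend to be quoted on the same days.
r = pearson(daily_counts(speaker_a, days), daily_counts(speaker_b, days))
print(f"time correlation: {r:.4f}")
```

A value close to 1 would indicate that the two politicians are quoted on the same days, while a value close to 0 would indicate no linear relationship between their daily quote counts.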


The main dataset used in this project is QuoteBank, which contains 178 million quotations together with the name of the speaker [1]. The dataset was collected from English-language online news articles published between August 2008 and April 2020. We also use Wikidata to obtain additional information about the quoted individuals.

Before we begin, let’s gain some more insight into the data we use for the project. In 2019, 3,005 US politicians were quoted, a total of 1,306,702 times. Below you can find the distribution of gender and party among the politicians in the dataset.

Gender distribution in QuoteBank in 2019