Anti-social media
How to define the limits of free speech is a central debate in most modern democracies. This is particularly difficult in relation to hateful, abusive and racist speech. The pattern of hate speech is complex, but there is increasing focus on the volume and nature of hateful or racist speech taking place online. This short study aims to inform the debate over free speech and hate speech by examining how racial, religious and ethnic slurs are used on Twitter. In this study, we aim to answer the following two questions:
(a) What is the volume and nature of the slurs used on Twitter? (And, where relevant, to what extent is there an overlap with hate speech?)
(b) What is the potential of automated machine learning techniques to accurately identify and classify slurs?
This research is based on a corpus of all tweets containing racial, religious or ethnic slurs, collected from Twitter over three weeks in November 2012. We ran two types of analysis on this data set.
In study one, we used automated machine classifiers to categorise the data set. This involved human analysis of a sample to identify categories, followed by training a natural language processing classifier to recognise and apply those categories to the whole of the data set.
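The study-one pipeline described above can be sketched as a standard supervised text-classification workflow. This is an illustrative example only, assuming a scikit-learn setup; the placeholder tweets and category names below are invented for the sketch and are not drawn from the actual data set or the report's category scheme.

```python
# Sketch of the study-one pipeline: train on a human-annotated sample,
# then apply the learned categories to the rest of the data set.
# All texts and labels here are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Step 1: a small human-annotated sample (placeholder examples).
sample_tweets = [
    "casual banter between friends",
    "quoting a slur in order to condemn it",
    "targeted abuse aimed at an individual",
    "joking reference in a song lyric",
]
sample_labels = ["casual", "appropriated", "abusive", "casual"]

# Step 2: train a bag-of-words classifier on the annotated sample.
classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(sample_tweets, sample_labels)

# Step 3: apply the trained classifier to unlabelled tweets.
unlabelled = ["more banter between friends"]
predictions = classifier.predict(unlabelled)
```

In practice the annotated sample would be far larger, and classifier accuracy would be checked against a held-out portion of the human-labelled data before applying it to the full corpus.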
In study two, we used human analysts to categorise samples of the data. This involved in-depth, iterative human analysis of small and then larger random samples of the data to reveal a stable set of categories.
This project will be published in Autumn 2013.