How to detect hate speech in plain text #schildnvrienden

Yesterday the national television channel 'één' aired a pretty controversial Pano TV documentary called 'Wie is Schild & Vrienden echt' (https://www.vrt.be/vrtnu/a-z/pano/2018/pano-s2018a10). The documentary revealed the internal communication of a right-wing group from Belgium called #schildnvrienden.

After that, on the show Van Gils & gasten, a representative of the police explained, or rather tried not to explain, how the police can or cannot monitor private online groups. It was pretty hilarious to watch her try to avoid saying anything about the internal online monitoring system they apparently have.

That reminded me that a few years ago I created an R package which can easily detect hate speech. I finally put it online on GitHub today. You can find it at https://github.com/weRbelgium/hatespeech.dutch. The R package uses a dictionary made available by the University of Antwerp, which I think is the basis of the hate speech detection algorithms that the police in Belgium are currently running.

Example

How does that hate speech detection system work? It is pretty simple: a dictionary of hate speech terminology and hate speech regular expressions is set up. Next, you just provide some text to it; the text is cut up into words and the system checks which words are part of the dictionary. As an example, let's try it out below on a message by the leader of that #schildnvrienden group to see if it is considered hate speech.

[Screenshot: tweet by dvanlangenhove, 2018-09-01]

library(udpipe)
library(hatespeech.dutch)
detect_hatespeech("Europa wordt élke dag geteisterd door geweld van illegalen.
Zowel voor mensen die zich zorgen maken over dit geweld als voor mensen
 die zich zorgen maken over de boze reactie van Europeanen òp dit geweld zou
 oplossing duidelijk moeten zijn: alle illegalen opsporen en deporteren.",
 type = "udpipe")
    Neutral-Country   Neutral-Migration Neutral-Nationality 
                  0                   1                   0 
   Neutral-Religion  Neutral-Skin_color      Racist-Animals 
                  0                   0                   0 
     Racist-Country        Racist-Crime      Racist-Culture 
                  0                   0                   0 
    Racist-Diseases    Racist-Migration  Racist-Nationality 
                  0                   0                   0 
        Racist-Race     Racist-Religion   Racist-Skin_color 
                  0                   0                   0 
 Racist-Stereotypes 
                  0

So apparently the dictionary logic considers this statement to be Neutral-Migration. I hope the police have improved on the natural language processing a bit, so that it incorporates more than just word lookup and regular expressions. Feel free to try the hate speech detector out on your own text using the R package made available at https://github.com/weRbelgium/hatespeech.dutch. Or visit the website to see the dictionaries which are used to detect hate speech.
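
To make the word lookup idea concrete, here is a minimal sketch in base R. It is not the code of the hatespeech.dutch package and it does not use the University of Antwerp dictionary: the function name toy_detect, the toy dictionary, its patterns, and the way terms are assigned to categories are all assumptions made up for illustration. It only shows the general technique: split the text into words, then count per category how many words match one of that category's regular expressions.

```r
# Hypothetical toy dictionary: the patterns and their category assignments are
# invented for this sketch and are NOT taken from the real dictionary.
toy_dictionary <- data.frame(
  category = c("Neutral-Migration", "Neutral-Migration", "Neutral-Country"),
  pattern  = c("^illegalen$", "^migrant(en)?$", "^belgie$"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count, per category, the words matching a pattern.
toy_detect <- function(text, dictionary) {
  # Lowercase the text and cut it up into words on anything that is not a letter
  tokens <- unlist(strsplit(tolower(text), "[^[:alpha:]]+"))
  tokens <- tokens[nzchar(tokens)]
  categories <- unique(dictionary$category)
  # For each category, count the words that match at least one of its patterns
  vapply(categories, function(categ) {
    patterns <- dictionary$pattern[dictionary$category == categ]
    hits <- vapply(tokens, function(tok) {
      any(vapply(patterns, function(p) grepl(p, tok), logical(1)))
    }, logical(1))
    sum(hits)
  }, numeric(1))
}

result <- toy_detect("Alle illegalen opsporen en deporteren.", toy_dictionary)
print(result)
```

With this toy dictionary, the call returns a named vector with one hit for the made-up Neutral-Migration entry and none for Neutral-Country. A lookup like this has no notion of context or intent, which is exactly the limitation mentioned above.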

Training on Text Mining

If you are interested in how text mining techniques work, you might be interested in the following data science courses that are held in the coming months.

  • 08-09/10/2018: Text mining with R. Brussels (Belgium). http://di-academy.com/bootcamp
  • 15-16/10/2018: Statistical machine learning with R. Leuven (Belgium). Subscribe here
  • 20-21/11/2018: Text mining with R. Leuven (Belgium). Subscribe here
  • 19-20/12/2018: Applied spatial modelling with R. Leuven (Belgium). Subscribe here
  • 21-22/02/2019: Advanced R programming. Leuven (Belgium). Subscribe here
  • 13-14/03/2019: Computer Vision with R and Python. Leuven (Belgium). Subscribe here
  • 15/03/2019: Image Recognition with R and Python. Subscribe here
  • 01-02/04/2019: Text Mining with R. Leuven (Belgium). Subscribe here