Koning Filip lijkt op ...

Last call for the course on Text Mining with R, held next week in Leuven, Belgium on April 1-2. Viewing the course description as well as subscription can be done at https://lstat.kuleuven.be/training/coursedescriptions/text-mining-with-r

Some things you'll learn ... is that King Filip of Belgium is similar to public expenses if we just look at open data from questions and answers in Belgian parliament (retrieved from here http://data.dekamer.be). Proof is below. See you next week.koning filip

data("dekamer", package = "ruimtehol")
dekamer$x <- strsplit(dekamer$question, "\\W")
dekamer$x <- lapply(dekamer$x, FUN = function(x) setdiff(x, ""))
dekamer$x <- sapply(dekamer$x, FUN = function(x) paste(x, collapse = " "))
dekamer$x <- tolower(dekamer$x)
dekamer$y <- strsplit(dekamer$question_theme, split = ",")
dekamer$y <- lapply(dekamer$y, FUN=function(x) gsub(" ", "-", x))
model <- embed_tagspace(x = dekamer$x, y = dekamer$y,
                        early_stopping = 0.8, validationPatience = 10,
                        dim = 50,
                        lr = 0.01, epoch = 40, loss = "softmax", adagrad = TRUE,
                        similarity = "cosine", negSearchLimit = 50,
                        ngrams = 2, minCount = 2)embedding_words  <- as.matrix(model, type = "words")
embedding_labels <- as.matrix(model, type = "labels", prefix = FALSE)
embedding_person <- starspace_embedding(model, tolower(c("Theo Francken")))
embedding_person <- starspace_embedding(model, tolower(c("Koning Filip")))
similarities <- embedding_similarity(embedding_person, embedding_words, top = 9)
similarities <- subset(similarities, !term2 %in% c("koning", "filip"))
similarities$term <- factor(similarities$term2, levels = rev(similarities$term2))
plt1 <- barchart(term ~ similarity | term1, data = similarities,
         scales = list(x = list(relation = "free"), y = list(relation = "free")),
         col = "darkgreen", xlab = "Similarity", main = "Koning Filip lijkt op ...")similarities <- embedding_similarity(embedding_person, embedding_labels, top = 7)
similarities$term <- factor(similarities$term2, levels = rev(similarities$term2))
plt2 <- barchart(term ~ similarity | term1, data = similarities,
         scales = list(x = list(relation = "free"), y = list(relation = "free")),
         col = "darkgreen", xlab = "Similarity", main = "Koning Filip lijkt op ...")
c(plt1, plt2)

Human Face Detection with R

Doing human face detection with computer vision is probably something you do once unless you work for police departments, you work in the surveillance industry or for the Chinese government. In order to reduce the time you lose on that small exercise, bnosac created a small R package (source code available at https://github.com/bnosac/image) which wraps the weights of a Single Shot Detector (SSD) Convolutional Neural Network which was trained with the Caffe Deep Learning kit. That network allows to detect human faces in images. An example is shown below (tested on Windows and Linux).

install.packages("image.libfacedetection", repos = "https://bnosac.github.io/drat")
image <- image_read("http://bnosac.be/images/bnosac/blog/wikipedia-25930827182-kerry-michel.jpg")
faces <- image_detect_faces(image)
plot(faces, image, border = "red", lwd = 7, col = "white")

libfacedetection example

What you get out of this is for each face the x/y locations and the width and height of the face. If you want to extract only the faces, loop over the detected faces and get them from the image as shown below.

allfaces <- Map(
    x      = faces$detections$x,
    y      = faces$detections$y,
    width  = faces$detections$width,
    height = faces$detections$height,
    f = function(x, y, width, height){
      image_crop(image, geometry_area(x = x, y = y, width = width, height = height))
allfaces <- do.call(c, allfaces)

Hope this gains you some time when doing which seems like a t-test of computer vision. Want to learn more on computer vision, next time just follow our course on Computer Vision with R and Python: https://lstat.kuleuven.be/training/coursedescriptions/ComputervisionwithRandPython

Making thematic maps for Belgium

For people from Belgium working in R with spatial data, you can find excellent workshop material on creating thematic maps for Belgium at https://workshop.mhermans.net/thematic-maps-r/index.html. The workshop was given by Maarten Hermans from HIVA - Onderzoeksinstituut voor Arbeid en Samenleving.
The plots are heavily based on BelgiumMaps.Statbel - an R package from bnosac released 2 years ago (more info at http://www.bnosac.be/index.php/blog/55-belgiummaps-statbel-r-package-with-administrative-boundaries-of-belgium
thematic maps r

An overview of the NLP ecosystem in R (#nlproc #textasdata)

At BNOSAC, R is used a lot to perform text analytics as it is an excellent tool that provides anything a data scientist needs to perform data analysis on text in a business settings. For users unfamiliar with all the possibilities that the wealth of R packages offers regarding text analytics, we've made this small mindmap showing a list of techniques and R packages that are used frequently in text mining projects set up by BNOSAC. Download the image and let your eyes zoom in on the different topics. Hope it broadens your idea of what is possible. Want to learn more or get hands on: http://www.bnosac.be/index.php/training

NLP R ecosystem

Neural Text Modelling with R package ruimtehol

Last week the R package ruimtehol was released on CRAN (https://github.com/bnosac/ruimtehol) allowing R users to easily build and apply neural embedding models on text data.

It wraps the 'StarSpace' library https://github.com/facebookresearch/StarSpace allowing users to calculate word, sentence, article, document, webpage, link and entity 'embeddings'. By using the 'embeddings', you can perform text based multi-label classification, find similarities between texts and categories, do collaborative-filtering based recommendation as well as content-based recommendation, find out relations between entities, calculate graph 'embeddings' as well as perform semi-supervised learning and multi-task learning on plain text. The techniques are explained in detail in the paper: 'StarSpace: Embed All The Things!' by Wu et al. (2017), available at https://arxiv.org/abs/1709.03856.

You can get started with some common text analytical use cases by using the presentation we have built below. Enjoy!


If you like it, give it a star at https://github.com/bnosac/ruimtehol and if you need commercial support on text mining, get in touch.

Upcoming training schedule

Note also that you might be interested in the following courses held in Belgium

  • 21-22/02/2018: Advanced R programming. Leuven (Belgium). Subscribe here
  • 13-14/03/2018: Computer Vision with R and Python. Leuven (Belgium). Subscribe here
  •      15/03/2019: Image Recognition with R and Python: Subscribe here
  • 01-02/04/2019: Text Mining with R. Leuven (Belgium). Subscribe here