BelgiumMaps.StatBel: R package with Administrative boundaries of Belgium

We recently opened up the BelgiumMaps.StatBel package and made it available at https://github.com/bnosac/BelgiumMaps.StatBel. This R package contains maps with administrative boundaries (national, regions, provinces, districts, municipalities, statistical sectors, agglomerations (200m)) of Belgium extracted from Open Data at Statistics Belgium. 

The package is a data-only package where maps of administrative zones in Belgium are available in the WGS84 coordinate reference system. The data is available in several objects:

  • BE_ADMIN_SECTORS: a SpatialPolygonsDataFrame with polygons and data at the level of the statistical sector
  • BE_ADMIN_MUNTY: a SpatialPolygonsDataFrame with polygons and data at the level of the municipality
  • BE_ADMIN_DISTRICT: a SpatialPolygonsDataFrame with polygons and data at the level of the district
  • BE_ADMIN_PROVINCE: a SpatialPolygonsDataFrame with polygons and data at the level of the province
  • BE_ADMIN_REGION: a SpatialPolygonsDataFrame with polygons and data at the level of the region
  • BE_ADMIN_BELGIUM: a SpatialPolygonsDataFrame with polygons and data at the level of the whole of Belgium
  • BE_ADMIN_HIERARCHY: a data.frame with administrative hierarchy of Belgium
  • BE_ADMIN_AGGLOMERATIONS: a SpatialPolygonsDataFrame with polygons and data at the level of an agglomeration (200m)

The R package is available at our rcube at www.datatailor.be under the CC-BY 2 license and can be installed as follows:

install.packages("sp")
install.packages("BelgiumMaps.StatBel", repos = "http://www.datatailor.be/rcube", type = "source")

The core data of the package contains administrative boundaries at the level of the statistical sector which can easily be plotted using the sp or the leaflet package.

library(BelgiumMaps.StatBel)
data(BE_ADMIN_SECTORS)
bxl <- subset(BE_ADMIN_SECTORS, TX_RGN_DESCR_NL %in% "Brussels Hoofdstedelijk Gewest")
plot(bxl, main = "NIS sectors in Brussels")

Boundaries at the municipality, district, province, region and country level are also directly available in the package.

data(BE_ADMIN_SECTORS)
data(BE_ADMIN_MUNTY)
data(BE_ADMIN_DISTRICT)
data(BE_ADMIN_PROVINCE)
data(BE_ADMIN_REGION)
data(BE_ADMIN_BELGIUM)
plot(BE_ADMIN_MUNTY, main = "Belgium municipalities/districts/provinces")
plot(BE_ADMIN_DISTRICT, lwd = 2, add = TRUE)
plot(BE_ADMIN_PROVINCE, lwd = 3, add = TRUE)

If you are looking for mapping data about Belgium, you might also be interested in the BelgiumStatistics package (https://github.com/weRbelgium/BelgiumStatistics), which contains more general statistics about Belgium, or the BelgiumMaps.OpenStreetMap package (https://github.com/weRbelgium/BelgiumMaps.OpenStreetMap), which contains geospatial data of Belgium regarding landuse, natural, places, points, railways, roads and waterways, extracted from OpenStreetMap.

The package also integrates well with other public data from Statistics Belgium, as it contains spatial identifiers (NIS codes, NUTS codes) which you can use to link to other datasets. The following R code example creates an interactive map displaying net taxable income by statistical sector for Brussels.

library(BelgiumMaps.StatBel)
library(leaflet)

## Get taxes / statistical sector
## Use a name that does not shadow the tempfile() function; mode = "wb" keeps the zip intact on Windows
zipfile <- tempfile(fileext = ".zip")
download.file("http://statbel.fgov.be/nl/binaries/TF_PSNL_INC_TAX_SECTOR_tcm325-278417.zip", zipfile, mode = "wb")
unzip(zipfile, list = TRUE)
taxes <- read.table(unz(zipfile, filename = "TF_PSNL_INC_TAX_SECTOR.txt"), sep = "|", header = TRUE,
                    encoding = "UTF-8", stringsAsFactors = FALSE, quote = "", na.strings = c("", "C"))
colnames(taxes)[1] <- "CD_YEAR"

## Get taxes in last year
taxes <- subset(taxes, CD_YEAR == max(taxes$CD_YEAR))
taxes <- taxes[, c("CD_YEAR", "CD_REFNIS_SECTOR",
                   "MS_NBR_NON_ZERO_INC", "MS_TOT_NET_TAXABLE_INC", "MS_AVG_TOT_NET_TAXABLE_INC",
                   "MS_MEDIAN_NET_TAXABLE_INC", "MS_INT_QUART_DIFF", "MS_INT_QUART_COEFF",
                   "MS_INT_QUART_ASSYM")]

## Join taxes with the map
data(BE_ADMIN_SECTORS, package = "BelgiumMaps.StatBel")
data(BE_ADMIN_DISTRICT, package = "BelgiumMaps.StatBel")
data(BE_ADMIN_MUNTY, package = "BelgiumMaps.StatBel")
str(BE_ADMIN_SECTORS@data)
mymap <- merge(BE_ADMIN_SECTORS, taxes, by = "CD_REFNIS_SECTOR", all.x=TRUE, all.y=FALSE)
mymap <- subset(mymap, TX_RGN_DESCR_NL %in% "Brussels Hoofdstedelijk Gewest")

## Visualise the data
pal <- colorBin(palette = rev(heat.colors(11)), domain = mymap$MS_AVG_TOT_NET_TAXABLE_INC,
                bins = c(0, round(quantile(mymap$MS_AVG_TOT_NET_TAXABLE_INC, na.rm=TRUE, probs = seq(0.1, 0.9, by = 0.1)), 0), +Inf),
                na.color = "#cecece")

m <- leaflet(mymap) %>%
  addTiles() %>%
  addLegend(title = "Net Taxable Income (EURO)",
            pal = pal, values = ~MS_AVG_TOT_NET_TAXABLE_INC,
            position = "bottomleft", na.label = "Missing") %>%
  addPolygons(color = ~pal(MS_AVG_TOT_NET_TAXABLE_INC),
              stroke = FALSE, smoothFactor = 0.2, fillOpacity = 0.85,
              popup = sprintf("%s: %s<br>%s: %s<br><br>%s €: Average net taxable income<br>%s €: Median net taxable income<br>%s declarations",
                              mymap$TX_SECTOR_DESCR_NL, mymap$TX_MUNTY_DESCR_NL,
                              mymap$TX_SECTOR_DESCR_FR, mymap$TX_MUNTY_DESCR_FR,
                              mymap$MS_AVG_TOT_NET_TAXABLE_INC, mymap$MS_MEDIAN_NET_TAXABLE_INC,
                              mymap$MS_NBR_NON_ZERO_INC))
#m <- addPolylines(m, data = BE_ADMIN_DISTRICT, weight = 1.5, color = "black")
m <- addPolylines(m, data = subset(BE_ADMIN_MUNTY,
                                   TX_RGN_DESCR_NL %in% "Brussels Hoofdstedelijk Gewest"), weight = 1.5, color = "black")
m
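
The join and the decile binning used above can be tried out without downloading anything. What follows is a minimal sketch in base R on made-up data: the sector codes and income figures are hypothetical stand-ins, not values from Statistics Belgium, and only the column names CD_REFNIS_SECTOR and MS_AVG_TOT_NET_TAXABLE_INC are taken from the example above. The call to unique() is an extra safeguard added here, since duplicated breaks would make cut() fail.

```r
## Toy stand-in for the attribute table of the map (hypothetical sector codes)
sectors <- data.frame(
  CD_REFNIS_SECTOR = c("S001", "S002", "S003", "S004"),
  stringsAsFactors = FALSE)

## Toy stand-in for the tax table: sector S003 has no tax record (made-up figures)
taxes_toy <- data.frame(
  CD_REFNIS_SECTOR = c("S001", "S002", "S004"),
  MS_AVG_TOT_NET_TAXABLE_INC = c(21000, 34500, 27800),
  stringsAsFactors = FALSE)

## Left join: keep every sector, also the ones without tax data (these get NA)
joined <- merge(sectors, taxes_toy, by = "CD_REFNIS_SECTOR", all.x = TRUE, all.y = FALSE)

## Breaks at 0, the 10%..90% quantiles and +Inf, as in the colorBin call above
income <- joined$MS_AVG_TOT_NET_TAXABLE_INC
breaks <- c(0, round(quantile(income, na.rm = TRUE, probs = seq(0.1, 0.9, by = 0.1)), 0), +Inf)

## Assign each sector to an income bin; sectors without data stay NA
joined$income_bin <- cut(income, breaks = unique(breaks), include.lowest = TRUE)
joined
```

Sectors without a tax record keep an NA income and an NA bin, which is what the na.color argument of colorBin handles in the map.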

If all of this interests you, you might also want to attend our course on Applied Spatial Modelling with R, which will be held at LStat (Leuven, Belgium) on 8-9 December 2016. More information: https://lstat.kuleuven.be/training/applied-spatial-modelling-with-r

Sentiment analysis and Parts of Speech tagging in Dutch/French/English/German/Spanish/Italian

As part of our continuing effort to digitise poetry and to automate new forms of poetry, we released an R package called pattern.nlp, which is available at https://github.com/bnosac/pattern.nlp. It allows R users to do sentiment analysis and Parts of Speech tagging for text written in Dutch, French, English, German, Spanish or Italian. Of course, it can also be used for other purposes, such as data preparation as part of a topic modelling flow.

If you are interested in text mining, feel free to register for the text mining courses listed in our last blog post.

If you just want to do sentiment analysis and POS tagging in these European languages, go ahead as follows. Sentiment analysis is available for Dutch, French & English.

library(pattern.nlp)

## Sentiment analysis
x <- pattern_sentiment("i really really hate iphones", language = "english")
y <- pattern_sentiment("de wereld is een mooie plaats, nietwaar sherlock", language = "dutch")
z <- pattern_sentiment("j'aime Paris, c'est super", language = "french")
rbind(x, y, z)

polarity  subjectivity  id
   -0.80          0.90  i really really hate iphones
    0.70          1.00  de wereld is een mooie plaats, nietwaar sherlock
    0.65          0.75  j'aime Paris, c'est super
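
The polarity score can be turned into a simple label for further processing. The following is a minimal sketch in base R, assuming the result above is held in a data.frame with the columns polarity, subjectivity and id. The cut-off at zero and the column name label are illustrative choices and not part of pattern.nlp.

```r
## Scores copied from the output above
scores <- data.frame(
  polarity = c(-0.80, 0.70, 0.65),
  subjectivity = c(0.90, 1.00, 0.75),
  id = c("i really really hate iphones",
         "de wereld is een mooie plaats, nietwaar sherlock",
         "j'aime Paris, c'est super"),
  stringsAsFactors = FALSE)

## Hypothetical rule: the sign of the polarity decides the label
scores$label <- ifelse(scores$polarity > 0, "positive",
                       ifelse(scores$polarity < 0, "negative", "neutral"))
scores[, c("label", "polarity", "id")]
```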

Parts of Speech tagging is available for Dutch, French, English, Spanish & Italian.

library(pattern.nlp)

x <- "Il pleure dans mon coeur comme il pleut sur la ville. Quelle est cette langueur qui penetre mon coeur?"
pattern_pos(x = x, language = 'french')

x <- "Avevamo vegliato tutta la notte - i miei amici ed io sotto lampade
di moschea dalle cupole di ottone traforato, stellate come le nostre anime,
perché come queste irradiate dal chiuso fulgòre di un cuore elettrico."
pattern_pos(x = x, language = 'italian')

We are also working on a Dutch wordnet, which will be fully released in due course. More information at https://github.com/weRbelgium/wordnet.dutch. Hope you use the package for spreading new languages!

Text Mining with R - upcoming training schedule

As part of the R course offering of BNOSAC, which you can find at http://bnosac.be/images/bnosac/bnosac_courses_r.pdf, we offer several 2-day hands-on courses covering the use of text mining tools for data analysis. They cover basic text handling, natural language engineering and statistical modelling on top of textual data.

Interested in upgrading your skills on text mining with R? You can register for the following dates.

2016: October 24-25: subscribe at https://lstat.kuleuven.be/training/coursedescriptions/text-mining-with-r
2016: November 14-15: subscribe at http://di-academy.com/event/text-mining-with-r/
2017: March 23-24: subscribe at https://lstat.kuleuven.be/training/coursedescriptions/text-mining-with-r

The following elements are covered in this course.

  1. Import of (structured) text data with focus on text encodings. Detection of language
  2. Cleaning of text data, regular expressions
  3. String distances
  4. Graphical displays of text data
  5. Natural language processing: stemming, parts-of-speech (POS) tagging, tokenization, lemmatisation, entity recognition
  6. Sentiment analysis
  7. Statistical topic detection modelling and visualisation (latent Dirichlet allocation)
  8. Automatic classification using predictive modelling based on text data
  9. Visualisation of correlations & topics
  10. Word embeddings
  11. Document similarities & Text alignment

Hope to see you there.

Good news from Belgium: Course on Applied spatial modelling with R (April 13-14)

In two weeks, our 2-day crash course on Applied spatial modelling with R (April 13-14, 2016) will be given at the University of Leuven, Belgium: https://lstat.kuleuven.be/training/applied-spatial-modelling-with-r
During this course you will learn the following elements:

  • The sp package to handle spatial data (spatial points, lines, polygons, spatial data frames)
  • Importing spatial data and setting the spatial projection
  • Plotting spatial data on static and interactive maps
  • Adding graphical components to spatial maps
  • Manipulation of geospatial data, geocoding, distances, …
  • Density estimation, kriging and spatial point pattern analysis
  • Spatial regression

More information: https://lstat.kuleuven.be/training/applied-spatial-modelling-with-r. Registration can be done at https://lstat.kuleuven.be/forms/courses

New RStudio add-in to schedule R scripts

With the release of RStudio add-ins, new productivity gains and new features for R users have become possible. Thanks to the help of Oliver, who has written an RStudio add-in on top of taskscheduleR, scheduling and automating an R script from RStudio is now exactly one click away if you are working on Windows.

How? Just install these R packages and you have the add-in ready at the add-in tab in your RStudio session. Select your R script and schedule it to run any time you want. We hope this saves you some day-to-day time. Feel free to help make additional improvements. More information: https://github.com/bnosac/taskscheduleR.

install.packages('data.table')
install.packages('knitr')
install.packages('miniUI')
install.packages('shiny')
install.packages("taskscheduleR", repos = "http://www.datatailor.be/rcube", type = "source")
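
The same scheduling can also be done from the R console instead of the add-in. What follows is a minimal sketch, assuming the functions taskscheduler_create and taskscheduler_delete as described in the package README. The task name and the small script written to a temporary folder are made up for illustration, and the scheduling part is only attempted on Windows with the package installed.

```r
## Write a small script to schedule (hypothetical example script)
myscript <- file.path(tempdir(), "hello_schedule.R")
writeLines('cat("Hello from a scheduled R script\\n")', myscript)

## Scheduling relies on the Windows task scheduler, so only try it there
if (.Platform$OS.type == "windows" && requireNamespace("taskscheduleR", quietly = TRUE)) {
  library(taskscheduleR)
  ## Run the script once, about a minute from now
  taskscheduler_create(taskname = "hello_schedule", rscript = myscript,
                       schedule = "ONCE", starttime = format(Sys.time() + 62, "%H:%M"))
  ## Once the task has run and is no longer needed, it can be removed with:
  ## taskscheduler_delete(taskname = "hello_schedule")
}
```

A script in a temporary folder disappears after the R session ends, so for a real task you would point rscript to a permanent location.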
