Streaming machine learning with RMOA: stream_in > train > predict

We will be showcasing our RMOA package at the next R User conference in Aalborg.
For R users who are unfamiliar with streaming models and want to stay ahead of the Gartner hype cycle, or who want to evaluate existing streaming machine learning models, RMOA lets you build, run and evaluate streaming classification models that are implemented in MOA (Massive Online Analysis).
For an introduction to RMOA and MOA and the type of machine learning models which are possible in MOA - see our previous blog post or scroll through our blog page.

In the example below, we showcase the RMOA package on streaming JSON data, which can come from any NoSQL database that emits JSON. The jsonlite package provides a convenient stream_in function (an example is shown here) which reads such data chunk by chunk and passes each chunk to a handler function. Plugging streaming machine learning models into that flow with RMOA is a breeze.
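
To make the chunk-by-chunk behaviour of stream_in concrete, here is a minimal sketch that does not need a database or a network connection. It assumes the built-in iris dataset as stand-in data and a temporary file as the JSON source; the variable chunk_sizes is only there for illustration.

```r
library(jsonlite)

# Write a small built-in dataset as newline-delimited JSON to a temporary file
tmp <- tempfile(fileext = ".json")
stream_out(iris, file(tmp), verbose = FALSE)

# Read it back in pages of 60 records; the handler receives each page
# as a data.frame and here simply records how many rows it contains
chunk_sizes <- integer(0)
stream_in(file(tmp), handler = function(df) {
  chunk_sizes <<- c(chunk_sizes, nrow(df))
}, pagesize = 60, verbose = FALSE)

chunk_sizes
```

In the RMOA example further down, the handler does the same thing in spirit, except that each page is used to update a model instead of being counted.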

Let's dive straight into the R code, which shows how to build, run and evaluate a streaming boosted classification model.

require(jsonlite)
require(data.table)
require(RMOA)
require(ROCR)
##
## Use a dataset from Jeroen Ooms available at jeroenooms.github.io/data/diamonds.json
##
myjsondataset <- url("http://jeroenooms.github.io/data/diamonds.json")
datatransfo <- function(x){
  ## Setting the target to predict
  x$target <- factor(ifelse(x$cut == "Very Good", "Very Good", "Other"), levels = c("Very Good", "Other"))
  ## Making sure the levels are the same across all streaming chunks
  x$color <- factor(x$color, levels = c("D", "E", "F", "G", "H", "I", "J"))
  x  
}

##
## Read 100 lines of an example dataset to see what it looks like
##
x <- readLines(myjsondataset, n = 100, encoding = "UTF-8")
x <- rbindlist(lapply(x, fromJSON))
x <- datatransfo(x)
str(x)

######################################
## Boosted streaming classification
##   - set up the boosting options
######################################
ctrl <- MOAoptions(model = "OCBoost", randomSeed = 123456789, ensembleSize = 25,
                   smoothingParameter = 0.5)
mymodel <- OCBoost(control = ctrl)
mymodel
## Train an initial model on 100 rows of the data
myboostedclassifier <- trainMOA(model = mymodel, 
         formula = target ~ color + depth + x + y + z,
         data = datastream_dataframe(x))

## Update the model iteratively with streaming data
stream_in(
  con = myjsondataset,
  handler = function(x){
    x <- datatransfo(x)
    ## Update the trained model with the new chunks
    myboostedclassifier <- trainMOA(model = myboostedclassifier$model, 
             formula = target ~ color + depth + x + y + z,
             data = datastream_dataframe(x), 
             reset = FALSE) ## do not reset what the model has learned already
  },
  pagesize = 500)

## Do some prediction to test the model
predict(myboostedclassifier, x)
table(sprintf("Reality: %s", x$target),
      sprintf("Predicted: %s", predict(myboostedclassifier, x)))

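## Do a streaming prediction
## (use a fresh connection: the stream_in call above closes the connection it opened)
stream_in(con = url("http://jeroenooms.github.io/data/diamonds.json"),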
          handler = function(x){
            x <- datatransfo(x)
            myprediction <- predict(myboostedclassifier, x)
            ## Basic evaluation by extracting accuracy
            print(round(sum(myprediction == x$target) / length(myprediction), 2))
          },
          pagesize = 100)
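
The streaming prediction above prints the accuracy of each chunk separately. If you prefer one running figure over the whole stream, a sketch along the following lines could be used. It relies on base R only; the helper update_accuracy and the two small label vectors are made up for illustration and stand in for the predictions and targets a handler would see.

```r
# Running totals shared across chunks
n_correct <- 0
n_seen    <- 0

# Hypothetical helper: fold one chunk of predictions into the running totals
# and return the accuracy over everything seen so far
update_accuracy <- function(predicted, actual) {
  n_correct <<- n_correct + sum(predicted == actual)
  n_seen    <<- n_seen + length(actual)
  n_correct / n_seen
}

# Two made-up chunks of labels
acc1 <- update_accuracy(c("Other", "Other", "Very Good", "Other"),
                        c("Other", "Very Good", "Very Good", "Other"))
acc2 <- update_accuracy(c("Very Good", "Other"),
                        c("Other", "Other"))
round(c(acc1, acc2), 2)
```

Inside a stream_in handler, the call would take the output of predict and x$target as its two arguments.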

For more information on RMOA or streaming modelling, get in touch.

Using R in Robotics applications with ROS

For those of you who are interested in using R alongside robotics applications and want to use ROS (Robot Operating System) together with R: enjoy the slides of our presentation on this topic from the last RBelgium meetup.

If you are interested in real-time analysis of your data streams or sensor data with R and ROS, we can help - get in touch at index.php/contact/get-in-touch

Host a CRAN mirror using Docker

CRAN mirrors are the backbone of everyday R usage. They serve the R website and most R packages. Currently there are about 104 official CRAN mirrors. Hosting a CRAN mirror is one way to help the R community and is explained here.

To ease that process, we at BNOSAC have created a Docker image which sets up a CRAN mirror.
That Docker image is available for download from the following Docker registry: https://registry.hub.docker.com/u/bnosac/cran-mirror

For people who don't know Docker: it is basically a tool which allows developers to containerise an application. In this case, the application is a CRAN mirror.
How does it work? In 3 steps:

1. Install Docker on your computer or server as explained here, if you haven't done this already.
2. Pull the docker image: docker pull bnosac/cran-mirror
3. Run the CRAN mirror: docker run -p 22:22 -p 80:80 -v /home/bnosac/CRAN/:/var/www/html -d bnosac/cran-mirror

That's it! The container starts syncing CRAN to /home/bnosac/CRAN and will sync every day at 02h30 UTC. You can now go to 0.0.0.0 in your browser, or find the IP address of the machine where it is running and go to that address, to see the R website (see https://registry.hub.docker.com/u/bnosac/cran-mirror for more info).

Now what can you do with it?

  • have a local CRAN mirror in your company
  • install.packages("data.table", repos = "mylocalmirror")
  • serve the community with another mirror if a closeby mirror is not readily available
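
As a sketch of the first two points, you could make the local mirror the default repository for an R session, so that install.packages no longer needs a repos argument. The address http://mylocalmirror is a placeholder, not a real host; replace it with the address of the machine running the container.

```r
# Placeholder address of the Docker host; replace with your own mirror URL
local_mirror <- "http://mylocalmirror"

# Make the local mirror the default CRAN repository for this R session
options(repos = c(CRAN = local_mirror))
getOption("repos")

# Subsequent installs then go through the mirror, for example:
# install.packages("data.table")
```

Putting the options call in a site-wide or user R profile would apply it to every session, which is one possible way to roll the mirror out inside a company.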