Last week, we released the RMOA package on CRAN (http://cran.r-project.org/web/packages/RMOA). It is an R package for building streaming classification and regression models on top of MOA.

MOA stands for 'Massive Online Analysis'. It is the most popular open source framework for data stream mining and is developed at the University of Waikato: http://moa.cms.waikato.ac.nz. Our RMOA package interfaces with MOA version 2014.04 and focuses on building, evaluating and scoring streaming classification & regression models on data streams.

The classification & regression models available through RMOA are:

- Classification trees:
  - AdaHoeffdingOptionTree
  - ASHoeffdingTree
  - DecisionStump
  - HoeffdingAdaptiveTree
  - HoeffdingOptionTree
  - HoeffdingTree
  - LimAttHoeffdingTree
  - RandomHoeffdingTree
- Bayesian classification:
  - NaiveBayes
  - NaiveBayesMultinomial
- Active learning classification:
  - ActiveClassifier
- Ensemble (meta) classifiers:
  - Bagging
    - LeveragingBag
    - OzaBag
    - OzaBagAdwin
    - OzaBagASHT
  - Boosting
    - OCBoost
    - OzaBoost
    - OzaBoostAdwin
  - Stacking
    - LimAttClassifier
  - Other
    - AccuracyUpdatedEnsemble
    - AccuracyWeightedEnsemble
    - ADACC
    - DACC
    - OnlineAccuracyUpdatedEnsemble
    - TemporallyAugmentedClassifier
    - WeightedMajorityAlgorithm
- Regression modelling:
  - AMRulesRegressor
  - FadingTargetMean
  - FIMTDD
  - ORTO
  - Perceptron
  - RandomRules
  - SGD (Stochastic Gradient Descent)
  - TargetMean

Interfaces are implemented to model data in standard files (csv, txt, delimited), ffdf data (from the ff package), data.frames and matrices.
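
To make the idea of a stream over a file concrete, here is a minimal base R sketch that reads a CSV file in fixed-size chunks through an open connection, so that only one chunk is held in memory at a time. It does not use RMOA and is not how RMOA is implemented; the temporary file, the chunk size of 40 rows and all variable names are assumptions made for illustration.

```r
## Base R only: an illustrative sketch, not the RMOA implementation.
## Write iris to a temporary CSV file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)

con <- file(csv_path, open = "r")
## The first line holds the (quoted) column names.
header <- gsub('"', "", strsplit(readLines(con, n = 1), ",")[[1]])

chunk_size <- 40L   # assumed chunk size
rows_seen  <- 0L
n_chunks   <- 0L
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break   # end of file reached
  chunk <- read.csv(text = lines, header = FALSE, col.names = header)
  ## ... a streaming model would be updated with 'chunk' here ...
  rows_seen <- rows_seen + nrow(chunk)
  n_chunks  <- n_chunks + 1L
}
close(con)
```

Because the connection stays open between calls, each `readLines` call continues where the previous one stopped, which is what lets a file behave like a stream.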

Documentation of MOA directed towards RMOA users can be found at http://jwijffels.github.io/RMOA

Examples on the use of RMOA can be found in the documentation, on github at https://github.com/jwijffels/RMOA or e.g. by viewing the showcase at http://bnosac.be/index.php/blog/16-rmoa-massive-online-data-stream-classifications-with-r-a-moa

If you need support building streaming models on top of your large dataset, get in contact.

For those of you who don't know MOA: MOA stands for **M**assive **O**n-line **A**nalysis and is an open-source framework for building and running machine learning or data mining experiments on evolving data streams. The website of MOA (http://moa.cms.waikato.ac.nz) indicates it contains machine learning algorithms for **classification, regression, clustering, outlier detection and recommendation engines**.

For R users who work with a lot of data or encounter RAM issues when building models on large datasets, MOA and in general data streams have some nice features. Namely:

- It uses a *limited amount of memory*, so there are *no RAM issues when building models*.
- It processes one example at a time and runs over it only once.
- It works incrementally, so a model is *directly ready* to be used for prediction purposes.
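
These three properties can be illustrated with a small base R sketch. It is not one of the MOA algorithms: it is a deliberately simple nearest-centroid classifier that keeps only a count and a running mean per class, sees each row exactly once, and can predict at any point during the pass. The function names `new_model`, `update_model` and `predict_model` are hypothetical and chosen for this example only.

```r
## Base R only: a toy incremental learner, not a MOA or RMOA algorithm.
## State per class: number of examples seen and running mean vector.
new_model <- function() list(n = integer(0), mean = list())

## Update the model with a single example (x = numeric vector, label = class).
update_model <- function(model, x, label) {
  label <- as.character(label)
  if (is.null(model$mean[[label]])) {
    model$n[label] <- 0L
    model$mean[[label]] <- rep(0, length(x))
  }
  model$n[label] <- model$n[[label]] + 1L
  ## Incremental mean update: m <- m + (x - m) / n
  m <- model$mean[[label]]
  model$mean[[label]] <- m + (x - m) / model$n[[label]]
  model
}

## Predict the class whose running mean is closest (squared Euclidean distance).
predict_model <- function(model, x) {
  d <- vapply(model$mean, function(m) sum((x - m)^2), numeric(1))
  names(d)[which.min(d)]
}

X <- as.matrix(iris[, 1:4])
y <- iris$Species

model <- new_model()
for (i in seq_len(nrow(X))) {
  ## Each row is used once and then discarded; memory use does not grow
  ## with the number of rows, only with the number of classes.
  model <- update_model(model, X[i, ], y[i])
}

predicted <- vapply(seq_len(nrow(X)),
                    function(i) predict_model(model, X[i, ]),
                    character(1))
accuracy <- mean(predicted == as.character(y))
```

After the loop the running means equal the per-class column means that a batch computation would give, yet the full dataset never had to be available at once.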

Unfortunately it is written in Java and not easily accessible to R users. For users mostly interested in clustering, the stream package already facilitates this (this blog item gave an example of using ff alongside the stream package). In our day-to-day use cases, classification is a more common request, and the stream package only supports clustering. Hence the decision to make the **classification algorithms of MOA easily available to R users as well**. For this the **RMOA package** was created and is available on github (https://github.com/jwijffels/RMOA).

The current features of RMOA are:

- Easy to set up data streams on data in RAM (data.frame/matrix), data in files (csv, delimited, flat table) as well as out-of-memory data in an ffdf (ff package).
- Easy to set up a MOA classification model. There are **26 classification models** available, which range from
  - Classification Trees (AdaHoeffdingOptionTree, ASHoeffdingTree, DecisionStump, HoeffdingAdaptiveTree, HoeffdingOptionTree, HoeffdingTree, LimAttHoeffdingTree, RandomHoeffdingTree)
  - Bayes Rule (NaiveBayes, NaiveBayesMultinomial)
  - Ensemble learning
    - Bagging (LeveragingBag, OzaBag, OzaBagAdwin, OzaBagASHT)
    - Boosting (OCBoost, OzaBoost, OzaBoostAdwin)
    - Stacking (LimAttClassifier)
    - Other (AccuracyUpdatedEnsemble, AccuracyWeightedEnsemble, ADACC, DACC, OnlineAccuracyUpdatedEnsemble, TemporallyAugmentedClassifier, WeightedMajorityAlgorithm)
  - Active learning (ActiveClassifier)
- An R-familiar formula interface to train the model on streaming data, as in
`trainMOA(model, formula, data, subset, na.action = na.exclude, ...)`
- Easy to predict on new data with the model, as in
`predict(object, newdata, type = "response", ...)`
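
For readers less familiar with what `formula`, `subset` and `na.action` conventionally mean in R modelling functions, the base R sketch below resolves the same three arguments with `model.frame` on a small chunk of data. Whether `trainMOA` handles them internally in exactly this way is an assumption; the sketch only shows the standard R semantics that such an interface mirrors.

```r
## Base R only; whether trainMOA resolves its arguments this way is an assumption.
chunk <- iris[c(1:3, 51:53), ]
chunk$Sepal.Width[2] <- NA   # introduce a missing value

mf <- model.frame(Species ~ Sepal.Length + Sepal.Width + Petal.Length,
                  data = chunk,
                  subset = Sepal.Length > 4.8,   # keep only some rows
                  na.action = na.exclude)        # drop rows with missing values

names(mf)   # response first, then the predictors named in the formula
nrow(mf)    # rows left after applying subset and na.action
```

Only the columns named in the formula are kept (Petal.Width is dropped), the subset condition removes one row and the missing value removes another.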

An example of R code which constructs a HoeffdingTree and a boosted set of HoeffdingTrees is shown below.

##
## Installation from github
##
library(devtools)
install.packages("ff")
install.packages("rJava")
install_github("jwijffels/RMOA", subdir="RMOAjars/pkg")
install_github("jwijffels/RMOA", subdir="RMOA/pkg")
##
## HoeffdingTree example
##
require(RMOA)
hdt <- HoeffdingTree(numericEstimator = "GaussianNumericAttributeClassObserver")
hdt
## Define a stream - e.g. a stream based on a data.frame
data(iris)
iris <- factorise(iris)
irisdatastream <- datastream_dataframe(data=iris)
## Train the HoeffdingTree on the iris dataset
mymodel <- trainMOA(model = hdt,
                    formula = Species ~ Sepal.Length + Sepal.Width + Petal.Length,
                    data = irisdatastream)
## Predict using the HoeffdingTree on the iris dataset
scores <- predict(mymodel, newdata=iris, type="response")
table(scores, iris$Species)
scores <- predict(mymodel, newdata=iris, type="votes")
head(scores)
##
## Boosted set of HoeffdingTrees
##
irisdatastream <- datastream_dataframe(data=iris)
mymodel <- OzaBoost(baseLearner = "trees.HoeffdingTree", ensembleSize = 30)
mymodel <- trainMOA(model = mymodel,
                    formula = Species ~ Sepal.Length + Sepal.Width + Petal.Length,
                    data = irisdatastream)
## Predict
scores <- predict(mymodel, newdata=iris, type="response")
table(scores, iris$Species)
scores <- predict(mymodel, newdata=iris, type="votes")
head(scores)

In two weeks, on Thursday, March 20, the RBelgium R user group is holding its next regular meeting in Brussels. This is the schedule:

**Analysis and visualisation of climate data from the atmospheric model ALADIN using the Rfa package!** (Rozemien De Troch - Onderzoeksdepartement KMI)

**Probabilistic latent feature analysis with the plfm package** (Michel Meulders - Centre for Information Management, Modeling and Simulation, KU Leuven @ HUBrussel)

**AiR Quality Monitoring – An alternative way for data analysis and visualization** (Spanu Laurent & Lenartz Fabian - Institut scientifique de service public)

For more information about the event follow this link. Feel free to join.

Advanced R programming topics

As last year, BNOSAC is offering the short course on 'Advanced R programming topics' at the Leuven Statistics Research Center (Belgium).

The course is now part of FLAMES (Flanders Training Network for Methodology and Statistics) and can be found here: http://www.flames-statistics.eu/training/advanced-r-programming-topics. Subscription is no longer possible unless you kindly ask LStat.

RApache and developing web applications with R as backend

As the demand for courses on R is increasing, we are also thinking about giving a **course on RApache and developing web applications with R as a backend**. This course will allow you to build applications like this one http://rweb.stat.ucla.edu/lme4/ or this one http://rweb.stat.ucla.edu/ggplot2/.

BNOSAC has quite a few (private) business applications running on this technology stack and would like to share its knowledge with you. If you are interested in these courses, which combine javascript, R and RApache, get in contact with us by filling out the form at index.php/contact/get-in-touch. The more people interested, the lower the cost of the course.