RMOA package for running streaming classification & regression models now on CRAN
Last week, we released the RMOA package on CRAN (http://cran.r-project.org/web/packages/RMOA). It is an R package for building streaming classification and regression models on top of MOA.
MOA stands for 'Massive Online Analysis'. It is the most popular open source framework for data stream mining and is developed at the University of Waikato: http://moa.cms.waikato.ac.nz. Our RMOA package interfaces with MOA version 2014.04 and focusses on building, evaluating and scoring classification & regression models on data streams.
The classification & regression models available through RMOA are:
- Classification trees:
* AdaHoeffdingOptionTree
* ASHoeffdingTree
* DecisionStump
* HoeffdingAdaptiveTree
* HoeffdingOptionTree
* HoeffdingTree
* LimAttHoeffdingTree
* RandomHoeffdingTree
- Bayesian classification:
* NaiveBayes
* NaiveBayesMultinomial
- Active learning classification:
* ActiveClassifier
- Ensemble (meta) classifiers:
* Bagging
+ LeveragingBag
+ OzaBag
+ OzaBagAdwin
+ OzaBagASHT
* Boosting
+ OCBoost
+ OzaBoost
+ OzaBoostAdwin
* Stacking
+ LimAttClassifier
* Other
+ AccuracyUpdatedEnsemble
+ AccuracyWeightedEnsemble
+ ADACC
+ DACC
+ OnlineAccuracyUpdatedEnsemble
+ TemporallyAugmentedClassifier
+ WeightedMajorityAlgorithm
- Regression modelling:
* AMRulesRegressor
* FadingTargetMean
* FIMTDD
* ORTO
* Perceptron
* RandomRules
* SGD (Stochastic Gradient Descent)
* TargetMean
RMOA provides interfaces to model data stored in standard files (csv, txt, delimited), ffdf objects (from the ff package), data.frames and matrices.
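
As a minimal sketch of that workflow, the following wraps the built-in iris data.frame as a data stream, trains a HoeffdingTree on it and scores the same data. It assumes RMOA and a working Java/rJava setup are installed. The function names follow the package documentation. The iris data, the chosen numeric estimator and the object names (hdt, irisstream, mymodel) are picked for illustration only.

```r
library(RMOA)

# Pick one of the classification trees listed above
hdt <- HoeffdingTree(numericEstimator = "GaussianNumericAttributeClassObserver")

# Convert any character columns to factors (a no-op for iris, shown for completeness)
data(iris)
iris <- factorise(iris)

# Wrap the data.frame as a data stream
irisstream <- datastream_dataframe(data = iris)

# Train the model on the stream
mymodel <- trainMOA(
  model = hdt,
  formula = Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
  data = irisstream
)

# Score data with the trained model and compare against the known labels
scores <- predict(mymodel, newdata = iris, type = "response")
table(scores, iris$Species)
```

For data held in files, ffdf objects or matrices, the package documentation describes analogous stream constructors (datastream_file, datastream_ffdf, datastream_matrix). Check their arguments there before use.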
Documentation of MOA aimed at RMOA users can be found at http://jwijffels.github.io/RMOA
Examples of how to use RMOA can be found in the documentation, on GitHub at https://github.com/jwijffels/RMOA, or in the showcase at http://bnosac.be/index.php/blog/16-rmoa-massive-online-data-stream-classifications-with-r-a-moa
If you need support building streaming models on top of your large dataset, get in touch.