dokstok
Dokstok is an R package which contains utility functions to work more easily with PL/R. PL/R is a procedural language which runs on top of PostgreSQL/GreenPlum/Apache HAWQ.
- based on RPostgreSQL
- based on PL/R
Why
This R package was set up with the following things in mind
- Simplification of database transfer & using copy to speed up transfer data in and out to the database
- Simplify storing any R objects inside a PostgreSQL/Greenplum/HAWQ database. So that
- multiple users can access R objects from 1 place instead of communicating over a shared network drive
- as R objects are stored inside the database, R objects can be backed up by standard database procedures
- R scripts, R code and functions can be launched at the database server from your local R session, leveraging the capacity of the database environment. The results can be returned to your local R session or kept in the database.
- R models which are developped locally can be stored in the database and easily deployed as stored procedures. Functionality exists to make this transition more easy.
- As PL/R functionality works for PostgreSQL, GreenPlum and Apache HAWQ, one can easily scale out.
Current features
- Functions to store any R objects inside the database.
dokstok,dokstok_pull,dokstok_ls,dokstok_rm - PL/R functions to evaluate R code inside the database directly:
plr_eval - Easy functions to work with the database:
dbfetch,dbquery,db_create_table,list_databases,list_schema,list_tables - Fast reading and writing using COPY
copy_in,copy_to - Easy connection setup:
dokstok_connect,dokstok_defaultconnection,dokstok_disconnect. Set once use everywhere.
Examples
Example on storing objects inside the database
dokstok_connect()
## Save model inside database
mymodel <- lm(Sepal.Length ~ Sepal.Width, data = iris)
obj <- dokstok(mymodel)
obj
## See what is in the database + pull the object back locally
dokstok_ls()
m <- dokstok_pull(obj)
summary(m)
Example on running an R script at the database server
myfun <- function(){
result <- list()
result$pid <- Sys.getpid()
result$liblocs = .libPaths()
result$pkgs = rownames(installed.packages())
result$searchpath = search()
result$env = Sys.getenv()
result$ls = ls(envir = .GlobalEnv)
result
}
plr_eval(FUN=myfun, pull = TRUE)
Setup
- Make sure you install the PL/R extension on the database environment
## E.g. with PostgreSQL
sudo apt-get update
sudo apt-get install postgresql-9.3-plr
- Install the R package locally and on the database server
install.packages(c("DBI", "RPostgreSQL", "digest", "jsonlite", "data.table"))
install.packages("plr.utils", repos = "http://www.datatailor.be/rcube", type = "source")
- Start using the package
Create an environment variable DOKSTOK_CON with the path to a file where to store credentials
## You can also set the default connection which will be use when connecting without arguments
credentials_file <- tempfile(pattern = "dokstok", tmpdir = getwd(), fileext = "")
Sys.setenv(DOKSTOK_CON = credentials_file)
dokstok_defaultconnection(
dbname = "name_of_the_db",
host = "localhost",
user = "name_of_the_user",
password = "password_of_the_user",
port = 5432)
- Setup PL/R extension from R
The following will CREATE EXTENSION plr, create stored procedures, plr_host, plr_require, plr_as_bytea and plr_eval
plrutils_start(fresh = TRUE)
dokstok_init(table='robjects', schema='rstore')
Support
Need support? Contact BNOSAC: http://www.bnosac.be

