dokstok
Dokstok is an R package with utility functions that make it easier to work with PL/R. PL/R is a procedural language which runs on top of PostgreSQL/Greenplum/Apache HAWQ.
- based on RPostgreSQL
- based on PL/R
Why
This R package was set up with the following goals in mind:
- Simplify transferring data to and from the database, using COPY to speed up the transfer.
- Simplify storing any R object inside a PostgreSQL/Greenplum/HAWQ database, so that
  - multiple users can access R objects from one place instead of exchanging them over a shared network drive
  - R objects, being stored inside the database, can be backed up by standard database procedures
- R scripts, R code and functions can be launched at the database server from your local R session, leveraging the capacity of the database environment. The results can be returned to your local R session or kept in the database.
- R models which are developed locally can be stored in the database and easily deployed as stored procedures. Functionality exists to make this transition easier.
- As PL/R functionality works for PostgreSQL, Greenplum and Apache HAWQ, one can easily scale out.
Current features
- Functions to store any R object inside the database: dokstok, dokstok_pull, dokstok_ls, dokstok_rm
- PL/R functions to evaluate R code inside the database directly: plr_eval
- Easy functions to work with the database: dbfetch, dbquery, db_create_table, list_databases, list_schema, list_tables
- Fast reading and writing using COPY: copy_in, copy_to
- Easy connection setup: dokstok_connect, dokstok_defaultconnection, dokstok_disconnect. Set once, use everywhere.
Examples
Example on storing objects inside the database
dokstok_connect()
## Save model inside database
mymodel <- lm(Sepal.Length ~ Sepal.Width, data = iris)
obj <- dokstok(mymodel)
obj
## See what is in the database + pull the object back locally
dokstok_ls()
m <- dokstok_pull(obj)
summary(m)
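This README does not state how dokstok encodes objects before writing them to the database. As a rough illustration of the general idea only, the sketch below uses base R's serialize and unserialize to turn a fitted model into a raw vector of bytes and restore it. Treating this as dokstok's actual storage format is an assumption.

```r
## Illustration only: base R serialization, not dokstok's confirmed internals
mymodel <- lm(Sepal.Length ~ Sepal.Width, data = iris)

## Turn the model into a raw vector of bytes
bytes <- serialize(mymodel, connection = NULL)
class(bytes)
length(bytes)

## Restore the object from the bytes and compare it with the original
restored <- unserialize(bytes)
all.equal(coef(restored), coef(mymodel))
```

A raw vector like this is the kind of value that could be kept in a binary database column, which is what makes backing up R objects with standard database procedures possible.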
Example on running an R script at the database server
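## Collect information about the R session in which the function is evaluated
myfun <- function(){
  result <- list()
  result$pid <- Sys.getpid()
  result$liblocs <- .libPaths()
  result$pkgs <- rownames(installed.packages())
  result$searchpath <- search()
  result$env <- Sys.getenv()
  result$ls <- ls(envir = .GlobalEnv)
  result
}
## Run the function at the database server and pull the result back locally
plr_eval(FUN = myfun, pull = TRUE)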
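Before sending a function to the database server, it can help to run it in the local session first, so that errors in the function itself are separated from connection or PL/R issues. The following sketch is a local dry run with a reduced version of myfun. It uses only base R and makes no claim about how plr_eval transports or evaluates the function.

```r
## Reduced version of the function above, run locally as a dry run
myfun <- function(){
  result <- list()
  result$pid <- Sys.getpid()
  result$liblocs <- .libPaths()
  result$searchpath <- search()
  result
}

local_result <- myfun()

## Inspect which elements came back and what type each one has
str(local_result)
```

If the same function is then run through plr_eval, the pid and library paths in the result should reflect the R session on the database server rather than the local one.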
Setup
- Make sure the PL/R extension is installed on the database server
## E.g. with PostgreSQL
sudo apt-get update
sudo apt-get install postgresql-9.3-plr
- Install the R package locally and on the database server
install.packages(c("DBI", "RPostgreSQL", "digest", "jsonlite", "data.table"))
install.packages("plr.utils", repos = "http://www.datatailor.be/rcube", type = "source")
- Start using the package
Create an environment variable DOKSTOK_CON holding the path to a file in which the credentials are stored
## You can also set the default connection, which will be used when connecting without arguments
credentials_file <- tempfile(pattern = "dokstok", tmpdir = getwd(), fileext = "")
Sys.setenv(DOKSTOK_CON = credentials_file)
dokstok_defaultconnection(
  dbname = "name_of_the_db",
  host = "localhost",
  user = "name_of_the_user",
  password = "password_of_the_user",
  port = 5432)
- Setup PL/R extension from R
The following will run CREATE EXTENSION plr and create the stored procedures plr_host, plr_require, plr_as_bytea and plr_eval
plrutils_start(fresh = TRUE)
dokstok_init(table='robjects', schema='rstore')
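Since the file referenced by DOKSTOK_CON in the setup above holds a password, it may be worth restricting who can read it. The sketch below assumes a Unix-like system and uses an empty placeholder file so that it runs without the package. The path is hypothetical, and whether dokstok sets file permissions itself is not stated in this README.

```r
## Hypothetical location for the credentials file (placeholder for illustration)
credentials_file <- file.path(tempdir(), "dokstok_credentials")
Sys.setenv(DOKSTOK_CON = credentials_file)

## Placeholder file; in practice dokstok_defaultconnection() is assumed to write it
file.create(credentials_file)

## Allow only the owner to read and write the file (limited effect on Windows)
Sys.chmod(credentials_file, mode = "0600", use_umask = FALSE)

## Confirm that the environment variable points at an existing file
Sys.getenv("DOKSTOK_CON")
file.exists(Sys.getenv("DOKSTOK_CON"))
```

Sys.setenv only affects the current R session, so the variable has to be set again in each new session unless it is defined at a more permanent level.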
Support
Need support? Contact BNOSAC: http://www.bnosac.be