read.odbc.ffdf & read.dbi.ffdf for fetching large corporate SQL data

If you work with large data, but not the enormously big data everyone is talking about, and you are tired of looking for a way to get several tens of millions of records into R without running into RAM issues, the packages ff, ffbase and ETLUtils might be the solution to your problem.

Following up on our post about the ETLUtils package, which eases transferring large data from SQL databases to ffdf objects in R, the package has now been extended with the function read.odbc.ffdf, which fetches the results of your SQL queries from corporate Oracle, MySQL, PostgreSQL and SQLite databases over ODBC.
Below we show an example where read.dbi.ffdf fetches all rows of a table, after which read.odbc.ffdf appends data of the same structure to the existing ffdf. This might be of interest if you work a lot with daily incremental data updates.
The first query below returned about 15 million records using read.dbi.ffdf without any RAM issues (this PC has 4 GB of RAM), and the second one appended roughly 93,000 more records using read.odbc.ffdf as an example. All of the data ends up in a single ffdf in R.

library(ETLUtils)
library(RMySQL)   # DBI driver for MySQL, provides dbDriver("MySQL")
library(RODBC)    # ODBC backend used by read.odbc.ffdf

# Connection details for the DBI (MySQL) connection
login <- list()
login$user <- "bnosac"
login$password <- "YourPassword"
login$dbname <- "YourDB"
login$host <- "localhost/IPaddress"
# Fetch all rows of the table into an ffdf
x <- read.dbi.ffdf(
  query = "select * from semetis.keywords_performance_endofday",
  dbConnect.args = list(drv = dbDriver("MySQL"), dbname = login$dbname, user = login$user,
                        password = login$password, host = login$host))
dim(x)
# [1] 14969674       27

# Connection details for the ODBC connection
login <- list()
login$dsn <- "YourDSN"
login$uid <- "bnosac"
login$pwd <- "YourPassword"
# Append yesterday's rows to the existing ffdf by passing it as x
x <- read.odbc.ffdf(
  query = "select * from semetis.keywords_performance_endofday where date = CURRENT_DATE-1",
  odbcConnect.args = list(dsn = login$dsn, uid = login$uid, pwd = login$pwd),
  x = x)
dim(x)
# [1] 15062904       27
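
If you want to try the append pattern without a MySQL server or an ODBC DSN at hand, the sketch below may help. It is only an illustration under assumptions: the temporary SQLite file, the table name perf and its columns day and clicks are made up for this example, and both loads go through read.dbi.ffdf (via RSQLite) rather than read.odbc.ffdf.

```r
library(ff)
library(ETLUtils)
library(DBI)
library(RSQLite)

# Hypothetical stand-in database: a temporary SQLite file with a small table.
dbfile <- tempfile(fileext = ".sqlite")
con <- dbConnect(SQLite(), dbname = dbfile)
perf <- data.frame(
  day    = rep(1:3, each = 1000),  # integer day index, 3 days of data
  clicks = seq_len(3000)
)
dbWriteTable(con, "perf", perf)
dbDisconnect(con)

conn_args <- list(drv = SQLite(), dbname = dbfile)

# Initial load: everything before the latest day.
x <- read.dbi.ffdf(
  query = "select * from perf where day < 3",
  dbConnect.args = conn_args)
n_before <- nrow(x)

# Incremental load: rows of the same structure are appended to the
# existing ffdf because it is passed in through the x argument.
x <- read.dbi.ffdf(
  query = "select * from perf where day = 3",
  dbConnect.args = conn_args,
  x = x)
n_after <- nrow(x)

c(before = n_before, after = n_after)
```

The second call returns the same ffdf with the new rows added at the end, which is the behaviour the daily incremental update relies on.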

Get your large SQL data in ff swiftly

The ff package is great when you are working with large data in R. Data in corporate environments are usually not so large that a Hadoop system is needed to handle them, but they are mostly large enough to make R choke on its RAM.

The ff package is great for this type of data. It can handle about 2.14 billion elements per atomic vector (R's maximum integer, 2^31 - 1), so an ffdf can hold up to about 2.14 billion records. This is usually more than enough in standard business settings nowadays.
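
To make that ceiling concrete, here is a small base R check. The 15 million row table is an illustrative figure we picked for this sketch, not a measurement.

```r
# ff vectors are indexed by R integers, so the ceiling is the largest integer.
max_len <- .Machine$integer.max
format(max_len, big.mark = ",")   # "2,147,483,647"
max_len == 2^31 - 1               # TRUE

# Illustrative only: a hypothetical table of 15 million rows.
n_rows <- 15e6
headroom <- max_len / n_rows      # how many such tables fit under the ceiling
floor(headroom)                   # 143
```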

For a while, getting the data into R was quite cumbersome. The ff package already provides nice interfaces to CSV and flat files in the read.csv.ffdf and read.table.ffdf functions. To extend this to standard SQL databases, the ETLUtils package has been published on CRAN. It contains a function called read.dbi.ffdf, which makes it easy to extract data from any SQL database through DBI directly into an ffdf in R.
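
For comparison, this is roughly what the flat-file route looks like. It is a minimal sketch: the temporary CSV file and its columns id and value are made up for the example, and the chunk sizes are arbitrary.

```r
library(ff)

# Hypothetical input: a small CSV file written to a temporary location.
csvfile <- tempfile(fileext = ".csv")
dat <- data.frame(id = 1:5000, value = (1:5000) / 10)
write.csv(dat, csvfile, row.names = FALSE)

# Read the file in chunks: the first 1000 rows determine the structure,
# the remaining rows are read 2000 at a time and appended on disk.
x <- read.csv.ffdf(
  file = csvfile,
  header = TRUE,
  colClasses = c("integer", "numeric"),
  first.rows = 1000,
  next.rows = 2000)

class(x)
dim(x)
```

read.dbi.ffdf follows the same idea of chunked reading, with a SQL query taking the place of the file.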

Below, we show an example of importing a table with 11.8 million records, stored in a MySQL database, directly into an ffdf without having RAM issues.
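
A minimal sketch of such an import is given here. It is not the original 11.8 million record MySQL example: to keep it self-contained we assume a temporary SQLite file with a made-up table testdata, accessed through RSQLite. For MySQL you would pass the RMySQL driver and your own credentials in dbConnect.args instead.

```r
library(ff)
library(ETLUtils)
library(DBI)
library(RSQLite)

# Stand-in for the corporate database: a temporary SQLite file (hypothetical data).
dbfile <- tempfile(fileext = ".sqlite")
con <- dbConnect(SQLite(), dbname = dbfile)
dbWriteTable(con, "testdata",
             data.frame(id = 1:20000, amount = (1:20000) * 0.5))
dbDisconnect(con)

# Fetch the query result in chunks, so that only one chunk at a time
# has to fit in RAM; the chunks are appended to an ffdf on disk.
x <- read.dbi.ffdf(
  query = "select * from testdata",
  dbConnect.args = list(drv = SQLite(), dbname = dbfile),
  first.rows = 1000,   # size of the first chunk
  next.rows = 5000,    # size of the following chunks
  VERBOSE = TRUE)

class(x)
dim(x)
```

The chunk sizes here are arbitrary. On a real table you would choose them so that one chunk fits comfortably in memory.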


R package ETLUtils @ CRAN - easy loading into ffdf

The R package ETLUtils is now available for download from its CRAN repository.
It is a package that facilitates ETL in situations where you need to interact with SQL databases in a corporate environment. Currently it focuses on the E (Extract) part of ETL.

In the package you'll find a function called read.dbi.ffdf, which lets you easily extract large datasets from an SQL database through DBI straight into an ffdf (part of the ff package). This should ease the use of the ff package for working with large datasets in corporate environments where you need to interact with, e.g., MySQL, PostgreSQL or Oracle through the packages RMySQL, RPostgreSQL or ROracle, without having to worry about RAM issues when importing large datasets.