Working with Databases and Files in R

access_time 5 months ago visibility18 comment 0

R provides rich APIs to interact with source data such as databases and files (CSV, XML, JSON, etc.) With SparklyR, R can also be used to interact with big data platforms like Hadoop. This articles shows examples about using R to load data from relational databases and text files.  

Data interface summary

The following table is a summary of R packages for accessing data in different sources.

Data SourceLibrary/PackageFunctions
CSVutilsread.table, read.csv, read.csv2, read.delim, read.delim2 write.table, write.csv, write.csv2
BinarybasewriteBin(object, con) readBin(con, what, n )
XMLXMLxmlParse, xmlRoot, xmlSize, xmlToDataFrame …
Web resourcesRcurl, stringrgetHTMLLinks, links, l_ply
DatabaseRMySQL, RSQLServer, ROracle, RSQLite…dbConnect, dbListTables, dbSendQuery, fetch

Load CSV file

The following example performs these actions in sequence:

  • Load CSV file
  • Perform calculations/aggregations
  • Save the result into CSV file

The input CSV file 'SalesData.csv' has the following content:

2017-07,Vitamin A,100,1700
2017-07,Vitamin E,50,900
2017-08,Vitamin A,100,1700
2017-09,Vitamin A,90,1500
2017-09,Vitamin B,500,3123

Code snippet

The following code is the implementation (script R24.WorkWithCSV.R):

# Load CSV


  salesData <- read.csv("R24/SalesData.csv", header = TRUE, sep=",")


# Melt data first
  moltenSalesData <- melt(salesData, id=c("Month"), measure=c("Quantities","Amount"))

# aggregation by month
aggSalesData  <- cast(moltenSalesData,  Month ~ variable, sum, )

# Set factor labels for display
aggSalesData$Month <- factor(aggSalesData$Month, labels = c('2017-JUL','2017-AUG','2017-SEP'))

# Save data
write.csv(aggSalesData,file="R24/AggSalesData.csv", quote=FALSE, row.names = FALSE)

Sample output

After the above script is executed, a new file named 'AggSalesData.csv' is generated with the following content:


Connect to SQL Server database via ODBC

There are many packages available in R to connect to databases. The section provides one example of interacting with SQL Server database via 'odbc' package. 

For interacting with databases, these DBI functions are provided:

  • dbConnect
  • dbGetInfo
  • dbListTables
  • dbReadTable
  • dbSendStatement
  • dbHasCompleted
  • dbGetRowsAffected
  • dbClearResult
  • dbSendQuery
  • dbFetch
  • dbDisconnect
  • ...

Code snippet

The following code snippet (script R25.Database.R) install odbc package and then use it to establish a database connection; once the database connection is established, it is then used to read data from table 'SalesData'. Insert statements are also executed to insert data into the database. The script also shows different ways to loading data from database as data frame. 

if("odbc" %in% installed.packages())
  print("odbc is installed.")
  print("odbc is not installed.")


# establish connection
con <- dbConnect(odbc::odbc(), 
                 .connection_string = "Driver={SQL Server};Server=.;Database=RTutorials;Trusted_Connection=True;")

# get database info

# list tables

# read table
dbReadTable(con, "SalesData")

# DML statements
rs <- dbSendStatement(con, "INSERT [dbo].[SalesData] ([Month], [Product], [Quantitites], [Amount]) VALUES (N'2017-09', N'Vitamin C', 500, CAST(3123.00 AS Numeric(20, 2)))")

# Select statements with dbFetch()
rs <- dbSendQuery(con, "SELECT * FROM SalesData;")

# disconnect

SQL scripts to create the database

USE [RTutorials]
/****** Object:  Table [dbo].[SalesData]    Script Date: 3/12/2017 5:10:17 PM ******/
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[SalesData]') AND type in (N'U'))
CREATE TABLE [dbo].[SalesData](
	[Month] [varchar](10) NOT NULL,
	[Product] [varchar](50) NOT NULL,
	[Quantitites] [int] NOT NULL,
	[Amount] [numeric](20, 2) NOT NULL
INSERT [dbo].[SalesData] ([Month], [Product], [Quantitites], [Amount]) VALUES (N'2017-07', N'Vitamin A', 100, CAST(1700.00 AS Numeric(20, 2)))
INSERT [dbo].[SalesData] ([Month], [Product], [Quantitites], [Amount]) VALUES (N'2017-07', N'Vitamin E', 50, CAST(900.00 AS Numeric(20, 2)))
INSERT [dbo].[SalesData] ([Month], [Product], [Quantitites], [Amount]) VALUES (N'2017-08', N'Vitamin A', 100, CAST(1700.00 AS Numeric(20, 2)))
INSERT [dbo].[SalesData] ([Month], [Product], [Quantitites], [Amount]) VALUES (N'2017-09', N'Vitamin A', 90, CAST(1500.00 AS Numeric(20, 2)))
INSERT [dbo].[SalesData] ([Month], [Product], [Quantitites], [Amount]) VALUES (N'2017-09', N'Vitamin B', 500, CAST(3123.00 AS Numeric(20, 2)))

select * from dbo.SalesData


It's critical to understand how to interact with different data source formats as they are the input for most of our R processes. Let me know if you have any questions or encounter any issues using these data APIs. 

info Last modified by Raymond 5 months ago copyright The content on this page is licensed under CC-BY-SA-4.0.
Like this article?
Share on

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts

Follow Kontext

Get our latest updates on LinkedIn or Twitter.

Want to publish your article on Kontext?

Learn more

More from Kontext

visibility 30
thumb_up 0
access_time 5 months ago

This article provides a basic introduction about programming with R incl. atomic vector, variable, operations, branching, loops and functions.  info All examples can run RStudio or R Tools for Visual Studio on Windows.  About these two IDEs, refer to R Introduction . We always start ...

Statistics with R (Part I)
visibility 15
thumb_up 0
access_time 5 months ago

Till now, we've gone through R programming basics, data types, packages and IDEs, data APIs to work with data sources and various plotting functions. Let's now dive into the most important part about statistics and modelling with R. After all, R was created for statistics.  warning  Due ...

visibility 16
thumb_up 0
access_time 5 months ago

In many scenarios, we need to generate data directly in memory. This article provides examples about generating regular and random sequences with R. It also shows you how to reshape or restructure data.  In the preceding articles, we already used a quite a few functions to generate regular ...