Working with Databases and Files in R

access_time 29 days ago visibility10 comment 0

R provides rich APIs to interact with source data such as databases and files (CSV, XML, JSON, etc.) With SparklyR, R can also be used to interact with big data platforms like Hadoop. This articles shows examples about using R to load data from relational databases and text files.  

Data interface summary

The following table is a summary of R packages for accessing data in different sources.

Data SourceLibrary/PackageFunctions
CSVutilsread.table, read.csv, read.csv2, read.delim, read.delim2 write.table, write.csv, write.csv2
Excelxlsxread.xlsx
BinarybasewriteBin(object, con) readBin(con, what, n )
XMLXMLxmlParse, xmlRoot, xmlSize, xmlToDataFrame …
JSONrjsonfromJSON, as.data.frame
Web resourcesRcurl, stringrgetHTMLLinks, links, l_ply
DatabaseRMySQL, RSQLServer, ROracle, RSQLite…dbConnect, dbListTables, dbSendQuery, fetch

Load CSV file

The following example performs these actions in sequence:

  • Load CSV file
  • Perform calculations/aggregations
  • Save the result into CSV file

The input CSV file 'SalesData.csv' has the following content:

Month,Product,Quantities,Amount
2017-07,Vitamin A,100,1700
2017-07,Vitamin E,50,900
2017-08,Vitamin A,100,1700
2017-09,Vitamin A,90,1500
2017-09,Vitamin B,500,3123

Code snippet

The following code is the implementation (script R24.WorkWithCSV.R):

# Load CSV
getwd()

require(utils)

(
  salesData <- read.csv("R24/SalesData.csv", header = TRUE, sep=",")
)

require(reshape)

# Melt data first
(
  moltenSalesData <- melt(salesData, id=c("Month"), measure=c("Quantities","Amount"))
)

# aggregation by month
(
aggSalesData  <- cast(moltenSalesData,  Month ~ variable, sum, )
)

# Set factor labels for display
aggSalesData$Month <- factor(aggSalesData$Month, labels = c('2017-JUL','2017-AUG','2017-SEP'))
aggSalesData

# Save data
write.csv(aggSalesData,file="R24/AggSalesData.csv", quote=FALSE, row.names = FALSE)

Sample output

After the above script is executed, a new file named 'AggSalesData.csv' is generated with the following content:

'

Connect to SQL Server database via ODBC

There are many packages available in R to connect to databases. The section provides one example of interacting with SQL Server database via 'odbc' package. 

For interacting with databases, these DBI functions are provided:

  • dbConnect
  • dbGetInfo
  • dbListTables
  • dbReadTable
  • dbSendStatement
  • dbHasCompleted
  • dbGetRowsAffected
  • dbClearResult
  • dbSendQuery
  • dbFetch
  • dbDisconnect
  • ...

Code snippet

The following code snippet (script R25.Database.R) install odbc package and then use it to establish a database connection; once the database connection is established, it is then used to read data from table 'SalesData'. Insert statements are also executed to insert data into the database. The script also shows different ways to loading data from database as data frame. 

if("odbc" %in% installed.packages())
{
  print("odbc is installed.")
}else
{
  print("odbc is not installed.")
  install.packages("odbc")
}

require(odbc)

# establish connection
con <- dbConnect(odbc::odbc(), 
                 .connection_string = "Driver={SQL Server};Server=.;Database=RTutorials;Trusted_Connection=True;")

# get database info
dbGetInfo(con)

# list tables
dbListTables(con)

# read table
dbReadTable(con, "SalesData")

# DML statements
rs <- dbSendStatement(con, "INSERT [dbo].[SalesData] ([Month], [Product], [Quantitites], [Amount]) VALUES (N'2017-09', N'Vitamin C', 500, CAST(3123.00 AS Numeric(20, 2)))")
dbHasCompleted(rs)
dbGetRowsAffected(rs)
dbClearResult(rs)

# Select statements with dbFetch()
rs <- dbSendQuery(con, "SELECT * FROM SalesData;")
dbFetch(rs)
dbClearResult(rs)

# disconnect
dbDisconnect(con)

SQL scripts to create the database

USE [RTutorials]
GO
/****** Object:  Table [dbo].[SalesData]    Script Date: 3/12/2017 5:10:17 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[SalesData]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[SalesData](
	[Month] [varchar](10) NOT NULL,
	[Product] [varchar](50) NOT NULL,
	[Quantitites] [int] NOT NULL,
	[Amount] [numeric](20, 2) NOT NULL
) ON [PRIMARY]
END
GO
INSERT [dbo].[SalesData] ([Month], [Product], [Quantitites], [Amount]) VALUES (N'2017-07', N'Vitamin A', 100, CAST(1700.00 AS Numeric(20, 2)))
GO
INSERT [dbo].[SalesData] ([Month], [Product], [Quantitites], [Amount]) VALUES (N'2017-07', N'Vitamin E', 50, CAST(900.00 AS Numeric(20, 2)))
GO
INSERT [dbo].[SalesData] ([Month], [Product], [Quantitites], [Amount]) VALUES (N'2017-08', N'Vitamin A', 100, CAST(1700.00 AS Numeric(20, 2)))
GO
INSERT [dbo].[SalesData] ([Month], [Product], [Quantitites], [Amount]) VALUES (N'2017-09', N'Vitamin A', 90, CAST(1500.00 AS Numeric(20, 2)))
GO
INSERT [dbo].[SalesData] ([Month], [Product], [Quantitites], [Amount]) VALUES (N'2017-09', N'Vitamin B', 500, CAST(3123.00 AS Numeric(20, 2)))
GO


select * from dbo.SalesData

Summary

It's critical to understand how to interact with different data source formats as they are the input for most of our R processes. Let me know if you have any questions or encounter any issues using these data APIs. 

info Last modified by Raymond at 29 days ago copyright The content on this page is licensed under CC-BY-SA-4.0.
Like this article?
Share on

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts

Want to publish your article on Kontext?

Learn more

Kontext Column

Created for everyone to publish data, programming and cloud related articles.
Follow three steps to create your columns.


Learn more arrow_forward

More from Kontext

local_offer r-lang

visibility 9
thumb_up 0
access_time 29 days ago

In this series, we've walked-through R programming basics and advanced data types . This article will focus on R packages and IDEs so that you can program efficiently with R. Let's recap these commonly mentioned R terminologies: Package : An extension of the R base system with code, data and ...

local_offer r-lang

visibility 5
thumb_up 0
access_time 29 days ago

R implements a number of useful data types to support complex analytics and calculations. This articles focus on String, Vector, List, Matrix, Array, Factory and Data Frame. It also shows examples about expanding data frame, for example, add or drop columns for data frames, add rows for data ...

Statistics with R (Part I)

local_offer r-lang

visibility 9
thumb_up 0
access_time 29 days ago

Till now, we've gone through R programming basics, data types, packages and IDEs, data APIs to work with data sources and various plotting functions. Let's now dive into the most important part about statistics and modelling with R. After all, R was created for statistics.  warning  Due ...

About column

Programming with R language - tutorials about R. 

rss_feed Subscribe RSS