Context

SQL Server Integration Services (SSIS) provides tasks for performing operations against Hadoop, for example:

  • Hadoop File System Task
  • Hadoop Hive Task
  • Hadoop Pig Task

In a Data Flow Task, you can also use:

  • Hadoop HDFS Source
  • Hadoop HDFS Destination

On this page, I’m going to demonstrate how to write a file into HDFS through the SSIS Hadoop File System Task.

References

https://docs.microsoft.com/en-us/sql/integration-services/control-flow/hadoop-file-system-task

Prerequisites

Hadoop

Refer to the following page to install Hadoop if you don’t have an instance to play with.

Install Hadoop 3.0.0 in Windows (Single Node)

SSIS

SSIS can be installed via SQL Server Data Tools (SSDT). In this example, I am using 15.1.

Create Hadoop connection manager

In your SSIS package, create a Hadoop Connection Manager:

image

In the WebHDFS tab of the editor, specify the following details:

image

Click the Test Connection button to ensure you can connect, and then click OK:

image

Create a file connection manager

Create a local CSV file

Create a local CSV file named F:\DataAnalytics\Sales.csv with the following content:

Month,Amount
1/01/2017,30022
1/02/2017,12334
1/03/2017,33455
1/04/2017,50000
1/05/2017,33333
1/06/2017,11344
1/07/2017,12344
1/08/2017,24556
1/09/2017,46667
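
If you prefer to generate the sample file rather than type it, the following is a minimal Python sketch that writes the same rows. The function name write_sales_csv is my own for illustration; the target path is whatever you pass in.

```python
import csv
from pathlib import Path

# Sample rows copied from the listing above.
ROWS = [
    ("1/01/2017", 30022),
    ("1/02/2017", 12334),
    ("1/03/2017", 33455),
    ("1/04/2017", 50000),
    ("1/05/2017", 33333),
    ("1/06/2017", 11344),
    ("1/07/2017", 12344),
    ("1/08/2017", 24556),
    ("1/09/2017", 46667),
]


def write_sales_csv(path):
    """Write the Month,Amount header plus the sample rows and return the path."""
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["Month", "Amount"])
        writer.writerows(ROWS)
    return path
```

For example, write_sales_csv(r"F:\DataAnalytics\Sales.csv") creates the file at the location used in this article, provided the folder already exists.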

Create a file connection manager

Create a file connection manager named Sales.csv that points to the file created above.

image

Create Hadoop File System Task

Use the two connection managers created above to create a Hadoop File System Task:

image

With the above settings, the task uploads Sales.csv to /Sales.csv in HDFS.
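
Because the connection manager is configured on its WebHDFS tab, the upload presumably goes through the WebHDFS REST API. I have not inspected what the task sends on the wire, so treat the following as a sketch of the documented two-step WebHDFS CREATE flow rather than a description of the task's internals. The host, port (9870 is the default NameNode HTTP port in Hadoop 3) and user name are assumptions you need to adapt, and the function names are my own.

```python
import urllib.error
import urllib.parse
import urllib.request


def create_url(host, port, hdfs_path, user, overwrite=True):
    """Build the WebHDFS CREATE URL that is sent to the NameNode."""
    query = urllib.parse.urlencode({
        "op": "CREATE",
        "user.name": user,
        "overwrite": "true" if overwrite else "false",
    })
    return f"http://{host}:{port}/webhdfs/v1{urllib.parse.quote(hdfs_path)}?{query}"


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Surface the 307 response as an HTTPError instead of following it."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def upload(local_path, host, port, hdfs_path, user):
    """Upload a local file with the two-step WebHDFS CREATE flow."""
    opener = urllib.request.build_opener(_NoRedirect)

    # Step 1: PUT without a body; the NameNode answers with a 307 redirect
    # whose Location header points at a DataNode.
    first = urllib.request.Request(
        create_url(host, port, hdfs_path, user), method="PUT"
    )
    try:
        opener.open(first)
    except urllib.error.HTTPError as err:
        if err.code != 307:
            raise
        location = err.headers["Location"]
    else:
        raise RuntimeError("expected a 307 redirect from the NameNode")

    # Step 2: PUT the file content to the DataNode URL.
    with open(local_path, "rb") as handle:
        data = handle.read()
    second = urllib.request.Request(location, data=data, method="PUT")
    with urllib.request.urlopen(second) as response:
        return response.status  # 201 Created is the documented success code
```

With a single-node setup, a call such as upload(r"F:\DataAnalytics\Sales.csv", "localhost", 9870, "/Sales.csv", "hadoop") would be the rough equivalent of the task settings; "localhost" and "hadoop" are placeholders for your own host and user.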

Run the package

Run the package or execute the task, and make sure it completes successfully:

image

Verify the result via HDFS CLI

Use the following command to verify that the file was uploaded successfully:

hdfs dfs -ls /

image

You can also print out the content via the following command:

hdfs dfs -cat /Sales.csv

image
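
The same check can be scripted without the CLI by calling the WebHDFS LISTSTATUS operation. This is only a sketch: the host and port are assumptions (9870 is the Hadoop 3 default NameNode HTTP port), it assumes the cluster accepts unauthenticated requests, and the helper names are my own.

```python
import json
import urllib.parse
import urllib.request


def list_url(host, port, hdfs_dir):
    """Build the WebHDFS LISTSTATUS URL for a directory."""
    return f"http://{host}:{port}/webhdfs/v1{urllib.parse.quote(hdfs_dir)}?op=LISTSTATUS"


def file_names(listing):
    """Return the names of plain files in a parsed LISTSTATUS response."""
    statuses = listing["FileStatuses"]["FileStatus"]
    return [status["pathSuffix"] for status in statuses if status["type"] == "FILE"]


def is_uploaded(host, port, hdfs_dir, name):
    """Return True if a file called name exists directly under hdfs_dir."""
    with urllib.request.urlopen(list_url(host, port, hdfs_dir)) as response:
        return name in file_names(json.load(response))
```

For the upload in this article, is_uploaded("localhost", 9870, "/", "Sales.csv") should return True once the package has run, assuming the NameNode is reachable on localhost.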

Verify the result through Name Node web UI

image

image

WebHDFS REST API reference

    https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html

Summary

It is very easy to upload files into HDFS through SSIS. You can also upload a whole directory into HDFS through this task if you change the file connection manager to point to a folder.
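
To picture what a directory upload involves, here is a small Python sketch that pairs each local file with a target path in HDFS. It only builds the plan and does not talk to Hadoop. How the SSIS task itself treats subfolders is not covered on this page, so the recursion here is my assumption, and plan_directory_upload is a hypothetical helper name.

```python
from pathlib import Path, PurePosixPath


def plan_directory_upload(local_dir, hdfs_dir):
    """Pair every file under local_dir with its target path under hdfs_dir."""
    root = Path(local_dir)
    plan = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root)
            # HDFS paths always use forward slashes, whatever the local OS is.
            target = str(PurePosixPath(hdfs_dir, *relative.parts))
            plan.append((path, target))
    return plan
```

Each (local file, HDFS path) pair in the returned list could then be uploaded one by one with whichever client you use.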

If you have any questions, please let me know.

Last modified by Raymond 3 years ago.