SQL Server Integration Services (SSIS) includes tasks that perform operations against Hadoop, for example:
In Data Flow Task, you can also use:
On this page, I’m going to demonstrate how to write a file into HDFS through the SSIS Hadoop File System Task.
Refer to the following page to install Hadoop if you don’t have an instance to play with.
SSIS can be installed via SQL Server Data Tools (SSDT). In this example, I am using version 15.1.
In your SSIS package, create a Hadoop Connection Manager:
In the WebHDFS tab of the editor, specify the following details:
Click the Test Connection button to ensure you can connect, and then click OK:
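A similar connectivity check can be sketched outside SSIS in Python. This is only a minimal sketch and not what the connection manager does internally. It assumes the NameNode exposes WebHDFS over plain HTTP without Kerberos. The helper names webhdfs_url and can_connect are hypothetical, and the host, port and user must be replaced with the values entered in the WebHDFS tab.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL for operation `op` on the absolute HDFS `path`."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def can_connect(host, port, user, timeout=5):
    """Return True if the NameNode answers GETFILESTATUS for the root directory.

    Assumption: simple (non-Kerberos) authentication via the user.name parameter.
    """
    url = webhdfs_url(host, port, "/", "GETFILESTATUS", **{"user.name": user})
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return "FileStatus" in json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return False
```

For example, can_connect("localhost", 9870, "hadoop") would query a local NameNode on 9870, the default NameNode HTTP port in Hadoop 3; the user name "hadoop" here is a placeholder.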
Create a local CSV file named F:\DataAnalytics\Sales.csv with the following content:
Create a file connection manager Sales.csv which points to the file created above.
Use the two connection managers created above to create a Hadoop File System Task:
With the above settings, the task uploads Sales.csv to /Sales.csv in HDFS.
Run the package or execute the task and make sure it completes successfully:
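For comparison, the same kind of upload can be sketched directly against the WebHDFS REST API, which creates a file in two steps: a PUT to the NameNode that returns a redirect, followed by a PUT of the data to the redirected DataNode address. The sketch below is an illustration under assumptions, not the task's actual implementation: it assumes plain HTTP and simple authentication, and the function names create_request_path and upload are hypothetical.

```python
import http.client
from urllib.parse import quote, urlencode, urlsplit


def create_request_path(hdfs_path, user, overwrite=True):
    """Return the request path and query for a WebHDFS CREATE call."""
    query = urlencode({
        "op": "CREATE",
        "user.name": user,
        "overwrite": str(overwrite).lower(),
    })
    return f"/webhdfs/v1{quote(hdfs_path)}?{query}"


def upload(host, port, user, local_path, hdfs_path):
    """Upload a local file to HDFS through the two-step WebHDFS CREATE flow."""
    # Step 1: ask the NameNode where to write; no file data is sent yet.
    conn = http.client.HTTPConnection(host, port, timeout=30)
    conn.request("PUT", create_request_path(hdfs_path, user))
    resp = conn.getresponse()
    resp.read()
    location = resp.getheader("Location")
    conn.close()
    if resp.status != 307 or not location:
        raise RuntimeError(f"unexpected NameNode response: {resp.status}")

    # Step 2: send the file content to the DataNode URL from the redirect.
    target = urlsplit(location)
    with open(local_path, "rb") as f:
        data = f.read()
    conn = http.client.HTTPConnection(target.hostname, target.port, timeout=30)
    conn.request("PUT", f"{target.path}?{target.query}", body=data)
    resp = conn.getresponse()
    resp.read()
    conn.close()
    if resp.status != 201:
        raise RuntimeError(f"upload failed with status {resp.status}")
```

A call such as upload("localhost", 9870, "hadoop", r"F:\DataAnalytics\Sales.csv", "/Sales.csv") would mirror the settings used in this example; the host, port and user name are placeholders.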
Use the following command to verify whether the file is uploaded successfully:
hdfs dfs -ls /
You can also print out the content via the following command:
hdfs dfs -cat /Sales.csv
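The same verification is possible without the hdfs client, because WebHDFS returns directory listings as JSON from a request of the form GET http://<host>:<port>/webhdfs/v1/?op=LISTSTATUS. The sketch below only parses such a response body; fetching it is left out, and the function names parse_liststatus and contains_file are hypothetical.

```python
import json


def parse_liststatus(body):
    """Return (name, type, length) tuples from a WebHDFS LISTSTATUS JSON body."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"], s["length"]) for s in statuses]


def contains_file(body, name):
    """Return True if the listing contains a regular file called `name`."""
    return any(n == name and t == "FILE" for n, t, _ in parse_liststatus(body))
```

Passing the body of a LISTSTATUS response for / to contains_file(body, "Sales.csv") indicates whether the upload landed where expected.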
WebHDFS REST API reference
It is very easy to upload files into HDFS through SSIS. You can also upload a whole directory into HDFS with this task by changing the file connection manager to point to a folder.
If you have any questions, please let me know.