SQL Server Integration Services (SSIS) has tasks to perform operations against Hadoop, for example: the Hadoop File System Task, the Hadoop Hive Task, and the Hadoop Pig Task.
In the Data Flow Task, you can also use the HDFS File Source and HDFS File Destination components.
On this page, I'm going to demonstrate how to write a file into HDFS through the SSIS Hadoop File System Task.
Refer to the following page to install Hadoop if you don't have an instance to play with.
SSIS can be installed via SQL Server Data Tools (SSDT). In this example, I am using SSDT 15.1.
In your SSIS package, create a Hadoop Connection Manager:
In the WebHDFS tab of the editor, specify the following details: the WebHDFS host and port (50070 is the default in Hadoop 2.x; 9870 in 3.x), the authentication type, and the WebHDFS user.
Click Test Connection button to ensure you can connect and then click OK:
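If the connection test fails, it can help to verify the WebHDFS endpoint outside SSIS first. The curl call below is a minimal sketch: the host name namenode-host, the Hadoop 3.x default port 9870, and user.name=hadoop are assumptions to adjust for your cluster.

# List the HDFS root directory via the WebHDFS REST API;
# a JSON FileStatuses response confirms the endpoint is reachable.
curl -i "http://namenode-host:9870/webhdfs/v1/?op=LISTSTATUS&user.name=hadoop"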
Create a local CSV file named F:\DataAnalytics\Sales.csv with the following content:
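The exact sample rows are not reproduced here; for illustration, assume a small file with a header row and a few records, for example:

SalesID,Product,Amount
1,Widget,100.00
2,Gadget,250.50
3,Gizmo,75.25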
Create a file connection manager named Sales.csv that points to the file created above.
Use the two connection managers created above to create a Hadoop File System Task: set the Hadoop Connection to the Hadoop connection manager, the Operation to CopyToHDFS, the Hadoop File Path to /Sales.csv, and the Local File Connection to the Sales.csv connection manager.
With these settings, the task uploads Sales.csv to /Sales.csv in HDFS.
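For context, the task performs the upload through the WebHDFS REST API (see the reference near the end of this page). A roughly equivalent manual upload is the standard two-step WebHDFS CREATE flow sketched below; the host names, ports, and user.name=hadoop are assumptions for illustration.

# Step 1: ask the NameNode where to write; the response is an HTTP 307
# redirect whose Location header points at a DataNode.
curl -i -X PUT "http://namenode-host:9870/webhdfs/v1/Sales.csv?op=CREATE&user.name=hadoop&overwrite=true"
# Step 2: send the local file to the Location URL returned by step 1
# (the DataNode URL below is only a placeholder for that header value).
curl -i -X PUT -T Sales.csv "http://datanode-host:9864/webhdfs/v1/Sales.csv?op=CREATE&user.name=hadoop&overwrite=true"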
Run the package or execute the task to make sure it completes successfully:
Use the following command to verify that the file was uploaded successfully:
hdfs dfs -ls /
You can also print out the content via the following command:
hdfs dfs -cat /Sales.csv
WebHDFS REST API reference
It is very easy to upload files into HDFS through SSIS. You can also upload a whole directory into HDFS through this task if you change the file connection manager to point to a folder, as sketched below.
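For example, assuming the task is configured to upload the folder to an HDFS directory named /Sales (an assumed destination path for illustration), you can verify the result recursively:

# Recursively list everything under the uploaded directory.
hdfs dfs -ls -R /Sales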
If you have any questions, please let me know.