SQL Server Integration Services (SSIS) provides tasks to perform operations against Hadoop, for example:
In the Data Flow Task, you can also use:
In this page, I'm going to demonstrate how to write a file into HDFS through the SSIS Hadoop File System Task.
Refer to the following page to install Hadoop if you don't have an instance to play with.
SSIS can be installed via SQL Server Data Tools (SSDT). In this example, I am using version 15.1.
In your SSIS package, create a Hadoop Connection Manager:
On the WebHDFS tab of the editor, specify the following details:
Click the Test Connection button to ensure you can connect, and then click OK:
Create a local CSV file named F:\DataAnalytics\Sales.csv with the following content:
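The original sample content isn't critical; any small CSV will do. A minimal sketch is shown below (the column names and rows are hypothetical, not from the original post, and the path here is a local working directory rather than F:\DataAnalytics):

```shell
# Create a small sample Sales.csv (content is hypothetical; any CSV works).
cat > Sales.csv <<'EOF'
OrderID,Product,Quantity,Amount
1,Apple,10,25.50
2,Orange,5,12.00
3,Banana,8,9.60
EOF

# Show what was written.
cat Sales.csv
```
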
Create a file connection manager named Sales.csv that points to the file created above.
Use the two connection managers created above to create a Hadoop File System Task:
With the above settings, the task uploads Sales.csv to /Sales.csv in HDFS.
Run the package or execute the task to make sure it completes successfully:
Use the following command to verify whether the file is uploaded successfully:
hdfs dfs -ls /
You can also print out the file content via the following command:
hdfs dfs -cat /Sales.csv
WebHDFS REST API reference
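Under the hood, the WebHDFS option of the Hadoop Connection Manager talks to the WebHDFS REST API. A sketch of the same upload with curl is shown below; the namenode host, port, and user name are assumptions (50070 is the classic default WebHDFS port on Hadoop 2.x clusters):

```shell
# Assumed namenode address and user; adjust for your cluster.
NAMENODE="namenode-host"
PORT=50070
TARGET="/Sales.csv"

# WebHDFS CREATE is a two-step PUT: the namenode replies with a
# 307 redirect to a datanode, and the file body is sent to that Location.
CREATE_URL="http://${NAMENODE}:${PORT}/webhdfs/v1${TARGET}?op=CREATE&overwrite=true&user.name=hadoop"
echo "$CREATE_URL"

# On a live cluster you would run (commented out here, as it needs a cluster):
#   curl -i -X PUT "$CREATE_URL"                          # step 1: get the redirect
#   curl -i -X PUT -T Sales.csv "<Location from step 1>"  # step 2: send the data
```

The SSIS task and the REST calls end up doing the same thing; a Hadoop client shell achieves it with `hdfs dfs -put Sales.csv /Sales.csv` as well.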
It is very easy to upload files into HDFS through SSIS. You can also upload a whole directory into HDFS with this task if you change the file connection manager to point to a folder.
If you have any questions, please let me know.