Invoke Hadoop WebHDFS APIs in .NET Core

2018-02-26 hadoop hdfs lite-log

Background

Apache does not provide an official native .NET API for Hadoop HDFS. However, the WebHDFS HTTP REST API supports the complete FileSystem/FileContext interface for HDFS.

Thus, we can use these web APIs to perform HDFS operations from other programming languages such as C#.

WebHDFS APIs reference

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
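
Per the reference above, a WebHDFS request URL has the form http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=<OPERATION>. As a minimal sketch, a small helper can assemble such URLs so that paths are joined and escaped consistently. The names WebHdfsUrl and Build are hypothetical and chosen only for illustration; they are not part of any library.

```csharp
using System;

// Hypothetical helper (not part of any library): composes WebHDFS REST URLs.
static class WebHdfsUrl
{
    public static string Build(string host, int port, string hdfsPath,
                               string operation, string scheme = "http")
    {
        // Drop leading slashes; the "/webhdfs/v1/" prefix already ends with one.
        var trimmed = hdfsPath.TrimStart('/');

        // Escape each path segment separately so the "/" separators survive.
        var segments = trimmed.Split('/', StringSplitOptions.RemoveEmptyEntries);
        var escaped = string.Join("/",
            Array.ConvertAll(segments, s => Uri.EscapeDataString(s)));

        return $"{scheme}://{host}:{port}/webhdfs/v1/{escaped}?op={operation}";
    }
}
```

For example, WebHdfsUrl.Build("127.0.0.1", 9870, "/", "LISTSTATUS") yields http://127.0.0.1:9870/webhdfs/v1/?op=LISTSTATUS.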

Examples

List files

The following code snippet retrieves the list of files in the root directory of my local Hadoop node:

    using System;
    using System.IO;
    using System.Net;

    class Program
    {
        static void Main(string[] args)
        {
            WebHdfsListStatusApi();
            Console.ReadLine();
        }

        static void WebHdfsListStatusApi()
        {
            var protocol = "http";
            var host = "127.0.0.1";
            var port = 9870;
            // Empty path = HDFS root; the URL template below already supplies the leading "/".
            var hdfsFilePath = "";
            var operation = "LISTSTATUS";
            var url = $"{protocol}://{host}:{port}/webhdfs/v1/{hdfsFilePath}?op={operation}";

            var request = (HttpWebRequest)WebRequest.Create(url);
            // Dispose the response and the reader once the body has been read.
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                var result = reader.ReadToEnd();
                Console.WriteLine(result);
            }
        }
    }

The output looks like the following:

/project/visual_csharp/resources/2F9A8828-F896-5F46-8CBD-017B2BB6991D.webp

The same request in Postman returns the following output:

/project/visual_csharp/resources/4C158E2A-F9B2-51C1-822E-EB98F7C901FE.webp
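
The LISTSTATUS response is a JSON document whose FileStatuses.FileStatus array holds one object per entry, with fields such as pathSuffix and type, as described in the WebHDFS reference. The following is a minimal sketch of reading those two fields. It assumes .NET Core 3.0 or later, where System.Text.Json is available; ListStatusParser is a hypothetical name used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: extracts entry names and types from a LISTSTATUS response.
static class ListStatusParser
{
    public static List<(string Name, string Type)> Parse(string json)
    {
        var entries = new List<(string Name, string Type)>();
        using (var doc = JsonDocument.Parse(json))
        {
            // Navigate to the FileStatuses.FileStatus array.
            var statuses = doc.RootElement
                .GetProperty("FileStatuses")
                .GetProperty("FileStatus");

            foreach (var item in statuses.EnumerateArray())
            {
                // "type" is FILE, DIRECTORY or SYMLINK.
                entries.Add((item.GetProperty("pathSuffix").GetString(),
                             item.GetProperty("type").GetString()));
            }
        }
        return entries;
    }
}
```

The string returned by reader.ReadToEnd() in the snippet above can be passed directly to ListStatusParser.Parse, and each resulting tuple can then be printed or filtered by type.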

Get file content

Similarly, you can get the content of a file through the OPEN operation:

    using System;
    using System.IO;
    using System.Net;

    class Program
    {
        static void Main(string[] args)
        {
            WebHdfsGetFileContent();
            Console.ReadLine();
        }

        static void WebHdfsGetFileContent()
        {
            var protocol = "http";
            var host = "127.0.0.1";
            var port = 9870;
            // Path relative to the HDFS root; the URL template below already supplies the leading "/".
            var hdfsFilePath = "Sales.csv";
            var operation = "OPEN";
            var url = $"{protocol}://{host}:{port}/webhdfs/v1/{hdfsFilePath}?op={operation}";

            var request = (HttpWebRequest)WebRequest.Create(url);
            // Dispose the response and the reader once the body has been read.
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                var result = reader.ReadToEnd();
                Console.WriteLine(result);
            }
        }
    }

The following screenshot shows sample output:

/project/visual_csharp/resources/D6CDC3B7-8250-58C3-953B-42DD98BECF36.webp
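
The examples above use HttpWebRequest. As an alternative sketch, the same GET requests can be issued asynchronously with HttpClient. WebHdfsClient and its GetAsync method are hypothetical names for illustration only. The sketch relies on HttpClient following redirects by default, because the OPEN operation normally redirects the caller to a datanode.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical wrapper: issues a GET against a WebHDFS URL and returns the body as text.
static class WebHdfsClient
{
    public static async Task<string> GetAsync(HttpClient client, string url)
    {
        using (var response = await client.GetAsync(url))
        {
            // Throws HttpRequestException when the status code is not 2xx.
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

A call such as await WebHdfsClient.GetAsync(new HttpClient(), "http://127.0.0.1:9870/webhdfs/v1/Sales.csv?op=OPEN") would return the file content as a string, assuming the same local node and file as in the example above. In a longer-running application, a single HttpClient instance is usually shared rather than created per request.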