Ingest Data into Hadoop HDFS through Jupyter Notebook


The Jupyter Notebook service can be started on most operating systems. On a system where the Hadoop client tools are installed, you can also ingest data into HDFS (Hadoop Distributed File System) directly from a notebook by calling the HDFS CLI.

*The Python 3 kernel is used in the following examples.

List files in HDFS

The following command lists the files in the HDFS root folder. The leading ! tells the notebook to run the rest of the line as a shell command.

!hadoop fs -ls /
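
The ! shortcut only works inside a notebook, so the same listing can also be done with Python's subprocess module. The sketch below is one possible approach and not part of the original example: the helper names ls_cmd, parse_ls_output and list_hdfs are hypothetical, and the parser assumes the usual eight-column layout of hadoop fs -ls output, with the path in the last column.

```python
import subprocess


def ls_cmd(hdfs_dir):
    """Build the argument list for `hadoop fs -ls` (hypothetical helper)."""
    return ["hadoop", "fs", "-ls", hdfs_dir]


def parse_ls_output(text):
    """Extract paths from `hadoop fs -ls` output.

    Assumes each entry has eight whitespace-separated columns:
    permissions, replicas, owner, group, size, date, time, path.
    Lines with fewer columns (such as the "Found N items" header) are skipped.
    """
    paths = []
    for line in text.splitlines():
        # maxsplit=7 keeps a path that contains spaces in one piece
        fields = line.split(None, 7)
        if len(fields) == 8:
            paths.append(fields[7])
    return paths


def list_hdfs(hdfs_dir):
    """Run the listing and return the paths; raises if the command fails."""
    result = subprocess.run(
        ls_cmd(hdfs_dir), check=True, capture_output=True, text=True
    )
    return parse_ls_output(result.stdout)
```

Calling list_hdfs("/") on a machine with the Hadoop client on the PATH should return the paths as a Python list that can be filtered or looped over.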


Ingest file into HDFS

The HDFS copyFromLocal command copies a file from the local file system to HDFS. You can also use Python variables in the command by prefixing the variable name with $.

local_file_path = "/home/tangr/jupyter-notebooks/csharp-example.ipynb"
!hadoop fs -copyFromLocal $local_file_path /

Output

After the command runs, the local file csharp-example.ipynb is available in the HDFS root folder as /csharp-example.ipynb.
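
If the ingestion needs to run from a plain Python script rather than a notebook cell, the copy can be wrapped in a small function. This is a minimal sketch under a few assumptions: the names copy_from_local_cmd and ingest are hypothetical, the hadoop client is on the PATH, and the optional -f flag (which overwrites an existing destination) is used only when requested.

```python
import subprocess


def copy_from_local_cmd(local_path, hdfs_dir, overwrite=False):
    """Build the argument list for `hadoop fs -copyFromLocal` (hypothetical helper)."""
    cmd = ["hadoop", "fs", "-copyFromLocal"]
    if overwrite:
        cmd.append("-f")  # -f replaces the destination if it already exists
    cmd += [local_path, hdfs_dir]
    return cmd


def ingest(local_path, hdfs_dir, overwrite=False):
    """Run the copy; raises CalledProcessError on a non-zero exit code."""
    cmd = copy_from_local_cmd(local_path, hdfs_dir, overwrite)
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```

For the file above, the call would be ingest("/home/tangr/jupyter-notebooks/csharp-example.ipynb", "/"). Passing the arguments as a list avoids shell quoting problems when a path contains spaces.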

Other HDFS commands

You can also run other HDFS commands in a Jupyter notebook. For example, you can download a file from HDFS to local storage and then read or parse it with native Python functions.
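
The download-then-parse idea could look roughly like the sketch below. It assumes the hadoop client is on the PATH and that the downloaded file is JSON (a .ipynb notebook is a JSON document); the function names copy_to_local_cmd, load_json and download_and_parse are hypothetical and chosen only for illustration.

```python
import json
import subprocess
from pathlib import Path


def copy_to_local_cmd(hdfs_path, local_path):
    """Build the argument list for `hadoop fs -copyToLocal` (hypothetical helper)."""
    return ["hadoop", "fs", "-copyToLocal", hdfs_path, local_path]


def load_json(local_path):
    """Parse a local JSON file with the standard library."""
    return json.loads(Path(local_path).read_text(encoding="utf-8"))


def download_and_parse(hdfs_path, local_path):
    """Copy a file out of HDFS, then parse the local copy as JSON."""
    # check=True raises CalledProcessError if the copy fails,
    # for example when the local destination already exists
    subprocess.run(copy_to_local_cmd(hdfs_path, local_path), check=True)
    return load_json(local_path)
```

For instance, download_and_parse("/csharp-example.ipynb", "/tmp/csharp-example.ipynb") would return the notebook as a Python dictionary. For other formats, swap load_json for the matching reader, such as the csv module or a plain read_text call.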
