Ingest Data into Hadoop HDFS through Jupyter Notebook


A Jupyter notebook service can be started on most operating systems. On a system where the Hadoop client is available, you can also easily ingest data into HDFS (Hadoop Distributed File System) from a notebook by running HDFS CLI commands.

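*The Python 3 kernel is used in the following examples. In this kernel, IPython passes any line that starts with ! to the system shell, which is how the hadoop commands below are executed.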
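The HDFS commands in this article only work if the notebook's shell can find the hadoop executable. Here is a minimal sketch for checking that from Python, assuming the client is launched as hadoop and is on the PATH of the Jupyter server process; the function name hadoop_client_available is hypothetical.

```python
import shutil
import subprocess


def hadoop_client_available(executable: str = "hadoop") -> bool:
    """Return True if the given executable can be found on PATH.

    Assumes the Hadoop client is launched as `hadoop` (the default used here);
    pass another name if your installation uses a different launcher.
    """
    return shutil.which(executable) is not None


if hadoop_client_available():
    # `hadoop version` prints which Hadoop release the client belongs to.
    print(subprocess.run(["hadoop", "version"], capture_output=True, text=True).stdout)
else:
    print("hadoop was not found on PATH; HDFS commands will fail in this kernel.")
```

If the check fails, add the Hadoop bin directory to PATH before starting Jupyter rather than inside a running notebook, so every kernel inherits it.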

List files in HDFS

The following command lists the files in the HDFS root directory.

!hadoop fs -ls /
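The ! command above only displays the listing. To work with it in Python, you can call the CLI through subprocess and parse its text output. This is a minimal sketch, assuming the usual hadoop fs -ls column layout (permissions, replication, owner, group, size, date, time, path); HdfsEntry, parse_ls_output and list_hdfs are hypothetical names, not part of any Hadoop library.

```python
import subprocess
from typing import List, NamedTuple


class HdfsEntry(NamedTuple):
    permissions: str
    replication: str  # "-" for directories
    owner: str
    group: str
    size: int
    modified: str     # "YYYY-MM-DD HH:MM" as printed by the CLI
    path: str

    @property
    def is_dir(self) -> bool:
        return self.permissions.startswith("d")


def parse_ls_output(text: str) -> List[HdfsEntry]:
    """Parse the text printed by `hadoop fs -ls` into HdfsEntry records."""
    entries = []
    for line in text.splitlines():
        # Skip blank lines and the "Found N items" header.
        if not line.strip() or line.startswith("Found "):
            continue
        # maxsplit=7 keeps paths that contain spaces in one field.
        perms, repl, owner, group, size, date, time, path = line.split(maxsplit=7)
        entries.append(HdfsEntry(perms, repl, owner, group, int(size), f"{date} {time}", path))
    return entries


def list_hdfs(path: str = "/") -> List[HdfsEntry]:
    """Run `hadoop fs -ls <path>` and return the parsed listing."""
    result = subprocess.run(["hadoop", "fs", "-ls", path],
                            capture_output=True, text=True, check=True)
    return parse_ls_output(result.stdout)
```

For example, [e.path for e in list_hdfs("/") if not e.is_dir] collects the files (but not the directories) in the root folder. Because check=True is set, a failing command raises subprocess.CalledProcessError instead of silently printing an error as the ! form does.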

Ingest file into HDFS

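The HDFS -copyFromLocal option copies a file from the local file system to HDFS. You can also use Python variables in the commands: IPython replaces $local_file_path with the value of that variable before running the command.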

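# Path of the local file to upload, stored in a Python variable
local_file_path = "/home/tangr/jupyter-notebooks/csharp-example.ipynb"
# IPython substitutes $local_file_path before passing the line to the shell
!hadoop fs -copyFromLocal $local_file_path /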

After the command runs, the local file csharp-example.ipynb is ingested into the HDFS root folder as /csharp-example.ipynb. Running !hadoop fs -ls / again shows it in the listing.
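Because $local_file_path is substituted as plain text before the shell sees the line, a path containing spaces would be split into two arguments. Below is a minimal sketch, assuming plain subprocess calls instead of the ! magic, that uploads a file and then confirms it arrived using hadoop fs -test -e, which exits with status 0 when the path exists. The helper names copy_from_local_cmd, hdfs_path_exists and upload are hypothetical.

```python
import os
import subprocess
from typing import List


def copy_from_local_cmd(local_path: str, hdfs_dir: str = "/", overwrite: bool = False) -> List[str]:
    """Build the argument list for `hadoop fs -copyFromLocal`.

    Passing a list (not a shell string) means paths with spaces need no quoting.
    """
    cmd = ["hadoop", "fs", "-copyFromLocal"]
    if overwrite:
        cmd.append("-f")  # overwrite the destination if it already exists
    return cmd + [local_path, hdfs_dir]


def hdfs_path_exists(hdfs_path: str) -> bool:
    """`hadoop fs -test -e` exits with status 0 when the path exists."""
    return subprocess.run(["hadoop", "fs", "-test", "-e", hdfs_path]).returncode == 0


def upload(local_path: str, hdfs_dir: str = "/", overwrite: bool = False) -> str:
    """Copy a local file into hdfs_dir, verify it, and return the HDFS path."""
    if not os.path.isfile(local_path):
        raise FileNotFoundError(local_path)
    subprocess.run(copy_from_local_cmd(local_path, hdfs_dir, overwrite), check=True)
    target = hdfs_dir.rstrip("/") + "/" + os.path.basename(local_path)
    if not hdfs_path_exists(target):
        raise RuntimeError(f"{target} not found in HDFS after upload")
    return target
```

With the local_file_path variable from the example above, upload(local_file_path) targets /csharp-example.ipynb. Without overwrite=True, a second upload of the same file fails because -copyFromLocal refuses to replace an existing destination by default.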

Other HDFS commands

Other HDFS commands work in a Jupyter notebook in the same way. For example, you can download an HDFS file to local storage and then parse or read it with native Python functions.
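Here is a minimal sketch of that download-then-read pattern. It copies an HDFS file into a temporary local directory with hadoop fs -copyToLocal and reads it with Python's built-in csv module, assuming the HDFS file is UTF-8 encoded CSV. The helper names and the path /data/sample.csv in the usage line are hypothetical.

```python
import csv
import os
import subprocess
import tempfile
from typing import List


def copy_to_local_cmd(hdfs_path: str, local_path: str) -> List[str]:
    """Build the argument list for `hadoop fs -copyToLocal <src> <localdst>`."""
    return ["hadoop", "fs", "-copyToLocal", hdfs_path, local_path]


def read_csv_rows(local_path: str) -> List[List[str]]:
    """Read a local CSV file with the standard-library csv module."""
    with open(local_path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


def fetch_hdfs_csv(hdfs_path: str) -> List[List[str]]:
    """Download an HDFS file into a temporary directory and parse it as CSV."""
    with tempfile.TemporaryDirectory() as tmp:
        local_path = os.path.join(tmp, os.path.basename(hdfs_path))
        subprocess.run(copy_to_local_cmd(hdfs_path, local_path), check=True)
        # Rows are fully read before the temporary directory is deleted.
        return read_csv_rows(local_path)
```

Usage: rows = fetch_hdfs_csv("/data/sample.csv"). The temporary directory keeps downloaded copies from piling up next to the notebook; if you want to keep the file, call copy_to_local_cmd with a permanent destination instead.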
