shell

9 items tagged with "shell"


HDFS - List Folder Recursively

This code snippet provides an example of listing all the folders and files recursively under an HDFS path. The key is to use the -R option of the ls subcommand.
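A minimal sketch of the recursive listing described above (the path /user/hive is a hypothetical example):

```shell
# List all files and folders recursively under an HDFS path.
# /user/hive is a hypothetical example path.
hdfs dfs -ls -R /user/hive
```

The older `hadoop fs -ls -R` form accepts the same option and produces the same listing.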

2022-08-24
Code Snippets & Tips

Start Spark History Server UI

This code snippet provides the simple CLI to start the Spark History Server service.

About Spark History Server

Spark History Server can be used to look up historical Spark jobs that completed successfully or failed. By default, Spark execution logs are saved into local temporary folders. You can add configuration items into spark-defaults.conf to save logs to HDFS. For example, the following configurations ensure the logs are stored into my local Hadoop environment:

```
spark.eventLog.enabled true
spark.eventLog.dir hdfs://localhost:9000/shared/spark-logs
spark.history.fs.logDirectory hdfs://localhost:9000/shared/spark-logs
```

In the code snippet, `SPARK_HOME` is the environment variable that points to the location where Spark is installed. If this variable is not defined, you can directly specify the full path to the shell script (sbin/start-history-server.sh).

History Server URL

By default, the URL is http://localhost:18080/ in a local environment. You can replace localhost with the address of the server where the history server is started; usually it is located on an edge server. By clicking the link of each app, you will be able to find the job details for each Spark application.
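A sketch of the start command the excerpt describes, assuming the `SPARK_HOME` environment variable is set:

```shell
# Start the Spark History Server using the script bundled with Spark.
# Assumes SPARK_HOME points to the Spark installation directory.
$SPARK_HOME/sbin/start-history-server.sh
```

If `SPARK_HOME` is not defined, replace it with the full path to your Spark installation, e.g. /opt/spark/sbin/start-history-server.sh (a hypothetical location).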

2022-08-21
Code Snippets & Tips

Start Hive Beeline CLI

This code snippet provides an example of starting the Hive Beeline CLI in Linux. Beeline is the successor of the Hive CLI. In the shell scripts, the environment variable $HIVE_HOME is the home folder of the Hive installation on the system. In a cluster environment, it usually refers to the Hive client installation on an edge server. Output:

```
$HIVE_HOME/bin/beeline -u jdbc:hive2://
Connecting to jdbc:hive2://
Hive Session ID = 65a40cd9-02ce-4965-93b6-cff9db461b70
Connected to: Apache Hive (version 3.1.3)
Driver: Hive JDBC (version 3.1.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.3 by Apache Hive
0: jdbc:hive2://>
```

2022-08-20
Code Snippets & Tips

Export CSV File from Azure SQL Databases

2021-08-12
Microsoft Azure

gzip Compress a directory (folder)

2021-08-12
Code Snippets & Tips

Kafka Windows CLI Commands

2020-09-07
Streaming Analytics & Kafka

How to Kill Running Jobs in Hadoop

The following code snippet shows how to list and kill Hadoop jobs (including MapReduce and YARN jobs).
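The list-then-kill workflow summarized above can be sketched as follows; the application and job IDs are hypothetical placeholders:

```shell
# List running YARN applications, then kill one by its application ID.
# application_1234567890123_0001 is a hypothetical example ID.
yarn application -list
yarn application -kill application_1234567890123_0001

# For classic MapReduce jobs, the mapred CLI works similarly:
mapred job -list
mapred job -kill job_1234567890123_0001
```

Copy the ID from the output of the `-list` command into the `-kill` command.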

2019-11-18
Code Snippets & Tips

List Hadoop running jobs

Hadoop provides a number of CLIs. The `hadoop job` command can be used to retrieve the list of running jobs. You can also use the YARN Resource Manager UI to view the jobs.
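A minimal sketch of the listing command mentioned above:

```shell
# List running MapReduce jobs. In newer Hadoop releases, `hadoop job`
# is deprecated in favor of `mapred job`, which takes the same options.
hadoop job -list
```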

2019-11-18
Code Snippets & Tips

Check HDFS folder size in Shell / Hadoop

Hadoop provides a number of CLIs that can be used to perform many tasks. This code snippet shows you how to check file/folder size in HDFS.
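A sketch of the size check described above, assuming a hypothetical path /user/hive/warehouse:

```shell
# Show the total size of an HDFS folder in human-readable form.
# -s summarizes the whole tree; -h prints sizes like 1.2 G instead of bytes.
# /user/hive/warehouse is a hypothetical example path.
hdfs dfs -du -s -h /user/hive/warehouse
```

Dropping `-s` lists the size of each immediate child instead of a single total.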

2019-11-18
Code Snippets & Tips