Blog posts about Hadoop

hadoop hive

Apache Hive 3.0.0 Installation on Windows 10 Step by Step Guide

7,316   9   about 7 months ago

If you have been following my website, you would know I’ve published a number of articles about installing big data tools/framewo...

hadoop yarn hdfs

Install Hadoop 3.0.0 in Windows (Single Node)

22,747   30   about 2 years ago

This page summarizes the steps to install Hadoop 3.0.0 in your Windows environment. Reference page: https://wiki.apache.org/hadoop/Hadoop2OnWindows ...

hadoop linux wsl

Install Hadoop 3.2.0 on Windows 10 using Windows Subsystem for Linux (WSL)

5,659   16   about 4 months ago

In my previous post, I showed how to configure a single-node Hadoop instance on Windows 10. The steps are not too difficult to follow if you have Java programming backgr...

spark hadoop pyspark oozie hue

Run Multiple Python Scripts PySpark Application with yarn-cluster Mode

288   0   about 2 months ago

When submitting Spark applications to a YARN cluster, two deploy modes can be used: client and cluster. In client mode (the default), the Spark driver runs on the machine from which the application was submitted, while in cluster mode the driver runs on a node in the cluster chosen by YARN. On this page, I am goin...

hadoop hive wsl

Apache Hive 3.1.1 Installation on Windows 10 using Windows Subsystem for Linux

1,253   2   about 6 months ago

Previously, I demonstrated how to configure Apache Hive 3.0.0 on Windows 10. Apache Hive 3.0.0 Installation on Windows 10 Step by Step Guide...

spark hadoop yarn oozie

Diagnostics: Container is running beyond physical memory limits

273   0   about 4 months ago

Scenario: Recently I created an Oozie workflow that contains one Spark action. The Spark action's master is yarn and its deploy mode is cluster. Each time the job has run for about 30 minutes, the application fails with errors like the following: Application applicatio...

teradata spark pyspark

Load Data from Teradata in Spark (PySpark)

849   0   about 4 months ago

In my article Connect to Teradata database through Python, I demonstrated how to use the Teradata Python package or the Teradata ODBC driver to connect to Teradata. In this article, I’m going to...

python spark hadoop pyspark

Read Hadoop Credential in PySpark

267   0   about 4 months ago

In one of my previous articles, Password Security Solution for Sqoop, I mentioned creating credentials using the hadoop credential command. The credentials are stored in JavaKey...

zeppelin spark hadoop linux sqoop hive wsl

Big Data Tools on Windows via Windows Subsystem for Linux (WSL)

551   0   about 6 months ago

This page summarizes the installation guides for big data tools on Windows through Windows Subsystem for Linux (WSL). ...

lite-log hive

HiveServer2 Cannot Connect to Hive Metastore Resolutions/Workarounds

435   0   about 6 months ago

Since Hive 3.x, a new authentication feature for HiveServer2 clients has been added. When starting the HiveServer2 service (Hive version 3.0.0), you may encounter errors like: ‘HiveServer2 metastore.RetryingMetaStoreClient: RetryingMetaStoreClient trying reconnect as [username]  (auth:S...
