Analytics & BI

Data Analytics, Big Data, Data Storage, and Business Intelligence.


hadoop hive

Apache Hive 3.0.0 Installation on Windows 10 Step by Step Guide

7,195   9   about 7 months ago

If you have been following my website, you would know I’ve published a number of articles about installing big data tools/framewo...

hadoop yarn hdfs

Install Hadoop 3.0.0 in Windows (Single Node)

22,647   30   about 2 years ago

This page summarizes the steps to install Hadoop 3.0.0 in your Windows environment. Reference page: https://wiki.apache.org/hadoop/Hadoop2OnWindows ...

sql server zeppelin

Connecting Apache Zeppelin to your SQL Server

1,573   8   about 2 years ago

This page demonstrates the steps you need to connect to SQL Server in Zeppelin. There are many ways to implement this, for example the SQL Server interpreters on GitHub. On this page, I am going to use the JDBC driver to connect to SQL Server instead of using third-party interpreters. For authe...

power-bi bigquery

Use Google Cloud BigQuery as Data Source in Power BI

2,961   3   about 2 years ago

BigQuery is Google’s serverless data warehouse in Google Cloud. Power BI can consume data from various sources including RDBMS, NoSQL, Cloud, Services, etc. It is also easy to get data from BigQuery in Power BI. In this article, I am going to demonstrate how to connect to BigQuery to create...

hadoop linux wsl

Install Hadoop 3.2.0 on Windows 10 using Windows Subsystem for Linux (WSL)

5,469   16   about 4 months ago

In my previous post, I showed how to configure a single node Hadoop instance on Windows 10. The steps are not too difficult to follow if you have Java programming backgr...

.net dotnet core spark parquet hive

.NET for Apache Spark Preview with Examples

776   2   about 6 months ago

I’ve been following the Mobius project for a while and have been waiting for this day. .NET for Apache Spark v0.1.0 was just published on 2019-04-25 on GitHub. It provides high performance APIs for programming Apache Spark applications with C# and F#. It is .NET Standard compliant and can run in Wind...

spark hadoop pyspark oozie hue

Run Multiple Python Scripts PySpark Application with yarn-cluster Mode

264   0   about 2 months ago

When submitting Spark applications to a YARN cluster, two deploy modes can be used: client and cluster. In client mode (the default), the Spark driver runs on the machine from which the application was submitted, while in cluster mode the driver runs on a node inside the cluster. On this page, I am goin...
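The two deploy modes map onto `spark-submit` options. The following is a minimal sketch, not the article's own script: it only assembles the argument list for a PySpark application with several Python files, using the real `--master`, `--deploy-mode`, and `--py-files` options. The helper name `build_spark_submit` and the file names are hypothetical.

```python
from typing import List, Sequence


def build_spark_submit(main_script: str,
                       extra_py_files: Sequence[str] = (),
                       deploy_mode: str = "client") -> List[str]:
    """Return a spark-submit argument list for a PySpark app on YARN."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    cmd = ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode]
    if extra_py_files:
        # --py-files takes a comma-separated list of .py, .zip or .egg files
        # that are shipped alongside the main script.
        cmd += ["--py-files", ",".join(extra_py_files)]
    cmd.append(main_script)
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    args = build_spark_submit("main.py", ["helpers.py", "etl.py"], "cluster")
    print(" ".join(args))
```

The resulting list can be passed to `subprocess.run` on a machine where `spark-submit` is on the PATH.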

python pyspark pandas

Convert PySpark Row List to Pandas Data Frame

189   0   about 2 months ago

In Spark, it’s easy to convert a Spark DataFrame to a Pandas DataFrame with one line of code: df_pd = df.toPandas(). On this page, I am going to show you how to convert a list of PySpark Row objects to a Pandas data frame. Prepare the data frame The fo...
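One possible way to do the Row-list conversion, which is not necessarily the approach the full article takes, is to turn each row into a dictionary and hand the list to Pandas. The sketch below assumes each element exposes `asDict()`, as `pyspark.sql.Row` does. `FakeRow` and `rows_to_pandas` are hypothetical names, and `FakeRow` exists only so the snippet runs without a Spark installation.

```python
import pandas as pd


class FakeRow:
    """Hypothetical stand-in for pyspark.sql.Row, which also provides asDict()."""

    def __init__(self, **fields):
        self._fields = dict(fields)

    def asDict(self):
        return dict(self._fields)


def rows_to_pandas(rows):
    """Build a Pandas DataFrame from objects that expose asDict()."""
    return pd.DataFrame([row.asDict() for row in rows])


# With PySpark available, `rows` could instead come from df.collect().
rows = [FakeRow(id=1, name="a"), FakeRow(id=2, name="b")]
df_pd = rows_to_pandas(rows)
print(df_pd)
```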

hadoop hive wsl

Apache Hive 3.1.1 Installation on Windows 10 using Windows Subsystem for Linux

1,199   2   about 6 months ago

Previously, I demonstrated how to configure Apache Hive 3.0.0 on Windows 10. Apache Hive 3.0.0 Installation on Windows 10 Step by Step Guide...

spark hadoop yarn oozie

Diagnostics: Container is running beyond physical memory limits

263   0   about 4 months ago

Scenario: Recently I created an Oozie workflow that contains one Spark action. The Spark action's master is yarn and its deploy mode is cluster. Each time the job has run for about 30 minutes, the application fails with errors like the following: Application applicatio...
