
Install Big Data Tools (Spark, Zeppelin, Hadoop) in Windows for Learning and Practice

Raymond Tang


Are you a Windows/.NET developer who wants to learn big data concepts and tools on your Windows machine?

If so, you can follow the guides below to install these tools on your PC. They are usually easier to install on Linux/UNIX, but because they are Java-based, installing them on Windows is not difficult either.

Installation guides

All of the following guides are based on Windows 10. The steps should be the same on other Windows versions, though some screenshots may differ.

Install Zeppelin 0.7.3 in Windows

Install Hadoop 3.0.0 in Windows (Single Node)

Install Spark 2.2.1 in Windows

Install Apache Sqoop in Windows

Configure Hadoop 3.1.0 in a Multi Node Cluster

Learning tutorials - latest update (2018-05-06)

Use Hadoop File System Task in SSIS to Write File into HDFS

Invoke Hadoop WebHDFS APIs in .NET Core

Write and Read Parquet Files in Spark/Scala

Write and Read Parquet Files in HDFS through Spark/Scala

Convert String to Date in Spark (Scala)

Read Text File from Hadoop in Zeppelin through Spark Context

Connecting Apache Zeppelin to your SQL Server

Load Data into HDFS from SQL Server via Sqoop

Default Ports Used by Hadoop Services (HDFS, MapReduce, YARN)

I will keep updating this blog with new tutorials. Feel free to subscribe to this blog (RSS).

