zeppelin spark hadoop linux sqoop hive wsl

Big Data Tools on Windows via Windows Subsystem for Linux (WSL)

44   0   about 6 days ago

This page summarizes the installation guides for big data tools on Windows through Windows Subsystem for Linux (WSL). ...

linux sqoop wsl

Sqoop Installation on Windows 10 using Windows Subsystem for Linux

5   0   about 6 days ago

This page summarizes the steps required to install Apache Sqoop (v1.4.7) in a Windows 10 environment via Windows Subsystem for Linux (WSL). Prerequisites If you have already installed Hadoop 3.2.0 in WSL, ignore the following steps as you don’t need to install it again. Follow...

spark linux wsl

Apache Spark 2.4.3 Installation on Windows 10 using Windows Subsystem for Linux

27   4   about 6 days ago

This page summarizes the steps to install the latest version (2.4.3) of Apache Spark on Windows 10 via Windows Subsystem for Linux (WSL). Prerequisites Follow either of the following pages to install WSL on a system or non-system drive on your Windows 10. ...

zeppelin spark linux wsl

Install Zeppelin 0.7.3 on Windows 10 using Windows Subsystem for Linux (WSL)

20   0   about 6 days ago

This page summarizes the steps to install Zeppelin version 0.7.3 on Windows 10 via Windows Subsystem for Linux (WSL). Version 0.8.1 When running Zeppelin in Ubuntu, the server may pick up a host address that is not accessible, for example 169.254.148.100, and the remote interprete...

hadoop hive wsl

Apache Hive 3.1.1 Installation on Windows 10 using Windows Subsystem for Linux

35   0   about 6 days ago

Previously, I demonstrated how to configure Apache Hive 3.0.0 on Windows 10. Apache Hive 3.0.0 Installation on Windows 10 Step by Step Guide...

lite-log hive

HiveServer2 Cannot Connect to Hive Metastore Resolutions/Workarounds

76   0   about 6 days ago

Since Hive 3.x, a new authentication feature for the HiveServer2 client has been added. When starting the HiveServer2 service (Hive version 3.0.0), you may encounter errors like: ‘HiveServer2 metastore.RetryingMetaStoreClient: RetryingMetaStoreClient trying reconnect as [username]  (auth:S...

sql server hive

Configure a SQL Server Database as Remote Hive Metastore

118   0   about 7 days ago

In one of my previous posts, I showed how to configure Apache Hive 3.0.0 in Windows 10. Apache Hive 3.0.0 Installation on Windows 10 Step by Step Guide ...

hadoop linux wsl

Install Hadoop 3.2.0 on Windows 10 using Windows Subsystem for Linux (WSL)

202   3   about 7 days ago

In my previous post, I showed how to configure a single node Hadoop instance on Windows 10. The steps are not too difficult to follow if you have Java programming backgr...

lite-log linux wsl ubuntu

Install Windows Subsystem for Linux on a Non-System Drive

28   0   about 9 days ago

This page shows how to manually install Windows Subsystem for Linux (WSL) on a non-system drive. Enable Windows Subsystem for Linux system feature Open PowerShell as Administrator and run the following command to enable the WSL feature: Enable-WindowsOptionalFea...

kontext lite-log

Notification Email Address Change Notice

13   0   about 19 days ago

In the past months, this website has been using the following email address to deliver all notification messages to website users, such as registration confirmation emails, comment emails and so on. no-reply[at]kontext.tech However, I recently found that...

sql server java kerberos ntlm

JDBC Integrated Security, NTLM and Kerberos Authentication for SQL Server

40   0   about 20 days ago

With the Microsoft SQL Server JDBC driver, you can connect to the database through SQL Server Authentication or Kerberos Authentication. This post summarizes the configurations required for each authentication method with coding examples. *The NTLM block in the following diagram represents pure Jav...

.net dotnet core spark parquet hive

.NET for Apache Spark Preview with Examples

108   0   about 28 days ago

I’ve been following the Mobius project for a while and have been waiting for this day. .NET for Apache Spark v0.1.0 was just published on 2019-04-25 on GitHub. It provides high performance APIs for programming Apache Spark applications with C# and F#. It is .NET Standard compliant and can run in Wind...

java lite-log hive

Connect to Hive via HiveServer2 JDBC Driver

24   0   about 2 months ago

This post shows you how to connect to HiveServer2 via Hive JDBC driver in Java. *The way to connect to HiveServer1 is very similar though the driver names are different: Version Drive...

hadoop hive

Apache Hive 3.0.0 Installation on Windows 10 Step by Step Guide

1,231   7   about 2 months ago

If you have been following my website, you would know I’ve published a number of articles about installing big data tools/framewo...

lite-log

Install Big Data Tools (Spark, Zeppelin, Hadoop) in Windows for Learning and Practice

1,685   4   about 2 months ago

Are you a Windows/.NET developer willing to learn big data concepts and tools on Windows? If yes, you can follow the links below to install them on your PC. The installations are usually easier to do in Linux/UNIX but they are not difficult to implement in Windows either since the...

asp.net core gulp

Migrate from Bower to Gulp for Client Libraries Management in ASP.NET Core

32   0   about 2 months ago

Background If you have been working on ASP.NET projects in the past years, you have probably heard of or used quite a few client library management frameworks/tools, for example Bower, npm, Gulp, Grunt, Webpack, Yarn, Parcel, Libman, etc. Before SPA became popular, the default ASP.NET (or A...

spark pyspark partitioning

Data Partitioning Functions in Spark (PySpark) Deep Dive

49   0   about 2 months ago

In my previous post about Data Partitioning in Spark (PySpark) In-depth Walkthrough, I mentioned how to repartition data frames in Spark using repartition ...

lite-log spark pyspark

Get the Current Spark Context Settings/Configurations

29   0   about 2 months ago

In Spark, there are a number of settings/configurations you can specify including application properties and runtime parameters. https://spark.apache.org/docs/latest/configuration.html Ge...


Querying Teradata and SQL Server - Tutorial 1: The SELECT Statement

34,509   7   about 5 years ago

SELECT is one of the most commonly used statements. In this tutorial, I will cover the following items: Two of the principal query clauses—FROM and SELECT Data Types Built-in functions CASE expressions and variations like ISNULL and COALESCE. * The functio...

hadoop yarn hdfs

Install Hadoop 3.0.0 in Windows (Single Node)

15,581   20   about 2 years ago

This page summarizes the steps to install Hadoop 3.0.0 in your Windows environment. Reference page: https://wiki.apache.org/hadoop/Hadoop2OnWindows ...


Install Teradata Express 15.0.0.8 by Using VMware Player 6.0 in Windows

14,175   23   about 5 years ago

In this article, I am going to show how to install Teradata Express in virtual machines in Windows. Download software 1) Download VMware Player for Windows 32-bit and 64-bit from the following link (version 6.0): ...


Working with SQL Server Compact 4.0 using Entity Framework 6 and ADO.NET

12,125   0   about 5 years ago

SQL Server Compact 4.0 (CE 4.0) is a free SQL Server embedded database ideal for building standalone and occasionally connected applications for mobile devices, desktops, Web clients and others. In one of my projects, I used it as the database for logging errors, which assumes the errors will onl...

asp.net core 2

Server.MapPath Equivalent in ASP.NET Core 2

11,063   0   about 2 years ago

In traditional ASP.NET applications, Server.MapPath is commonly used to generate an absolute path on the web server. However, this has been removed from ASP.NET Core. So what is the equivalent way of doing it?


Create ETL Project with Teradata through SSIS

10,763   2   about 4 years ago

Infosphere DataStage is adopted as the ETL (Extract, Transform, Load) tool in many Teradata-based data warehousing projects. With the Teradata ODBC and .NET data providers, you can also use the BI tools from Microsoft, i.e. SSIS. In my previous post, I demonstrated how to install Teradata Tool...


Generate Formatted Excel Destination (Output) in SSIS Data Flow Task

10,393   0   about 5 years ago

SSIS (SQL Server Integration Services) provides a number of convenient tasks to enable data integration. Exporting data from a database to an Excel file is a common task in ETL (Extract, Transform, Load) projects. Users/customers often raise formatting requests regarding the Excel extract. To g...

spark scala parquet

Write and Read Parquet Files in Spark/Scala

8,913   2   about 2 years ago

On this page, I’m going to demonstrate how to write and read parquet files in Spark/Scala by using the Spark SQLContext class. Reference What is parquet format? Go to the following project site to understand more about parquet. ...

asp.net core identity core 2

Retrieve Identity username, email and other information in ASP.NET Core

8,266   0   about 2 years ago

The identity system in ASP.NET has evolved over time. If you are using ASP.NET Core, you have probably found that the User property is an instance of ClaimsPrincipal in Controller or Razor views. Thus, to retrieve the information, you need to use the claims.

dotnet core angular asp.net core 2

Issue - Unable to get property 'apply' of undefined or null reference occurred in Angular 4.*, VS2017 15.3, ASP.NET Core 2.0

7,877   10   about 3 years ago

Issue Context After installing the Visual Studio 2017 15.3 preview and the .NET Core 2.0 preview SDK, I upgraded one of my existing ASP.NET Core projects to 2.0. The project was created using the ‘dotnet new angular’ SPA template. I also upgraded all the client app packages to the latest. For exa...

java kerberos

Java Kerberos Authentication Configuration Sample & SQL Server Connection Practice

7,649   2   about 4 years ago

Overview Recently, I have been working on an ETL framework to load various source data (i.e. files, SQL Server, Oracle and Teradata) into Teradata. Due to some limitations, Java was chosen as the implementation language though IBM Infosphere DataStage is available to use. DataStage has p...


Connect to Teradata Virtual Machine Guest from Windows Host

7,051   16   about 5 years ago

In my previous posts about Querying Teradata and SQL Server, I logged into the virtual machine graphic interface to manage the database. However, I found it consistently resource intensive as there is only 4GB of memory in my laptop. Instead, I will use text mode to start the virtual machine and co...

teradata python

Connect to Teradata database through Python

5,643   3   about 2 years ago

Teradata published an official Python module which can be used in DevOps projects. More details can be found at the following GitHub site: https://github.com/Teradata/PyTd Install Teradata module ...


[C#] Connect to Teradata Database via .NET Data Provider

5,419   2   about 4 years ago

In this post, I will demonstrate how to connect to Teradata database via .NET Data Provider for Teradata using C#. Prerequisites Install the .NET Data Provider for Teradata from the following link: ...

lite-log spark hdfs scala parquet

Write and Read Parquet Files in HDFS through Spark/Scala

4,984   0   about 2 years ago

In my previous post, I demonstrated how to write and read parquet files in Spark/Scala. The parquet file destination is a local folder. Write and Read Parquet Files in Spark/Scala In this page...

lite-log scala

Convert String to Date in Spark (Scala)

4,852   0   about 2 years ago

Context This page demonstrates how to convert a string to java.util.Date in Spark via Scala. Prerequisites If you have not installed Spark, follow the page below to install it: ...


Create and Debug C/C++ Programs with Eclipse and Cygwin in Windows

4,816   0   about 4 years ago

In this post, I am going to demonstrate how to use Eclipse to create and debug C/C++ programs for Unix/Linux in Windows. I am going to use Cygwin GCC as the toolchain. Cygwin GDB will also be installed for debugging purposes. I am using Windows 10 and JRE 1.8 in the following steps. Install E...


Resolve the Issues in Upgrading Entity Framework to Version 6.1

4,639   0   about 5 years ago

When upgrading your Entity Framework to Entity Framework 6.1 (EF6) from version 5.0, you may meet a number of issues. I have summarized all the issues I’ve encountered and their resolutions for your reference. Upgrade to EF6 Microsoft has provided one summary about upgrading to E...


about 6 hours ago

@Jonathan

I’m glad to hear that it’s now working for you.

about 12 hours ago

Hi,

Thanks for pointing out the error. I did as you suggested and it works now! Great post on setting up Apache and Hadoop in WSL!

Jonathan

about 23 hours ago

I get Permission Denied when trying to download the Hadoop binary. After some research, I found that I need to put sudo in front of the command, so I need to use:

sudo wget http://mirrors.....


Thanks for the great article!


about 1 day ago

Hi,

Most likely it is because ssh is not working. I found that each time you restart your Windows system, you need to re-run the command to restart the ssh service:

sudo service ssh restart

Make sure you can ssh localhost successfully without a passphrase.

To make it easy, I have added the command ‘sudo service ssh restart’ to my .bashrc file so that each time I start WSL it restarts ssh to make sure it works.
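
As a variation on the .bashrc approach above, here is a minimal sketch that starts the SSH service only when no sshd process is found, rather than restarting it for every new shell. The function name ensure_ssh_running is hypothetical; the sketch assumes pgrep is available and that the service is named ssh, as in the command above.

```shell
# Sketch for ~/.bashrc: start the SSH service only if sshd is not already running.
# Assumptions: pgrep is installed and the service is named "ssh".
ensure_ssh_running() {
    if pgrep -x sshd > /dev/null 2>&1; then
        return 0   # sshd is already running; nothing to do
    fi
    sudo service ssh start
}
```

Calling ensure_ssh_running on the last line of ~/.bashrc would run the check each time a WSL shell starts. sudo may still prompt for a password, as it would with the plain restart command.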

Please stop all Hadoop services first and then restart the services:

sbin/stop-all.sh

sbin/start-all.sh

In the Hadoop home folder there is a ‘logs’ folder, which includes the name node log file too. You can find the detailed error message there.

Please provide your detailed error logs here if the above suggestions don’t work.
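
After restarting the services, one way to confirm which HDFS daemons actually came up is to compare the jps output against the expected process names. The helper below is only a sketch: the function name missing_hdfs_daemons is hypothetical, and the expected list (NameNode, DataNode, SecondaryNameNode) assumes a single-node setup started with sbin/start-dfs.sh.

```shell
# Print each expected HDFS daemon that does not appear in the given jps output.
# Usage: missing_hdfs_daemons "$(jps)"
missing_hdfs_daemons() {
    jps_output="$1"
    for daemon in NameNode DataNode SecondaryNameNode; do
        # jps prints "<pid> <name>"; compare the whole second field so that
        # "SecondaryNameNode" is not mistaken for "NameNode".
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}
```

If the function prints NameNode, the name node log file in the ‘logs’ folder is the place to look for the reason.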

about 1 day ago

In my case, the command: 

sbin/start-dfs.sh 

is executed without errors, but the NameNode is not started and therefore it is not responding on http://localhost:9870.

Executing the jps command, I can see that the running processes are:

1) SecondaryNameNode

2) DataNode

3) Jps

NameNode process is missing from the returned list.


Any idea on what can be going wrong?

I followed all the instructions in this guide to configure my WSL environment.


Thanks 

about 4 days ago

Hi,

run-example is a command, not a Scala function. The script file is in the $SPARK_HOME/bin folder, so please run it directly in bash (WSL terminal) instead of in the Spark shell.
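
For illustration, here is a small bash wrapper around that script. It is a sketch only: the function name run_spark_example is hypothetical, and it assumes the SPARK_HOME environment variable points at the Spark installation folder.

```shell
# Run a bundled Spark example from bash (the WSL terminal), not from the scala> prompt.
# Assumption: SPARK_HOME points at the Spark installation folder.
run_spark_example() {
    script="${SPARK_HOME:-}/bin/run-example"
    if [ ! -x "$script" ]; then
        echo "run-example not found or not executable: $script" >&2
        return 1
    fi
    "$script" "$@"   # pass the example name and its arguments through unchanged
}
```

With this defined, run_spark_example SparkPi 10 has the same effect as running $SPARK_HOME/bin/run-example SparkPi 10 at the bash prompt.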

Let me know if you have other questions.

about 4 days ago

I followed your instructions and installed Scala after installing Hadoop. But when I try to run the SparkPi example, I get the following:

scala> run-example SparkPi 10

<console>:24: error: not found: value run

       run-example SparkPi 10

       ^

<console>:24: error: not found: value example

       run-example SparkPi 10

Not sure what the error is. Thanks.

Sincerely

Jonathan

about 7 days ago

Hi, this post is about installing 3.0.0 and I have not tried 3.2.0 using this approach yet. Some configuration properties have changed. 

Your error message suggests that the username was not configured correctly for some reason.

However, since you are using Windows 10, I'd suggest you follow the post below to install 3.2.0 in Windows Subsystem for Linux (WSL).

Install Hadoop 3.2.0 on Windows 10 using Windows Subsystem for Linux (WSL)

about 7 days ago

Hi,
I tried your guidelines for installing Hadoop 3.2.0 on my Windows 10 but unfortunately I am unable to start it. I am stuck at the point where you stated the command hadoop namenode -format. I am getting this message at the end of the command output: Shutting down NameNode at {Username}/192.168.0.200.
But there is no descriptive error other than that. Can you help me please?

about 7 days ago

You mentioned $HIVE_PATH in one of the previous comments while it should be $HIVE_HOME. Can you please double check that?

Based on what you have described and also if I understand correctly:

Your issue is that you cannot run the following command successfully:

$HIVE_HOME/bin/schematool -dbType derby -initSchema

And you got the error: no such file or directory exist.

Usually this issue will happen if:

  • Your account has no x (execute) permission on the schematool file, which is why I recommended checking that permission and adding it if missing.
  • Or, as the error message says, the path doesn't exist. For example, your $HIVE_HOME environment variable may not be set up correctly. I would recommend following the steps below to add it to your .bashrc file and then re-running the schema initialisation command:

1) Edit the file ~/.bashrc by running the following command:

vi ~/.bashrc

2) Add the following line at the end of the file:

export HIVE_HOME={your hive home folder path}

3) Source the settings 

source ~/.bashrc

And then run the command:

$HIVE_HOME/bin/schematool -dbType derby -initSchema
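
The two causes listed above can be checked in one go before running the schema initialisation. The function below is a rough sketch, and its name check_schematool is hypothetical; it only inspects $HIVE_HOME and the schematool file and does not run schematool itself.

```shell
# Report why $HIVE_HOME/bin/schematool might fail to start:
# HIVE_HOME unset, file missing, or execute permission missing.
check_schematool() {
    if [ -z "${HIVE_HOME:-}" ]; then
        echo "HIVE_HOME is not set"
        return 1
    fi
    tool="$HIVE_HOME/bin/schematool"
    if [ ! -e "$tool" ]; then
        echo "missing: $tool"
        return 1
    fi
    if [ ! -x "$tool" ]; then
        echo "not executable: $tool"   # fix with: chmod +x "$tool"
        return 1
    fi
    echo "ok: $tool"
}
```

Running check_schematool after source ~/.bashrc shows which of the cases applies.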

If the above suggestions still don't work, I'm not sure whether I can help more unless you provide screenshots of your Cygwin window and Hive folder, along with the detailed error messages, here.

You can upload images in the comment section directly. Alternatively, write me an email at enquiry[at]kontext.tech

about 8 days ago

So, $HIVE_HOME/bin was in the path. So, I just ran schematool -dbType mysql -initSchema. Also, before that I did hive --service metastore.

about 10 days ago

Can you run the following command in Cygwin to see if the script file is executable?

ls -alt $HIVE_HOME/bin

-rwxr-xr-x+ 1 fahao fahao   832 May 16  2018 metatool

-rwxr-xr-x+ 1 fahao fahao   884 May 16  2018 schematool

The output should look like the above. 'x' means execute permission. You will need that permission before you can execute the scripts.

If the permission is missing, please try the following command to add it:

chmod +x $HIVE_HOME/bin/schematool

about 11 days ago

Yes, I did. When I go to the path in Cygwin and do an ls, I see schematool there. Also, when I print $HIVE_PATH and $HADOOP_PATH I get the correct location.

about 11 days ago

I've submitted the example code to GitHub for your reference:

sqlite-example

about 11 days ago

Hello, where can I download the example? I am trying to make a small application but I get an error when accessing or creating the database. Thank you

about 12 days ago

Very well written guide. Thanks for posting this. Hadoop on Windows can be a daunting task. I had several issues while installing Hadoop 2.9 on my machine. That encouraged me to document the working steps, and I ended up writing this blog post.

https://exitcondition.com/install-hadoop-windows/

Thanks anyway for writing this. It helps a lot of learners out there.

about 14 days ago

Did you run the command in a Cygwin terminal? $HIVE_HOME is the syntax for Linux/UNIX and only works in Cygwin (or another equivalent terminal) in Windows.

about 14 days ago

When I try to run $HIVE_HOME/bin/schematool -dbType derby -initSchema, I get 'no such file or directory exist'. But when I go to the exact location with cd and run ls, I see the file there. Also, when I echo $HIVE_HOME, it returns the exact path.