Install Zeppelin 0.7.3 on Windows

2018-03-03 big-data-on-windows-10sparkzeppelin

This post summarizes the steps to install Zeppelin 0.7.3 in Windows environment.

Tools and Environment

  • GIT Bash
  • Command Prompt
  • Windows 10

Download Binary Package

Download the latest binary package from the following website:

http://zeppelin.apache.org/download.html

In my case, I am saving the file to folder: F:\DataAnalytics

UnZip Binary Package

Open Git Bash, and change directory (cd) to the folder where you save the binary package and then unzip:

$ cd F:\DataAnalytics

fahao@Raymond-Alienware MINGW64 /f/DataAnalytics $ tar -xvzf  zeppelin-0.7.3-bin-all.gz

After running the above commands, the package is unzip to folder: F:\DataAnalytics\zeppelin-0.7.3-bin-all

Run Zeppelin

Before starting Zeppelin, make sure JAVA_HOME environment variable is set.

JAVA\_HOME environment variable

JAVA_HOME environment variable value should be your Java JRE path.

/project/zeppelin/resources/E055281C-82B0-5006-A25C-60318C78535F.webp

Start Zeppelin

Run the following command in Command Prompt (Remember to the path to your own Zeppelin folder):

cd /D F:\DataAnalytics\zeppelin-0.7.3-bin-all\bin

F:\DataAnalytics\zeppelin-0.7.3-bin-all\bin>zeppelin.cmd

Wait until Zeppelin server is started:

/project/zeppelin/resources/8E7ECCB8-89B5-5F02-A2C4-5385ED76DFB9.webp

Verify

In any of your browser, navigate to http://localhost:8080/

The UI should looks like the following screenshot:

/project/zeppelin/resources/028D8BEB-3314-5BE3-9D9E-66D877FD289E.webp

Create Notebook

Create a simple note using markdown and then run it:

/project/zeppelin/resources/1F8C5F31-EDCA-5199-AF30-DBFB73AAC5A5.webp

java.lang.NullPointerException

If you got this error when using Spark as interpreter, please refer to the following pages for details:

https://issues.apache.org/jira/browse/ZEPPELIN-2438

https://issues.apache.org/jira/browse/ZEPPELIN-2475

Basically, even you configure Spark interpreter not to use Hive, Zeppelin is still trying to locate winutil.exe through environment variableHADOOP_HOME.

Thus to resolve the problem, you need to install Hadoop in your local system and then add one environment variable:

/project/zeppelin/resources/4381DDE5-722A-5CA4-B602-C922E5B8AA75.webp

After the environment variable is added, please restart the whole Zeppelin server and then you should be able to run Spark successfully.

/project/zeppelin/resources/7447B6DC-21A3-58D4-AF73-515AC2B59162.webp

You should also be able to run the tutorials provided as part of the installation:

/project/zeppelin/resources/089153FB-2644-553E-85BA-0AD3E9D3AE28.webp

org.apache.zeppelin.interpreter.InterpreterException:

If you encounter the following error:

org.apache.zeppelin.interpreter.InterpreterException: The filename, directory name, or volume label syntax is incorrect.

at org.apache.zeppelin.interpreter.remote.RemoteInterpreterManagedProcess.start(RemoteInterpreterManagedProcess.java:143) at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.reference(RemoteInterpreterProcess.java:73) at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:265) at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:430) at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:111) at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:387) at org.apache.zeppelin.scheduler.Job.run(Job.java:175) at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

It is probably caused by the same issue in this JIRA task if you have installed Spark locally:

https://issues.apache.org/jira/browse/ZEPPELIN-2677

To fix it, you can remove ‘SPARK_HOME’ environment variable and your Spark should still be able to run correctly if you run spark shell using full path of spark-shell.cmd.