
Raymond Tang

Big Data Engineer, Full Stack .NET and Cross-Platform Software Engineer/Architect


I'm passionate about building data driven, scalable, cloud native applications and products.

Microsoft MVP C#/.NET (2010-2016)/Visual Studio | MCP | MCSE: Data Management and Analytics | Google Cloud Platform Certified Professional Data Engineer

 LinkedIn    MVP Reconnect 


Posts

Latest Hadoop 3.2.1 Installation on Windows 10 Step by Step Guide

Tags: hadoop, yarn

Views: 4, Comments: 0, Likes: 0, Posted: 2 hours ago

This detailed step-by-step guide shows you how to install the latest Hadoop (v3.2.1) on Windows 10. It also provides a temporary fix for bug HDFS-14084 (java.lang.UnsupportedOperationException INFO).


Machine Learning with .NET in Jupyter Notebooks

Tags: machine-learning, jupyter-notebook, C#, dotnet core

Views: 103, Comments: 0, Likes: 0, Posted: 15 days ago

In this article, I'm going to show you how to install Jupyter on Windows and then install the .NET kernel for Jupyter notebooks. It also shows a machine learning example using ML.NET. The target audience is .NET developers who want to expand their skills in the data engineering and data science domains...


Tags: pyspark, spark-2-x, python

Views: 33, Comments: 0, Likes: 0, Posted: 18 days ago

This article shows you how to convert a Python dictionary list to a Spark DataFrame. The code snippets run on Spark 2.x environments. Input The input data (a dictionary list) looks like the following: data = [{"Category": 'Category A', 'ItemID': 1, 'Amount': 12.40}, ...


Improve PySpark Performance using Pandas UDF with Apache Arrow

Tags: pyspark, spark, spark-2-x, pandas

Views: 120, Comments: 0, Likes: 2, Posted: 20 days ago

Apache Arrow is an in-memory columnar data format that can be used in Spark to efficiently transfer data between JVM and Python processes. This currently is most beneficial to Python users that work with Pandas/NumPy data. In this article, ...


Tags: pyspark, spark-2-x, spark

Views: 9, Comments: 0, Likes: 0, Posted: 23 days ago

This article shows you how to read and write XML files in Spark. Sample XML file Create a sample XML file named test.xml with the following content: <?xml version="1.0"?> <data> <record id="1"> <rid>1</rid> <nam...


Tags: python, pandas

Views: 8, Comments: 0, Likes: 0, Posted: 23 days ago

Pickle files are commonly used in Python data-related projects. This article shows how to create and load pickle files using Pandas.  Create pickle file import pandas as pd import numpy as np file_name="data/test.pkl" data = np.random.randn(1000, 2) # pd.set_option('displ...


Tags: pyspark, spark-2-x, spark, python

Views: 13, Comments: 0, Likes: 0, Posted: 23 days ago

This article shows how to convert a Python dictionary list to a DataFrame in Spark using Python. Example dictionary list data = [{"Category": 'Category A', "ID": 1, "Value": 12.40}, {"Category": 'Category B', "ID": 2, "Value": 30.10}, {"Category": 'Category C', "...


Tags: Azure, ssl

Views: 18, Comments: 0, Likes: 0, Posted: 27 days ago

Google Chrome browser will mark websites as insecure if HTTPS is not enabled. Certificate issuer To enable SSL on your Azure websites, you can purchase SSL certificates from many certificate authorities. Let’s Encrypt is a free, automated, and open Certificate Authority. ...


Tags: pyspark, spark-2-x, spark

Views: 16, Comments: 0, Likes: 0, Posted: 2 months ago

Sometimes it is necessary to pass environment variables to Spark executors. To pass an environment variable to executors, use the setExecutorEnv function of the SparkConf class. Code snippet In the following code snippet, an environment variable named ENV_NAME is set up with value ...


Tags: pyspark, spark, spark-2-x

Views: 26, Comments: 0, Likes: 0, Posted: 2 months ago

Spark provides rich APIs to save data frames in many different file formats such as CSV, Parquet, ORC, Avro, etc. CSV is commonly used in data applications, though nowadays binary formats are gaining momentum. In this article, I am going to show you how to save Spark data frame as CSV file in b...


Comments

Have you got your problem resolved?


Quoted: U-7e9qo64lwkem90f8, 5 months ago
Re: Install Hadoop 3.0.0 in Windows (Single Node)

@Raymond Tang I ran it in the console. I did not double-click the cmd file.

Hello,

I didn’t quite get this. Can you be more specific? Do you mean the sample code is not working or the UI is not responsive?


Quoted: Boris, 3 months ago
Re: SQLite in .NET Core with Entity Framework Core

Off topic: it doesn't matter how good-looking this site is if it is not working properly. Just try changing the screen resolution.

I'm glad it's helping you. :)


Quoted: Shaik Moulali, 3 months ago
Re: Configure Hadoop 3.1.0 in a Multi Node Cluster

Thank you so much, you are doing a great job.

Your tutorial helped me a lot with Hadoop multi-node cluster administration.

Keep posting latest updates...



Hello,

This should have been fixed by now according to the issue tracker on GitHub. If you are still encountering this problem, please report it to the project site on GitHub.

BTW, I've migrated to Gulp to package my client resources. Refer to the following page for more details:

Migrate from Bower to Gulp for Client Libraries Management in ASP.NET Core


Quoted: G Swanson, 3 months ago
Re: ASP.NET Core 2 with Bootstrap 4 Bundler Minifier Issue: Expected semicolon or closing curly-brace found '-'

I am still getting the problem. What do I need to upgrade to fix it?

I am using Visual Studio 2019 and most of my nuget packages are pretty recent.


Hello, yes, you are right. You may also need to install a metastore database, depending on which database you want to use, as detailed in the installation guide above.

BTW, if you are using Windows 10, I would recommend installing via WSL.

Refer to this page for more details:

https://kontext.tech/docs/DataAndBusinessIntelligence/p/big-data-tools-on-windows-via-windows-subsystem-for-linux-wsl

You can find my LinkedIn link on the About page of this site.

Cheers,

Raymond


Quoted: Swati Agarwal, 3 months ago
Re: Apache Hive 3.0.0 Installation on Windows 10 Step by Step Guide

Hi Team, 

Yes, it was an installation issue. Thanks for the help.

I am new to Hadoop 3, and would seek your guidance.

For installing and working in Hadoop 3, we have to follow:

1) Hadoop 3 installation process 

https://kontext.tech/docs/DataAndBusinessIntelligence/p/install-hadoop-300-in-windows-single-node

2) Hive process

https://kontext.tech/docs/DataAndBusinessIntelligence/p/apache-hive-300-installation-on-windows-10-step-by-step-guide

Please correct me if I am wrong.

Is there anything else that is required to be installed or set up? Please suggest and guide me.

Also, can we connect over LinkedIn? If I get stuck somewhere I would need your expert advice.

My LinkedIn ID is: https://www.linkedin.com/in/swati0303/

It will be really really helpful.

Regards,

Swati


@Sekhar You can refer to the user documentation for the sqoop import command. You need to pass in your JDBC URL, user name, password, source table, etc.
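
As a rough illustration of the arguments mentioned in this reply, the sketch below only assembles a sqoop import argument list in Python and prints it. The connection string, user, table, and paths are hypothetical placeholders, and the helper name build_sqoop_import_command is made up for this example. It assumes the password is kept in a file and passed with --password-file rather than typed on the command line.

```python
from typing import List


def build_sqoop_import_command(
    jdbc_url: str,
    username: str,
    password_file: str,
    table: str,
    target_dir: str,
    num_mappers: int = 1,
) -> List[str]:
    """Assemble the argument list for a `sqoop import` call (nothing is executed)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC URL of the source database
        "--username", username,            # database user name
        "--password-file", password_file,  # file that holds the password
        "--table", table,                  # source table to import
        "--target-dir", target_dir,        # HDFS directory for the imported files
        "--num-mappers", str(num_mappers), # number of parallel map tasks
    ]


# Hypothetical values, for illustration only.
cmd = build_sqoop_import_command(
    jdbc_url="jdbc:mysql://localhost:3306/testdb",
    username="demo_user",
    password_file="/user/demo/.mysql-password",
    table="orders",
    target_dir="/user/demo/orders",
)
print(" ".join(cmd))
```

The printed line can be pasted into a WSL shell where Sqoop is installed, or the list can be handed to subprocess.run(cmd, check=True) once the values match your environment.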
Quoted: Sekhar, 3 months ago
Re: Sqoop Installation on Windows 10 using Windows Subsystem for Linux

Thanks for the detailed steps for installing Hadoop and Sqoop. I was trying to sqoop data from MySQL and getting different kinds of errors. I followed your instructions to install on WSL. Would it be possible to provide steps/examples to sqoop data from Derby or MySQL? Thank you.

Did you install Hadoop successfully first?

This is an HDFS CLI issue and is not related to the Hive installation.


Quoted: Swati Agarwal, 4 months ago
Re: Apache Hive 3.0.0 Installation on Windows 10 Step by Step Guide

Hi Team,

At the step "Set up Hive HDFS folders", while creating a directory using hadoop fs -mkdir /tmp in cmd, the system throws an error:

mkdir: Your endpoint configuration is wrong

Please suggest how to resolve this.

Regards,

Swati


There are many ways to do it:

  • hadoop fs commands to copy file from local to HDFS
  • Spark or any other frameworks that can talk with HDFS...
  • Sqoop (SQL to Hadoop)
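
For the first option in the list above, a minimal sketch in Python might look like the following. It only builds and prints the hadoop fs commands by default (dry run); the local file name and HDFS directory are hypothetical, and the helper names are invented for this example.

```python
import subprocess
from typing import List


def hdfs_upload_commands(local_path: str, hdfs_dir: str) -> List[List[str]]:
    """Return the hadoop fs commands that create a directory, upload a file, and list it."""
    return [
        ["hadoop", "fs", "-mkdir", "-p", hdfs_dir],     # create the target directory if missing
        ["hadoop", "fs", "-put", local_path, hdfs_dir], # copy the local file into HDFS
        ["hadoop", "fs", "-ls", hdfs_dir],              # list the directory to confirm the upload
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # On Windows the launcher is a .cmd script, so shell=True or the
            # full launcher file name may be needed here.
            subprocess.run(cmd, check=True)


# Hypothetical paths, for illustration only.
commands = hdfs_upload_commands("data/sales.csv", "/user/demo/input")
run_all(commands)
```

Keeping the dry run as the default makes it easy to check the commands in a Command Prompt first and switch to real execution once HDFS is confirmed to be running.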

Quoted: Swati Agarwal, 4 months ago
Re: Install Hadoop 3.0.0 in Windows (Single Node)

Hi there,

I have installed Hadoop 3 as per the instructions mentioned above. Please suggest the steps to load data into Hadoop through cmd in Windows 10 and also to perform operations on it.

Regards,

Swati


Hi,

You got that error because Zeppelin cannot find the SQL Server JDBC driver.

Have you set up the dependencies for the interpreter as shown in the screenshot above?

artifact: com.microsoft.sqlserver:mssql-jdbc:6.5.1.jre8-preview

Make sure Zeppelin installs this artifact from the internet successfully.


Quoted: Son Nguyen, 4 months ago
Re: Connecting Apache Zeppelin to your SQL Server

After running the notebook, I encountered this error:
java.lang.ClassNotFoundException: com.microsoft.sqlserver.jdbc.SQLServerDriver
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Unknown Source)
    at org.apache.zeppelin.jdbc.JDBCInterpreter.createConnectionPool(JDBCInterpreter.java:412)
    at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnectionFromPool(JDBCInterpreter.java:423)
    at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnection(JDBCInterpreter.java:486)
    at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:692)
    at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:820)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:632)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
    at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Please help me fix it!
When you ran the cmd script, did you open the script file directly or run it from the command line in Command Prompt?
Quoted: David Serrano, 5 months ago
Re: Install Hadoop 3.0.0 in Windows (Single Node)

Hi,
I saw your tutorial about installing Hadoop on Windows.
However, I am getting this error when I try to run the YARN daemons with start-yarn.cmd:

This file does not have an app associated with it for performing this action. Please install an app or, if one is already installed, create an association in the default apps settings page.

Do you know some solution for that?

Thanks in advance.

Columns

ML.NET is an open source and cross-platform machine learning framework. With ML.NET, you can create custom ML models using C# or F# without having to leave the .NET ecosystem. This column publishes articles about ML.NET.


Code snippets for various programming languages/frameworks.


Data analytics with Google Cloud Platform.


Data analytics with Microsoft Azure cloud platform.


Posts about Apache Sqoop, a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.


PowerShell, Bash, ksh, sh, Perl, etc.


General IT information for programming.


Apache Spark installation guides, performance tuning tips, general tutorials, etc.
