Tag - spark

.net dotnet core spark parquet hive

.NET for Apache Spark Preview with Examples

792   2   about 6 months ago

I’ve been following the Mobius project for a while and have been waiting for this day. .NET for Apache Spark v0.1.0 was just published on GitHub on 2019-04-25. It provides high-performance APIs for programming Apache Spark applications with C# and F#. It is .NET Standard compliant and can run in Wind...

spark hadoop pyspark oozie hue

Run Multiple Python Scripts PySpark Application with yarn-cluster Mode

289   0   about 2 months ago

When submitting Spark applications to a YARN cluster, two deploy modes can be used: client and cluster. In client mode (the default), the Spark driver runs on the machine from which the application was submitted, while in cluster mode the driver runs on a node inside the cluster chosen by YARN. On this page, I am goin...

spark hadoop yarn oozie

Diagnostics: Container is running beyond physical memory limits

273   0   about 4 months ago

Scenario: Recently I created an Oozie workflow that contains one Spark action. The Spark action's master is yarn and its deploy mode is cluster. Each time the job has run for about 30 minutes, the application fails with errors like the following: Application applicatio...

lite-log spark pyspark

Fix PySpark TypeError: field **: **Type can not accept object ** in type <class '*'>

594   0   about 4 months ago

When creating a Spark data frame using schemas, you may encounter errors such as “field **: **Type can not accept object ** in type <class '*'>”. The actual error can vary; for instance, the following are some examples: field xxx: BooleanType can not accept object 100 in type ...

python spark pyspark

PySpark: Convert Python Array/List to Spark Data Frame

1,888   0   about 4 months ago

In Spark, the SparkContext.parallelize function can be used to convert a Python list to an RDD, and the RDD can then be converted to a DataFrame object. The following sample code is based on Spark 2.x. On this page, I am going to show you how to convert the following list to a data frame: data = [(...

teradata spark pyspark

Load Data from Teradata in Spark (PySpark)

849   0   about 4 months ago

In my article Connect to Teradata database through Python, I demonstrated how to use the Teradata Python package or the Teradata ODBC driver to connect to Teradata. In this article, I’m going to...

python spark hadoop pyspark

Read Hadoop Credential in PySpark

267   0   about 4 months ago

In one of my previous articles, Password Security Solution for Sqoop, I mentioned creating credentials using the hadoop credential command. The credentials are stored in JavaKey...

spark linux wsl

Apache Spark 2.4.3 Installation on Windows 10 using Windows Subsystem for Linux

1,548   4   about 6 months ago

This page summarizes the steps to install Apache Spark 2.4.3, the latest version at the time of writing, on Windows 10 via Windows Subsystem for Linux (WSL). Prerequisites: Follow either of the following pages to install WSL on a system or non-system drive on Windows 10. ...

zeppelin spark hadoop linux sqoop hive wsl

Big Data Tools on Windows via Windows Subsystem for Linux (WSL)

551   0   about 6 months ago

This page lists the installation guides for big data tools on Windows through Windows Subsystem for Linux (WSL). ...

zeppelin spark linux wsl

Install Zeppelin 0.7.3 on Windows 10 using Windows Subsystem for Linux (WSL)

705   0   about 6 months ago

This page summarizes the steps to install Zeppelin version 0.7.3 on Windows 10 via Windows Subsystem for Linux (WSL). Version 0.8.1: When running Zeppelin on Ubuntu, the server may pick up a host address that is not accessible, for example 169.254.148.100, and the remote interprete...
