
Code snippets for various programming languages/frameworks.

Tags: pyspark, spark-2-x, spark
Views: 28
Likes: 0
Posted: 3 months ago

This article shows you how to read and write XML files in Spark. Sample XML file Create a sample XML file named test.xml with the following content: <?xml version="1.0"?> <data> <record id="1"> <rid>1</rid> <nam...


Tags: python, pandas
Views: 15
Likes: 0
Posted: 3 months ago

Pickle files are commonly used in Python data-related projects. This article shows how to create and load pickle files using Pandas. Create pickle file import pandas as pd import numpy as np file_name="data/test.pkl" data = np.random.randn(1000, 2) # pd.set_option('displ...
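The round trip described in this excerpt can be sketched as follows. This is a minimal illustration rather than the article's own code: the temporary directory and the column names `foo` and `bar` are assumptions made for the example, while `DataFrame.to_pickle` and `pandas.read_pickle` are the standard pandas calls for writing and loading pickle files.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Assumption: write to a temporary directory instead of "data/test.pkl"
# so the sketch runs without an existing "data" folder.
out_dir = tempfile.mkdtemp()
file_name = os.path.join(out_dir, "test.pkl")

# 1000 rows x 2 columns of standard-normal samples, as in the excerpt.
data = np.random.randn(1000, 2)

# Assumption: the column names are hypothetical.
df = pd.DataFrame(data=data, columns=["foo", "bar"])

# Create the pickle file.
df.to_pickle(file_name)

# Load the pickle file back into a new DataFrame.
restored = pd.read_pickle(file_name)
print(restored.shape)
```

Only load pickle files from sources you trust, since unpickling can execute arbitrary code.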


Tags: pyspark, spark-2-x, spark
Views: 23
Likes: 0
Posted: 3 months ago

Sometimes it is necessary to pass environment variables to Spark executors. To pass an environment variable to executors, use the setExecutorEnv function of the SparkConf class. Code snippet In the following code snippet, an environment variable named ENV_NAME is set up with value ...


Tags: python
Views: 22
Likes: 1
Posted: 4 months ago

Different programming languages have different package management tools.


Tags: teradata, SQL
Views: 46
Likes: 0
Posted: 4 months ago

This code snippet shows how to calculate time differences.


Tags: hadoop, shell
Views: 15
Likes: 0
Posted: 4 months ago

Hadoop provides a number of CLIs. The hadoop job command can be used to retrieve the list of running jobs.

You can also use the YARN ResourceManager UI to view the jobs.


Tags: hadoop, shell
Views: 15
Likes: 0
Posted: 4 months ago

Hadoop provides a number of CLIs that can be used to perform many tasks. This code snippet shows you how to check file and folder sizes in HDFS.


Tags: scala, spark-2-x
Views: 205
Likes: 0
Posted: 4 months ago

In Spark, the SparkContext.parallelize function can be used to convert a list of objects to an RDD, and the RDD can then be converted to a DataFrame object through SparkSession.


Tags: python, spark-2-x
Views: 108
Likes: 0
Posted: 4 months ago

In Spark, the SparkContext.parallelize function can be used to convert a list of objects to an RDD, and the RDD can then be converted to a DataFrame object through SparkSession.


Tags: scala, spark-2-x
Views: 150
Likes: 0
Posted: 4 months ago

Spark has easy-to-use fluent APIs that can read data from a JSON file as a DataFrame object.
