Kontext Column

Created for everyone to publish articles about data, programming, and the cloud. Follow three steps to create your own column.


Learn more

Featured posts

Tags: python, spark, pyspark

24501 views, 7 likes, 2 years ago

Data partitioning is critical to data processing performance, especially when processing large volumes of data in Spark. Partitions in Spark won't span across nodes, though one node can contain more than one partition. When processing, Spark assigns one task for each partition and each worker threa...
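
A rough sketch of the APIs involved; the DataFrame, partition counts, and app name below are illustrative, not taken from the article:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partition-demo").getOrCreate()

# Build a small DataFrame and check how many partitions Spark created for it.
df = spark.range(0, 1000000)
print(df.rdd.getNumPartitions())

# repartition() shuffles the data into the requested number of partitions;
# each partition is processed by exactly one task.
df_repartitioned = df.repartition(8)
print(df_repartitioned.rdd.getNumPartitions())  # 8

# coalesce() reduces the partition count without a full shuffle.
df_coalesced = df_repartitioned.coalesce(2)
print(df_coalesced.rdd.getNumPartitions())  # 2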

Column: Spark + PySpark

Tags: pyspark, spark, spark-2-x

6855 views, 0 likes, 9 months ago

Spark provides rich APIs to save data frames to many different file formats such as CSV, Parquet, ORC, Avro, etc. CSV is commonly used in data applications, though binary formats are gaining momentum nowadays. In this article, I am going to show you how to save a Spark data frame as a CSV file in b...
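
A minimal sketch of writing a data frame out as CSV; the sample rows and output path are made up, not from the article:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-demo").getOrCreate()

df = spark.createDataFrame(
    [(1, "Category A"), (2, "Category B")],
    ["ID", "Category"],
)

# Write the DataFrame as CSV files with a header row; Spark produces
# one CSV part file per partition under the output folder.
df.write.mode("overwrite").option("header", "true").csv("/tmp/output-csv")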

Column: Spark + PySpark

Tags: python, spark, pyspark

19190 views, 0 likes, 2 years ago

In Spark, the SparkContext.parallelize function can be used to convert a Python list to an RDD, and the RDD can then be converted to a DataFrame object. The following sample code is based on Spark 2.x. On this page, I am going to show you how to convert the following list to a data frame: data = [(...
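
A minimal sketch of the conversion, with illustrative data since the list in the excerpt is truncated:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("list-to-df").getOrCreate()

# A plain Python list of tuples (illustrative values).
data = [("Category A", 100), ("Category B", 200)]

# Convert the list to an RDD first, then to a DataFrame with named columns.
rdd = spark.sparkContext.parallelize(data)
df = rdd.toDF(["Category", "Amount"])

# createDataFrame can also build the DataFrame from the list directly.
df2 = spark.createDataFrame(data, ["Category", "Amount"])
df.show()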

Column: Spark + PySpark

Tags: python, spark, pyspark, hive

18557 views, 3 likes, 2 years ago

Since Spark 2.0, you can easily read data from the Hive data warehouse and also write/append new data to Hive tables. This page shows how to work with Hive in Spark, including: creating a DataFrame from an existing Hive table, saving a DataFrame to a new Hive table, appending data ...
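
A minimal sketch of these operations, assuming Hive support is enabled on the session; the table names are placeholders:

from pyspark.sql import SparkSession

# enableHiveSupport() is required to read from and write to Hive tables.
spark = (SparkSession.builder
         .appName("hive-demo")
         .enableHiveSupport()
         .getOrCreate())

# Create a DataFrame from an existing Hive table.
df = spark.sql("SELECT * FROM default.existing_table")

# Save the DataFrame to a new Hive table.
df.write.mode("overwrite").saveAsTable("default.new_table")

# Append more data to the same table.
df.write.mode("append").saveAsTable("default.new_table")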

Column: Spark + PySpark

Tags: python, spark

23394 views, 0 likes, 2 years ago

This post shows how to derive a new column in a Spark data frame from a JSON array string column. I am running the code on Spark 2.2.1, though it is compatible with Spark 1.6.0 (with fewer JSON SQL functions). Prerequisites: refer to the following post to install Spark on Windows. ...
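
One possible sketch of this technique using from_json and explode (Spark 2.2+); the column names, sample JSON, and schema below are illustrative, not the article's:

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, explode, col
from pyspark.sql.types import ArrayType, StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("json-array-demo").getOrCreate()

# The 'json_col' column holds a JSON array encoded as a string.
df = spark.createDataFrame(
    [("a", '[{"name": "x", "value": 1}, {"name": "y", "value": 2}]')],
    ["id", "json_col"],
)

# Schema describing the elements of the JSON array.
schema = ArrayType(StructType([
    StructField("name", StringType()),
    StructField("value", IntegerType()),
]))

# Parse the string into an array column, then explode it into one row per element.
parsed = df.withColumn("arr", from_json(col("json_col"), schema))
result = parsed.select("id", explode(col("arr")).alias("item")) \
               .select("id", col("item.name").alias("name"), col("item.value").alias("value"))
result.show()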

Column: Spark + PySpark

Install Hadoop 3.2.1 on Windows 10 Step by Step Guide

Tags: windows10, hadoop, yarn

8545 views, 12 likes, 7 months ago

This detailed step-by-step guide shows you how to install the latest Hadoop (v3.2.1) on Windows 10. It also provides a temporary fix for bug HDFS-14084 (java.lang.UnsupportedOperationException INFO).

Column: Hadoop

Tags: sqlite, .net core, entity-framework

24964 views, 2 likes, 3 years ago

SQLite is a self-contained, embedded SQL database engine. In .NET Core, Entity Framework Core provides APIs to work with SQLite. This page provides sample code to create a SQLite database using the Microsoft.EntityFrameworkCore.Sqlite package. Create sample project ...

Column: .NET Framework

Tags: SQL Server, python, spark, pyspark

17478 views, 2 likes, 2 years ago

Spark is an analytics engine for big data processing. There are various ways to connect to a database in Spark. This page summarizes some of the common approaches to connect to SQL Server using Python as the programming language. ...
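
One common approach is the JDBC data source, sketched below; the server address, database, table, credentials, and driver class are placeholders, and the Microsoft JDBC driver jar must be on the Spark classpath:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sqlserver-demo").getOrCreate()

# Read a table over JDBC; all connection details here are placeholders.
df = (spark.read
      .format("jdbc")
      .option("url", "jdbc:sqlserver://localhost:1433;databaseName=TestDB")
      .option("dbtable", "dbo.SampleTable")
      .option("user", "sql_user")
      .option("password", "sql_password")
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .load())

df.show()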

Column: Spark + PySpark

Tags: hadoop, hive

20895 views, 5 likes, 2 years ago

In this article, I'm going to demonstrate how to install Hive 3.0.0 on Windows 10. Prerequisites: before installing Apache Hive, please ensure you have Hadoop available in your Windows environment, as Hive cannot run without Hadoop. ...

Column: Hadoop

Tags: asp.net core, identity core 2

21871 views, 0 likes, 3 years ago

The identity system in ASP.NET has evolved over time. If you are using ASP.NET Core, you have probably found that the User property is an instance of ClaimsPrincipal in controllers or Razor views. Thus, to retrieve user information, you need to work with the claims.

Column: ASP.NET Core

Tags: linux, WSL, ubuntu

7146 views, 4 likes, 2 years ago

This page shows how to manually install Windows Subsystem for Linux (WSL) on a non-system drive. Enable the Windows Subsystem for Linux feature: open PowerShell as Administrator and run the following command to enable the WSL feature: Enable-WindowsOptionalFea...

Column: Tools

Tags: hadoop, linux, WSL

15483 views, 8 likes, 2 years ago

In my previous post, I showed how to configure a single-node Hadoop instance on Windows 10. The steps are not too difficult to follow if you have a Java programming backgr...

Column: Hadoop

Tags: pyspark, spark-2-x, teradata, SQL Server

2363 views, 0 likes, 5 months ago

In my previous article about Connect to SQL Server in Spark (PySpark), I mentioned the ways t...

Column: Spark + PySpark

Tags: powershell

10942 views, 1 like, 3 years ago

PowerShell provides a number of cmdlets to retrieve the current date and time and to create time span objects. Calculate time difference - CmdLets: $current = Get-Date $end = Get-Date $diff = New-TimeSpan -Start $current -End $end Write-Output "Time difference is: $di...

Column: Scripting

Tags: pyspark, spark-2-x, spark, python

3555 views, 0 likes, 8 months ago

This article shows how to convert a Python dictionary list to a DataFrame in Spark using Python. Example dictionary list: data = [{"Category": 'Category A', "ID": 1, "Value": 12.40}, {"Category": 'Category B', "ID": 2, "Value": 30.10}, {"Category": 'Category C', "...
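
A minimal sketch using the two complete dictionaries visible in the excerpt (the original list is truncated there):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dict-list-demo").getOrCreate()

# Dictionary list; column names and types are inferred from the keys and values.
data = [{"Category": "Category A", "ID": 1, "Value": 12.40},
        {"Category": "Category B", "ID": 2, "Value": 30.10}]

df = spark.createDataFrame(data)
df.show()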

Column: Spark + PySpark

Featured sites

Apache Spark installation guides, performance tuning tips, general tutorials, etc.

* The Spark logo is a registered trademark of Apache Spark.

Code snippets for various programming languages/frameworks.

Articles about Apache Hadoop installation, performance tuning and general tutorials.

* The yellow elephant logo is a registered trademark of Apache Hadoop.

Tutorials and information about Teradata.

Articles about ASP.NET Core 1.x, 2.x and 3.x.

Everything about the .NET Framework.

PowerShell, Bash, ksh, sh, Perl, etc.