
Welcome to Kontext

Join our community for cloud, data and IT professionals.

Kontext Column

Created for everyone to publish data, programming and cloud related articles. Follow three steps to create your columns.



Featured posts

Tags: python, spark, pyspark
Views: 7910 | Likes: 4 | Posted: 2 years ago

Data partitioning is critical to processing performance, especially when processing large volumes of data in Spark. A partition in Spark does not span nodes, though one node can contain more than one partition. When processing, Spark assigns one task to each partition and each worker threa...


Tags: python, spark, pyspark
Views: 9133 | Likes: 0 | Posted: 9 months ago

In Spark, the SparkContext.parallelize function can be used to convert a Python list to an RDD, which can then be converted to a DataFrame object. The following sample code is based on Spark 2.x. On this page, I am going to show you how to convert the following list to a data frame: data = [(...


Tags: python, spark
Views: 15276 | Likes: 0 | Posted: 2 years ago

This post shows how to derive a new column in a Spark data frame from a JSON array string column. I am running the code in Spark 2.2.1, though it is compatible with Spark 1.6.0 (with fewer JSON SQL functions). Prerequisites: Refer to the following post to install Spark on Windows. ...


Tags: SQL Server, python, spark, pyspark
Views: 10733 | Likes: 1 | Posted: 2 years ago

Spark is an analytics engine for big data processing. There are various ways to connect to a database in Spark. This page summarizes some common approaches to connecting to SQL Server using Python as the programming language. ...


Tags: hadoop, hive
Views: 14024 | Likes: 2 | Posted: 2 years ago

If you have been following my website, you would know I’ve published a number of articles about installing big data tools/framewo...


Tags: python, spark, pyspark, hive
Views: 10647 | Likes: 3 | Posted: 2 years ago

From Spark 2.0, you can easily read data from a Hive data warehouse and also write/append new data to Hive tables. This page shows how to work with Hive in Spark, including: Create DataFrame from existing Hive table; Save DataFrame to a new Hive table; Append data ...


Install Hadoop 3.0.0 on Windows (Single Node)

Tags: hadoop, yarn, hdfs
Views: 29053 | Likes: 2 | Posted: 3 years ago

This page summarizes the steps to install Hadoop 3.0.0 on your Windows environment. Reference page: https://wiki.apache.org/hadoop/Hadoop2OnWindows ...


Tags: spark, scala, parquet
Views: 15566 | Likes: 0 | Posted: 3 years ago

On this page, I’m going to demonstrate how to write and read parquet files in Spark/Scala by using the Spark SQLContext class. Reference: What is parquet format? Go to the following project site to understand more about parquet. ...


Tags: hadoop, linux, WSL
Views: 11092 | Likes: 6 | Posted: 11 months ago

In my previous post, I showed how to configure a single node Hadoop instance on Windows 10. The steps are not too difficult to follow if you have Java programming backgr...


Tags: .net core, entity-framework
Views: 16520 | Likes: 1 | Posted: 2 years ago

SQLite is a self-contained, embedded SQL database engine. In .NET Core, Entity Framework Core provides APIs to work with SQLite. This page provides sample code to create a SQLite database using the Microsoft.EntityFrameworkCore.Sqlite package. Create sample project ...


Tags: asp.net core, identity core 2
Views: 14961 | Likes: 0 | Posted: 3 years ago

The identity system in ASP.NET has evolved over time. If you are using ASP.NET Core, you have probably found that the User property in controllers and Razor views is an instance of ClaimsPrincipal. To retrieve user information, you therefore need to read it from the claims.


Tags: lite-log, powershell
Views: 7572 | Likes: 1 | Posted: 3 years ago

PowerShell provides a number of cmdlets to retrieve the current date and time and to create time span objects. Calculate time difference - Cmdlets: $current = Get-Date $end = Get-Date $diff = New-TimeSpan -Start $current -End $end Write-Output "Time difference is: $di...


Tags: hadoop, yarn, hdfs
Views: 8825 | Likes: 0 | Posted: 2 years ago

This page summarizes the default ports used by Hadoop services. It is useful when configuring network interfaces in a cluster. Hadoop 3.1.0 HDFS The secondary namenode http/https server address and port. ...


Tags: python, pyspark, pandas
Views: 2386 | Likes: 0 | Posted: 8 months ago

In Spark, it’s easy to convert a Spark DataFrame to a Pandas data frame with one line of code: df_pd = df.toPandas() On this page, I am going to show you how to convert a list of PySpark row objects to a Pandas data frame. Prepare the data frame The fo...


Tags: lite-log, spark, hdfs, scala, parquet
Views: 9715 | Likes: 0 | Posted: 3 years ago

In my previous post, I demonstrated how to write and read parquet files in Spark/Scala. The parquet file destination is a local folder. Write and Read Parquet Files in Spark/Scala In this page...


Featured sites

Apache Spark installation guides, performance tuning tips, general tutorials, etc.


Articles about Apache Hadoop installation, performance tuning and general tutorials.


Code snippets for various programming languages/frameworks.


Tutorials and information about Teradata.


Articles about ASP.NET Core 1.x, 2.x and 3.x.


PowerShell, Bash, ksh, sh, Perl, etc.
