Kontext Column

Created for everyone to publish data, programming and cloud related articles.
Follow three steps to create your columns.



Featured articles

Tags: python, spark, pyspark, spark-advanced
Views: 44774 | Likes: 11 | Posted: 2 years ago

Data partitioning is critical to data processing performance, especially when processing large volumes of data in Spark. Partitions in Spark do not span nodes, though one node can contain more than one partition. When processing, Spark assigns one task to each partition and each worker thread ...
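The one-task-per-partition idea from the teaser above can be illustrated without Spark. The following is a minimal plain-Python sketch, not Spark's implementation: the helper names `partition_records` and `run_tasks`, the modulo partitioning rule and the thread pool are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_records(records, num_partitions):
    """Assign each (key, value) record to a partition by key modulo (illustrative rule)."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[key % num_partitions].append((key, value))
    return partitions


def run_tasks(partitions, num_threads):
    """Run one task per partition; each task sums the values in its partition."""
    def task(partition):
        return sum(value for _, value in partition)

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() returns results in the same order as the partitions
        return list(pool.map(task, partitions))


# Eight records with integer keys 0..7, spread over four partitions
records = [(i, i * 10) for i in range(8)]
partitions = partition_records(records, num_partitions=4)

# Four tasks (one per partition) shared between two worker threads
partition_sums = run_tasks(partitions, num_threads=2)
```

With more partitions than threads, some tasks wait for a free worker thread, which is one reason the partition count matters for performance.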

Install Hadoop 3.2.1 on Windows 10 Step by Step Guide

Tags: windows10, hadoop, yarn, big-data-on-windows-10
Views: 20083 | Likes: 18 | Posted: 11 months ago

This detailed step-by-step guide shows you how to install the latest Hadoop (v3.2.1) on Windows 10. It also provides a temporary fix for bug HDFS-14084 (java.lang.UnsupportedOperationException INFO).

Tags: pyspark, spark, spark-2-x, spark-file-operations
Views: 16131 | Likes: 0 | Posted: 13 months ago

Spark provides rich APIs to save data frames in many different file formats such as CSV, Parquet, ORC, Avro, etc. CSV is commonly used in data applications, though binary formats are gaining momentum. In this article, I am going to show you how to save a Spark data frame as a CSV file in ...

Tags: spark, pyspark, how-to, tutorial, spark-dataframe
Views: 4640 | Likes: 1 | Posted: 4 months ago

This article shows you how to filter NULL/None values from a Spark data frame using Python. The DataFrame.filter or DataFrame.where function can be used to filter out null values.

Tags: python, spark, pyspark, spark-dataframe
Views: 29610 | Likes: 0 | Posted: 2 years ago

In Spark, the SparkContext.parallelize function can be used to convert a Python list to an RDD, and the RDD can then be converted to a DataFrame object. The following sample code is based on Spark 2.x. On this page, I am going to show you how to convert the following list to a data frame: data = [('Category A' ...

Tags: sqlite, entity-framework, dotnetcore
Views: 32434 | Likes: 2 | Posted: 3 years ago

SQLite is a self-contained, embedded SQL database engine. In .NET Core, Entity Framework Core provides APIs to work with SQLite. This page provides sample code to create a SQLite database using the package Microsoft.EntityFrameworkCore.Sqlite. Create a .NET Core 2.x console application in ...

Tags: python, spark, pyspark, hive, spark-database-connect
Views: 26542 | Likes: 4 | Posted: 2 years ago

From Spark 2.0, you can easily read data from a Hive data warehouse and also write or append new data to Hive tables. This page shows how to work with Hive in Spark, including: creating a DataFrame from an existing Hive table; saving a DataFrame to a new Hive table; appending data to an existing Hive table via ...

Pandas DataFrame Plot - Pie Chart

Tags: plot, pandas, jupyter-notebook, python, pandas-plot
Views: 6433 | Likes: 0 | Posted: 8 months ago

This article provides examples of plotting pie charts using the pandas.DataFrame.plot function. The data I'm going to use is the same as in the other article, Pandas DataFrame Plot - Bar Chart. I'm also using Jupyter Notebook to plot them. The DataFrame has 9 records: DATE TYPE ...

Tags: SQL Server, python, spark, pyspark, spark-database-connect
Views: 23942 | Likes: 4 | Posted: 2 years ago

Spark is an analytics engine for big data processing. There are various ways to connect to a database in Spark. This page summarizes some common approaches to connecting to SQL Server using Python as the programming language. For each method, both Windows Authentication and SQL Server ...

Tags: python, spark, spark-dataframe
Views: 30996 | Likes: 0 | Posted: 2 years ago

This post shows how to derive a new column in a Spark data frame from a JSON array string column. I am running the code in Spark 2.2.1, though it is compatible with Spark 1.6.0 (with fewer JSON SQL functions). Refer to the following post to install Spark in Windows. Install Spark 2.2.1 in Windows ...

Tags: pyspark, spark-2-x, spark, python, spark-dataframe
Views: 8894 | Likes: 1 | Posted: 12 months ago

This article shows how to convert a Python dictionary list to a DataFrame in Spark using Python. data = [{"Category": 'Category A', "ID": 1, "Value": 12.40}, {"Category": 'Category B', "ID": 2, "Value": 30.10}, {"Category": 'Category C', "ID": 3, "Value": 100.01} ] The ...

Install Hadoop 3.3.0 on Windows 10 Step by Step Guide

Tags: windows10, hadoop, yarn, hdfs, big-data-on-windows-10
Views: 4212 | Likes: 5 | Posted: 4 months ago

This detailed step-by-step guide shows you how to install the latest Hadoop v3.3.0 on Windows 10. It uses the Hadoop 3.3.0 winutils tool, and WSL is not required. This version was released on July 14, 2020. It is the first release of the Apache Hadoop 3.3 line. There are significant changes compared with Hadoop 3.2.0, such as Java 11 runtime support, protobuf upgrade to 3.7.1, scheduling of opportunistic containers, non-volatile SCM support in HDFS cache directives, etc.

Tags: pyspark, spark-2-x, teradata, SQL Server, spark-database-connect
Views: 7648 | Likes: 1 | Posted: 9 months ago

In my previous article, Connect to SQL Server in Spark (PySpark), I described ways to read data from SQL Server databases as a dataframe using JDBC. We can also use JDBC to write data from a Spark dataframe to database tables. In the following sections, I'm going to show you how to ...

Tags: hadoop, linux, WSL, big-data-on-wsl
Views: 19830 | Likes: 11 | Posted: 2 years ago

In my previous post, I showed how to configure a single-node Hadoop instance on Windows 10. The steps are not too difficult to follow if you have a Java programming background. However, there is one step that is not very straightforward: the native Hadoop executable (winutils.exe) is not included in the ...

Tags: linux, WSL, ubuntu, big-data-on-wsl
Views: 12850 | Likes: 6 | Posted: 2 years ago

This page shows how to manually install Windows Subsystem for Linux (WSL) on a non-system drive. Open PowerShell as Administrator and run the following command to enable the WSL feature: Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux Run the following ...

Featured sites

Apache Spark installation guides, performance tuning tips, general tutorials, etc.

*Spark logo is a registered trademark of Apache Spark.

Articles about Apache Hadoop installation, performance tuning and general tutorials.

*The yellow elephant logo is a registered trademark of Apache Hadoop.

Code snippets and tips for various programming languages/frameworks.

Articles about ASP.NET Core 1.x, 2.x and 3.x.

Everything about .NET framework, .NET Core and .NET Standard. 

Tutorials and information about Teradata.

Streaming analytics related tutorials and ideas.

Latest articles

Tags: .NET, kontext
Views: 14 | Likes: 0 | Posted: 15 days ago

The Kontext platform has been upgraded from .NET 5 preview versions to the latest .NET 5 release as of 2020-11-12. The upgrades include ASP.NET Core projects (from ASP.NET Core 5 preview extensions to the .NET 5 built-in stack), the Azure DevOps build pipeline and Azure App Services. Regards, Kontext Admin

Tags: .NET, Azure
Views: 12 | Likes: 0 | Posted: 15 days ago

.NET 5.0 was officially released on 2020-11-10. Refer to this blog page for more details: Announcing .NET 5.0. Download Visual Studio 2019 version 16.8.0 with the .NET 5.0 SDK integrated. Links: Release notes, Download Visual Studio. The .NET SDK version can be changed: { "sdk": { ...

Run .NET 5 on Azure App Services

Tags: .NET, Azure, asp.net core
Views: 62 | Likes: 0 | Posted: 2 months ago

.NET 5 RC2 was released on 2020-10-13. On Azure, you can only select .NET Core 3.1 or 2.1 LTS versions as the runtime stack when creating a web app, as the following screenshot shows. This will stay as is until the official .NET 5 release. However, you can use extensions to run your .NET 5 ...

Tags: spark, SQL
Views: 144 | Likes: 0 | Posted: 2 months ago

In Spark, the to_date function can be used to convert a string to a date. This function has been available since Spark 1.5.0. SELECT to_date('2020-10-23', 'yyyy-MM-dd'); SELECT to_date('23Oct2020', 'ddMMMyyyy'); Refer to the official documentation for all the datetime patterns. ...
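For readers who want to try the two patterns without a Spark session, here is a rough plain-Python analogue. It is only a sketch: the helper name `to_date` is hypothetical and mirrors the Spark SQL function name for readability, and Python's `strptime` directives are a different syntax from Spark's datetime patterns, so the format strings below are hand-translated assumptions.

```python
from datetime import date, datetime


def to_date(text, fmt):
    """Parse text with a strptime format and return a date (hypothetical helper, not Spark)."""
    return datetime.strptime(text, fmt).date()


# Counterpart of the Spark pattern 'yyyy-MM-dd'
d1 = to_date("2020-10-23", "%Y-%m-%d")

# Counterpart of the Spark pattern 'ddMMMyyyy'; %b matches abbreviated
# month names in the current locale (English names in the default C locale)
d2 = to_date("23Oct2020", "%d%b%Y")
```

Both calls parse to the same calendar date, 23 October 2020.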

.NET for Apache Spark v1.0.0 Released

Tags: .NET, spark
Views: 12 | Likes: 0 | Posted: 2 months ago

.NET for Apache Spark v1.0.0 was released officially on 2020-10-14. This page summarizes some important resources for you to get started on .NET for Spark. *Image credit: https://github.com/dotnet/spark/raw/master/docs/img/dotnetsparklogo-6.png Release Notes on GitHub ...

Tags: spark
Views: 88 | Likes: 0 | Posted: 2 months ago

Recently, one of my colleagues asked me a question about Spark: for the same SQL statement that finds the max value of a partition column, different values are returned in Spark SQL and Hive/Impala SQL. The SQL statement looks like the following: SELECT MAX(PART_COL) FROM HiveDb.TestSQL; ...

Tags: sqoop, hive, partitioning
Views: 27 | Likes: 2 | Posted: 3 months ago

Sqoop + Hive + HCatalog + Multilevel partitioning

Tags: .NET, C#
Views: 56 | Likes: 0 | Posted: 3 months ago

In many solutions, .NET Standard has been used to share code between .NET Framework and .NET Core projects. Since the release of .NET 5, you may wonder which .NET Standard version supports C# 9 language features. Well, the answer is simple - there is no need to have another version of ...

Get Started on Reunified .NET 5

Tags: .NET
Views: 18 | Likes: 0 | Posted: 3 months ago

In May 2019, Microsoft announced the roadmap for .NET at the Build conference. .NET 5 is the update that unifies divergent frameworks, reduces code complexity and supports cross-platform reach including desktop, Web, mobile, cloud and device platforms. On 13th September 2020, Microsoft announced .NET ...

C# 9.0 New Features

Tags: C#, .NET
Views: 292 | Likes: 0 | Posted: 3 months ago

.NET 5.0 release candidate 1 (rc.1) was published on 2020-09-14, which marks another big step towards the official .NET 5.0 release. As part of 5.0, C# 9.0 will be released with a bunch of new features. This article summarizes some of the new features with examples. Download .NET 5.0 SDK from this ...

Introduction to C# Interactive

Tags: C#, .NET
Views: 238 | Likes: 0 | Posted: 3 months ago

Python, R and many other scripting languages generally support interactive programming features in their IDEs. When C# was initially created, all C# programs needed to be compiled into MSIL before they could run in .NET runtime environments (unless the code was dynamically compiled). ...

Tags: asp.net core, asp.net core 3, C#
Views: 815 | Likes: 0 | Posted: 3 months ago

This page summarizes how to retrieve the client and server IP addresses in ASP.NET Core applications. The client IP address can be retrieved via the HttpContext.Connection object. This property exists in both the Razor page model and the ASP.NET MVC controller. Property RemoteIpAddress ...

Statistics with R (Part II)

Tags: r-lang
Views: 17 | Likes: 0 | Posted: 3 months ago

In the article Statistics with R (Part I), we walked through basic statistics calculations using R as well as regression models, including linear regression, multiple regression, logistic regression and Poisson regression. In this part, we will continue to explore more complicated analysis including ...

Statistics with R (Part I)

Tags: r-lang
Views: 11 | Likes: 0 | Posted: 3 months ago

So far, we've gone through R programming basics, data types, packages and IDEs, data APIs to work with data sources and various plotting functions. Let's now dive into the most important part: statistics and modelling with R. After all, R was created for statistics. Warning: Due ...

Plotting with R (Part II)

Tags: plot, r-lang
Views: 19 | Likes: 0 | Posted: 3 months ago

In Plotting with R (Part I), I summarized the functions that can be used for plotting in R. In this part, we continue the journey and use those functions to plot richer and more complex charts such as pie charts, bar charts, box plots, histograms, line charts and scatter plots. A pie chart can be drawn using ...