Kontext Column

Created for everyone to publish data, programming and cloud related articles.

Tags: python, spark, pyspark, spark-advanced

Data partitioning is critical to data processing performance, especially when processing large volumes of data in Spark. Partitions in Spark won’t span across nodes, though one node can contain more than one partition. When processing, Spark assigns one task to each partition and each worker thread ...
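
The partition-to-task relationship described above can be illustrated with a small plain-Python model that needs no Spark installation. It is a sketch under stated assumptions: records are hash-partitioned by key, and `zlib.crc32` stands in for the hash function only because it is deterministic across runs. Spark's own `HashPartitioner` uses the key's JVM hash code, so real partition assignments will differ; the function names here are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import List, Tuple

Record = Tuple[str, int]

def hash_partition(records: List[Record], num_partitions: int) -> List[List[Record]]:
    """Assign each (key, value) record to a partition by hashing its key.

    zlib.crc32 is used only because it is deterministic across runs;
    it is NOT the hash function Spark uses.
    """
    partitions = defaultdict(list)
    for key, value in records:
        pid = zlib.crc32(key.encode("utf-8")) % num_partitions
        partitions[pid].append((key, value))
    # Return a list indexed by partition id; partitions with no records stay empty.
    return [partitions[i] for i in range(num_partitions)]

def run_tasks(partitions: List[List[Record]]) -> List[int]:
    """Simulate one task per partition: each task sums the values it owns."""
    return [sum(value for _, value in part) for part in partitions]

records = [("a", 1), ("b", 2), ("c", 3), ("a", 4), ("d", 5)]
parts = hash_partition(records, num_partitions=3)
print(len(parts))             # 3 partitions -> 3 tasks
print(sum(run_tasks(parts)))  # 15
```

Because every record with the same key lands in the same partition, the single task for that partition sees all rows for its keys; a heavily skewed key therefore becomes one heavily loaded task.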

Install Hadoop 3.2.1 on Windows 10 Step by Step Guide

Tags: windows10, hadoop, yarn, big-data-on-windows-10

This detailed step-by-step guide shows you how to install the latest Hadoop (v3.2.1) on Windows 10. It also provides a temporary fix for bug HDFS-14084 (java.lang.UnsupportedOperationException INFO).

Tags: spark, pyspark, how-to, tutorial, spark-dataframe

This article shows you how to filter NULL/None values from a Spark data frame using Python. Either DataFrame.filter or DataFrame.where (an alias of filter) can be used to filter out null values.
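
In PySpark the filter is typically written as `df.filter(df.Value.isNotNull())` or, equivalently, `df.where(col("Value").isNotNull())` with `col` imported from `pyspark.sql.functions`. To show the row-level logic without a Spark session, the sketch below applies the same predicate to a hypothetical list of rows represented as Python dicts, with `None` playing the role of SQL NULL.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical sample rows; None stands in for a NULL column value.
rows: List[Row] = [
    {"Category": "Category A", "Value": 12.40},
    {"Category": "Category B", "Value": None},
    {"Category": None, "Value": 100.01},
]

def filter_not_null(rows: List[Row], column: str) -> List[Row]:
    """Keep rows whose `column` is not None, mirroring Column.isNotNull().

    In this sketch a missing key is treated like NULL.
    """
    return [row for row in rows if row.get(column) is not None]

def drop_rows_with_any_null(rows: List[Row]) -> List[Row]:
    """Keep rows with no None in any column, similar to DataFrame.dropna(how="any")."""
    return [row for row in rows if all(v is not None for v in row.values())]

print(filter_not_null(rows, "Value"))
print(drop_rows_with_any_null(rows))
```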

Pandas DataFrame Plot - Pie Chart

Tags: plot, pandas, jupyter-notebook, python, pandas-plot

This article provides examples of plotting a pie chart using the pandas.DataFrame.plot function. The data I'm going to use is the same as in the other article, Pandas DataFrame Plot - Bar Chart. I'm also using Jupyter Notebook to plot them. The DataFrame has 9 records: DATE TYPE ...

Tags: python, spark, pyspark, spark-dataframe

In Spark, the SparkContext.parallelize function can be used to convert a Python list to an RDD, and the RDD can then be converted to a DataFrame object. The following sample code is based on Spark 2.x. On this page, I am going to show you how to convert the following list to a data frame: data = [('Category A' ...

Tags: SQL Server, python, spark, pyspark, spark-database-connect

Spark is an analytics engine for big data processing. There are various ways to connect to a database in Spark. This page summarizes some of the common approaches to connecting to SQL Server using Python as the programming language. For each method, both Windows Authentication and SQL Server ...
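
In JDBC-based approaches, the two authentication modes mostly differ in the connection URL. The helper below is a sketch that builds a URL for the Microsoft JDBC Driver for SQL Server; the server and database names in the usage lines are hypothetical placeholders. With Windows Authentication (`integratedSecurity=true`) the driver typically also needs its native authentication library on the machine, while SQL Server Authentication supplies a user name and password as connection properties instead.

```python
def sqlserver_jdbc_url(server: str, database: str, port: int = 1433,
                       windows_auth: bool = False) -> str:
    """Build a connection URL for the Microsoft JDBC Driver for SQL Server.

    windows_auth=True appends integratedSecurity=true (Windows Authentication).
    With SQL Server Authentication, credentials are supplied separately,
    e.g. as the "user" and "password" options of the JDBC data source.
    """
    url = f"jdbc:sqlserver://{server}:{port};databaseName={database}"
    if windows_auth:
        url += ";integratedSecurity=true"
    return url

# Hypothetical server and database names.
print(sqlserver_jdbc_url("localhost", "TestDb"))
print(sqlserver_jdbc_url("localhost", "TestDb", windows_auth=True))
```

In PySpark such a URL would then be handed to the JDBC data source, for example `spark.read.format("jdbc").option("url", url).option("dbtable", "dbo.SomeTable")`, where the table name is again a placeholder.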

Tags: linux, WSL, ubuntu, big-data-on-wsl

This page shows how to manually install Windows Subsystem for Linux (WSL) on a non-system drive. Open PowerShell as Administrator and run the following command to enable the WSL feature:

Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux

Run the following ...

Tags: sqlite, entity-framework, dotnetcore

SQLite is a self-contained and embedded SQL database engine. In .NET Core, Entity Framework Core provides APIs to work with SQLite. This page provides sample code to create a SQLite database using the Microsoft.EntityFrameworkCore.Sqlite package. Create a .NET Core 2.x console application in ...

Tags: python, spark, spark-dataframe

This post shows how to derive a new column in a Spark data frame from a JSON array string column. I am running the code in Spark 2.2.1, though it is compatible with Spark 1.6.0 (which has fewer JSON SQL functions). Refer to the following post to install Spark on Windows: Install Spark 2.2.1 in Windows ...
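
In Spark this kind of derivation is typically done with `from_json` (to parse the string against a schema) and `explode` (to turn array elements into rows), both in `pyspark.sql.functions`. The plain-Python sketch below mirrors that parse-then-explode logic on hypothetical data so the transformation itself is visible; the column names `id` and `attrs` are assumptions.

```python
import json
from typing import Any, Dict, List

# Hypothetical rows: "attrs" holds a JSON array string.
rows: List[Dict[str, Any]] = [
    {"id": 1, "attrs": '[{"Attr": "A", "Value": 10}, {"Attr": "B", "Value": 20}]'},
    {"id": 2, "attrs": '[{"Attr": "A", "Value": 30}]'},
]

def explode_json_array(rows: List[Dict[str, Any]], column: str) -> List[Dict[str, Any]]:
    """Parse a JSON array string column and emit one output row per element.

    The other columns are copied to every output row, each element's keys
    become new columns, and the original string column is dropped.
    """
    out = []
    for row in rows:
        base = {k: v for k, v in row.items() if k != column}
        for element in json.loads(row[column]):
            out.append({**base, **element})
    return out

for r in explode_json_array(rows, "attrs"):
    print(r)
```

As with Spark's `explode`, a row whose array is empty produces no output rows in this sketch.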

Tags: hadoop, hive, big-data-on-windows-10

In this article, I’m going to demo how to install Hive 3.0.0 on Windows 10. Before installing Apache Hive, please ensure you have Hadoop available in your Windows environment; Hive cannot run without Hadoop. I recommend installing Hadoop 3.x to work with Hive 3.0.0. There are two ...

Tags: pyspark, spark-2-x, spark, python, spark-dataframe

This article shows how to convert a Python dictionary list to a DataFrame in Spark using Python:

data = [{"Category": 'Category A', "ID": 1, "Value": 12.40},
        {"Category": 'Category B', "ID": 2, "Value": 30.10},
        {"Category": 'Category C', "ID": 3, "Value": 100.01}
        ]

The ...
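
In PySpark, such a list is typically passed straight to `spark.createDataFrame(data)`, which infers the schema from the dictionaries. The plain-Python sketch below shows roughly what that inference has to do: pivot the records into columns and derive one type per column. It is an illustration only; the function names are hypothetical, and the sorted column order is a choice made for this sketch rather than a statement about Spark's behaviour.

```python
from typing import Any, Dict, List

data = [
    {"Category": 'Category A', "ID": 1, "Value": 12.40},
    {"Category": 'Category B', "ID": 2, "Value": 30.10},
    {"Category": 'Category C', "ID": 3, "Value": 100.01},
]

def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot a list of dicts into {column: [values...]}, checking the keys match."""
    columns = sorted(records[0])  # sorted order is a choice made for this sketch
    for rec in records:
        if sorted(rec) != columns:
            raise ValueError(f"inconsistent keys: {sorted(rec)}")
    return {c: [rec[c] for rec in records] for c in columns}

def infer_schema(columns: Dict[str, List[Any]]) -> Dict[str, str]:
    """Map each column to the Python type name of its values (all must agree)."""
    schema = {}
    for name, values in columns.items():
        types = {type(v) for v in values}
        if len(types) != 1:
            raise TypeError(f"column {name!r} has mixed types: {types}")
        schema[name] = types.pop().__name__
    return schema

print(infer_schema(to_columns(data)))  # {'Category': 'str', 'ID': 'int', 'Value': 'float'}
```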

Tags: pyspark, spark, spark-2-x, spark-file-operations

Spark provides rich APIs to save data frames in many different file formats, such as CSV, Parquet, ORC and Avro. CSV is commonly used in data applications, though binary formats are gaining momentum nowadays. In this article, I am going to show you how to save a Spark data frame as a CSV file in ...
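
Spark's `DataFrameWriter.csv` writes a folder rather than a single file, with one part file per partition. The plain-Python sketch below imitates that layout with the standard `csv` module so the output structure is easy to inspect. It assumes `header=True`-style output, where each part file starts with the header row; the simplified `part-00000.csv` names are hypothetical, since Spark's real part files carry extra identifiers in their names.

```python
import csv
import os
import tempfile
from typing import List, Sequence

def write_partitioned_csv(partitions: List[List[Sequence]], out_dir: str,
                          header: Sequence[str]) -> List[str]:
    """Write each partition to its own part file inside out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, rows in enumerate(partitions):
        path = os.path.join(out_dir, f"part-{i:05d}.csv")  # simplified file name
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)  # every part file gets its own header row
            writer.writerows(rows)
        paths.append(path)
    return paths

# Two hypothetical partitions; the "output.csv" target is really a directory.
out_dir = os.path.join(tempfile.mkdtemp(), "output.csv")
paths = write_partitioned_csv(
    [[("Category A", 1)], [("Category B", 2), ("Category C", 3)]],
    out_dir,
    ["Category", "ID"],
)
for p in paths:
    print(os.path.basename(p))
```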

Tags: python, spark, pyspark, hive, spark-database-connect

From Spark 2.0, you can easily read data from the Hive data warehouse and also write or append new data to Hive tables. This page shows how to work with Hive in Spark, including:

- Create a DataFrame from an existing Hive table
- Save a DataFrame to a new Hive table
- Append data to the existing Hive table via ...

Tags: pyspark, spark-2-x, teradata, SQL Server, spark-database-connect

In my previous article, Connect to SQL Server in Spark (PySpark), I described ways to read data from SQL Server databases as a DataFrame using JDBC. We can also use JDBC to write data from a Spark DataFrame to database tables. In the following sections, I'm going to show you how to ...

Install Hadoop 3.0.0 on Windows (Single Node)

Tags: hadoop, yarn, hdfs, big-data-on-windows-10

This page summarizes the steps to install Hadoop 3.0.0 in your Windows environment. Reference pages:

https://wiki.apache.org/hadoop/Hadoop2OnWindows
https://hadoop.apache.org/docs/r1.2.1/cluster_setup.html

Note: a newer version of this installation guide, for the latest Hadoop 3.2.1, is available. I ...

Tags: .NET, kontext

As of 2020-11-12, the Kontext platform has been upgraded from the .NET 5 preview versions to the latest .NET 5. The upgrades include:

- ASP.NET Core projects: from the ASP.NET Core 5 preview extensions to the .NET 5 built-in stack
- Azure DevOps build pipeline
- Azure App Services

Regards,
Kontext Admin

Tags: .NET, Azure

.NET 5.0 was officially released on 2020-11-10. Refer to this blog post for more details: Announcing .NET 5.0. Download Visual Studio 2019 version 16.8.0, which integrates the .NET 5.0 SDK (links: Release notes, Download Visual Studio). The .NET SDK version can be changed: { "sdk": { ...

Run .NET 5 on Azure App Services

Tags: .NET, Azure, asp.net core

.NET 5 RC2 was released on 2020-10-13. On Azure, you can only select the .NET Core 3.1 or 2.1 LTS versions as the runtime stack when creating a web app, as the following screenshot shows. This will stay as is until the official .NET 5 release. However, you can use extensions to run your .NET 5 ...

Tags: spark, SQL

In Spark, the to_date function can be used to convert a string to a date. This function has been available since Spark 1.5.0.

SELECT to_date('2020-10-23', 'yyyy-MM-dd');
SELECT to_date('23Oct2020', 'ddMMMyyyy');

Refer to the official documentation for all the datetime patterns. ...
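
For readers who want to check the same conversions outside Spark, the sketch below maps the two Spark patterns used above onto Python `strptime` directives. It only handles the tokens `yyyy`, `MMM`, `MM` and `dd`, assumes English month abbreviations (Python's default C locale), and raises `ValueError` on bad input, whereas Spark's `to_date` typically returns NULL for strings it cannot parse. The helper names are hypothetical.

```python
from datetime import date, datetime

# Spark datetime pattern token -> Python strptime directive (only the tokens used above).
SPARK_TO_STRPTIME = {"yyyy": "%Y", "MMM": "%b", "MM": "%m", "dd": "%d"}

def spark_pattern_to_strptime(pattern: str) -> str:
    """Translate a Spark pattern to a strptime format, longest token first."""
    tokens = sorted(SPARK_TO_STRPTIME, key=len, reverse=True)
    out, i = [], 0
    while i < len(pattern):
        for tok in tokens:
            if pattern.startswith(tok, i):
                out.append(SPARK_TO_STRPTIME[tok])
                i += len(tok)
                break
        else:
            # Any other character is copied literally; '%' must be escaped for strptime.
            ch = pattern[i]
            out.append("%%" if ch == "%" else ch)
            i += 1
    return "".join(out)

def to_date(text: str, spark_pattern: str) -> date:
    """Parse text with a (limited) Spark-style pattern and return a date."""
    return datetime.strptime(text, spark_pattern_to_strptime(spark_pattern)).date()

print(to_date('2020-10-23', 'yyyy-MM-dd'))  # 2020-10-23
print(to_date('23Oct2020', 'ddMMMyyyy'))    # 2020-10-23
```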

.NET for Apache Spark v1.0.0 Released

Tags: .NET, spark

.NET for Apache Spark v1.0.0 was officially released on 2020-10-14. This page summarizes some important resources to help you get started with .NET for Spark.
Image credit: https://github.com/dotnet/spark/raw/master/docs/img/dotnetsparklogo-6.png
Release Notes on GitHub ...

Tags: spark

Recently, one of my colleagues asked me a question about Spark: for the same SQL statement finding the max value of a partition column, Spark SQL and Hive/Impala SQL return different values. The SQL statement looks like the following:

SELECT MAX(PART_COL) FROM HiveDb.TestSQL;

...

Tags: sqoop, hive, partitioning

Sqoop + Hive + HCatalog + Multilevel partitioning

Tags: .NET, C#

In many solutions, .NET Standard has been used to share code between .NET Framework and .NET Core projects. Since the release of .NET 5, you may wonder which .NET Standard version supports C# 9 language features. Well, the answer is simple: there is no need to have another version of ...

Get Started on Reunified .NET 5

Tags: .NET

In May 2019, Microsoft announced the roadmap for .NET at the Build conference. .NET 5 is the update that unifies divergent frameworks, reduces code complexity and supports cross-platform reach, including desktop, web, mobile, cloud and device platforms. On 13th September 2020, Microsoft announced .NET ...

C# 9.0 New Features

Tags: C#, .NET

.NET 5.0 release candidate 1 (rc.1) was published on 2020-09-14, which marks another big step towards the official .NET 5.0 release. As part of 5.0, C# 9.0 will be released with a bunch of new features. This article summarizes some of the new features with examples. Download .NET 5.0 SDK from this ...

Introduction to C# Interactive

Tags: C#, .NET

Python, R and many other scripting languages generally support interactive programming features in their IDEs. When C# was initially created, all C# programs needed to be compiled into MSIL before they could run in .NET runtime environments (unless the code was dynamically compiled). ...

Tags: asp.net core, asp.net core 3, C#

This page summarizes how to retrieve client and server IP addresses in ASP.NET Core applications. The client IP address can be retrieved via the HttpContext.Connection object. This property exists in both Razor page models and ASP.NET MVC controllers. Property RemoteIpAddress ...

Statistics with R (Part II)

Tags: r-lang

In the article Statistics with R (Part I), we walked through basic statistical calculations using R, as well as regression models including linear, multiple, logistic and Poisson regression. In this part, we will continue to explore more complicated analyses, including ...

Statistics with R (Part I)

Tags: r-lang

So far, we've gone through R programming basics, data types, packages and IDEs, data APIs for working with data sources, and various plotting functions. Let's now dive into the most important part: statistics and modelling with R. After all, R was created for statistics. Warning: Due ...

Plotting with R (Part II)

Tags: plot, r-lang

In Plotting with R (Part I), I summarized the functions that can be used for plotting in R. In this part, we continue the journey to plot richer and more complex charts, such as pie charts, bar charts, box plots, histograms, line charts and scatter plots, using those functions. A pie chart can be drawn using ...

All columns

Programming with the R language: tutorials about R.

Streaming analytics related tutorials and ideas.


I came across an issue while running a Sqoop import into a partitioned table and found a workaround for it; sharing my two cents.

 

Code snippets and tips for various programming languages/frameworks.

AspNetCore.XmlRpc - an XML Remote Procedure Call library for ASP.NET Core.

Data analytics with Google Cloud Platform.

Data analytics, application development with Microsoft Azure cloud platform.