Kontext Column

Created for everyone to publish articles about data, programming, and the cloud.
Follow three steps to create your columns.


Featured articles

Tags: python, spark, pyspark, spark-advanced
31548 views · 9 likes · 2 years ago

Data partitioning is critical to data processing performance, especially when processing large volumes of data in Spark. Partitions in Spark do not span nodes, though one node can contain more than one partition. When processing, Spark assigns one task to each partition and each worker threa...
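
As a quick illustration of partitions and tasks, here is a minimal PySpark sketch (the partition counts and app name are illustrative, not from the article):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partition-demo").getOrCreate()

# Spark picks the initial partition count based on parallelism settings.
df = spark.range(0, 1_000_000)
print(df.rdd.getNumPartitions())

# repartition() triggers a full shuffle; coalesce() only merges partitions.
df8 = df.repartition(8)
df2 = df8.coalesce(2)
print(df8.rdd.getNumPartitions(), df2.rdd.getNumPartitions())
```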

Install Hadoop 3.2.1 on Windows 10 Step by Step Guide

Tags: windows10, hadoop, yarn, big-data-on-windows-10
12905 views · 13 likes · 8 months ago

This detailed step-by-step guide shows you how to install the latest Hadoop (v3.2.1) on Windows 10. It also provides a temporary fix for bug HDFS-14084 (java.lang.UnsupportedOperationException INFO).

Tags: python, spark, pyspark, spark-dataframe
23132 views · 0 likes · 2 years ago

In Spark, the SparkContext.parallelize function can be used to convert a Python list to an RDD, and the RDD can then be converted to a DataFrame object. The following sample code is based on Spark 2.x. In this page, I am going to show you how to convert the following list to a data frame: data = [(...
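
The teaser's sample data is cut off above, so the tuples below are illustrative; the flow itself (list to RDD to DataFrame via parallelize) is the one the article describes:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("list-to-df").getOrCreate()

# Illustrative list of tuples; the article's own sample is truncated above.
data = [("Category A", 1, 12.40), ("Category B", 2, 30.10)]

# Convert the Python list to an RDD, then to a DataFrame with column names.
rdd = spark.sparkContext.parallelize(data)
df = rdd.toDF(["Category", "ID", "Value"])
df.show()
```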

Tags: pyspark, spark, spark-2-x, spark-file-operations
10607 views · 0 likes · 10 months ago

Spark provides rich APIs to save data frames to many different file formats, such as CSV, Parquet, ORC, Avro, etc. CSV is commonly used in data applications, though binary formats are gaining momentum. In this article, I am going to show you how to save a Spark data frame as a CSV file in b...
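
A minimal sketch of writing a data frame as CSV in PySpark; the output paths and options are illustrative, not taken from the article:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-write").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])

# Write one CSV file per partition under the output directory.
df.write.mode("overwrite").option("header", True).csv("/tmp/output_csv")

# coalesce(1) yields a single output file, at the cost of parallelism.
df.coalesce(1).write.mode("overwrite").csv("/tmp/output_csv_single")
```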

Tags: python, spark, pyspark, hive, spark-database-connect
21870 views · 4 likes · 2 years ago

Since Spark 2.0, you can easily read data from a Hive data warehouse and also write/append new data to Hive tables. This page shows how to work with Hive in Spark, including: creating a DataFrame from an existing Hive table, saving a DataFrame to a new Hive table, appending data ...
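
A minimal sketch of the read/save/append flow, assuming a Spark build with Hive support and a reachable metastore; the database and table names are placeholders:

```python
from pyspark.sql import SparkSession

# enableHiveSupport() requires Hive libraries and a configured metastore.
spark = (SparkSession.builder
         .appName("hive-demo")
         .enableHiveSupport()
         .getOrCreate())

# Create a DataFrame from an existing Hive table.
df = spark.sql("SELECT * FROM test_db.test_table")

# Save to a new Hive table, then append more rows to it.
df.write.mode("overwrite").saveAsTable("test_db.new_table")
df.write.mode("append").saveAsTable("test_db.new_table")
```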

Tags: python, spark, spark-dataframe
26376 views · 0 likes · 2 years ago

This post shows how to derive a new column in a Spark data frame from a JSON array string column. I am running the code in Spark 2.2.1, though it is compatible with Spark 1.6.0 (with fewer JSON SQL functions). Prerequisites: refer to the following post to install Spark in Windows. ...
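
One way to do this (not necessarily the article's exact approach) is from_json; note that array schemas with primitive element types are only accepted in newer Spark versions, roughly 2.4+:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, explode
from pyspark.sql.types import ArrayType, IntegerType

spark = SparkSession.builder.appName("json-array-col").getOrCreate()

# Illustrative data: a JSON array stored as a plain string column.
df = spark.createDataFrame([(1, "[1,2,3]"), (2, "[4,5]")], ["id", "values"])

# Parse the string into a real array column, then explode it into rows.
parsed = df.withColumn("values", from_json("values", ArrayType(IntegerType())))
parsed.select("id", explode("values").alias("value")).show()
```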

Tags: pyspark, spark-2-x, teradata, SQL Server, spark-database-connect
4396 views · 0 likes · 7 months ago

In my previous article about Connect to SQL Server in Spark (PySpark), I mentioned the ways t...
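
For context, a hedged sketch of reading from Teradata over JDBC in PySpark; the jar names, host, table and credentials are placeholders, and the article may use a different approach:

```python
from pyspark.sql import SparkSession

# The Teradata JDBC driver jars must be available to Spark.
spark = (SparkSession.builder
         .appName("jdbc-read")
         .config("spark.jars", "terajdbc4.jar,tdgssconfig.jar")
         .getOrCreate())

df = (spark.read.format("jdbc")
      .option("url", "jdbc:teradata://myhost/DATABASE=mydb")
      .option("driver", "com.teradata.jdbc.TeraDriver")
      .option("dbtable", "mydb.mytable")
      .option("user", "dbuser")        # placeholder credentials
      .option("password", "dbpassword")
      .load())
df.show()
```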

Tags: SQL Server, python, spark, pyspark, spark-database-connect
20067 views · 4 likes · 2 years ago

Spark is an analytics engine for big data processing. There are various ways to connect to a database in Spark. This page summarizes some common approaches to connect to SQL Server using Python as the programming language. ...
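
One such approach, sketched with pyodbc (the driver name, server and credentials are placeholders; the article may cover other options such as JDBC):

```python
import pyodbc

# Requires the Microsoft ODBC driver for SQL Server to be installed.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=TestDb;UID=sa;PWD=StrongPassw0rd"  # placeholders
)
cursor = conn.cursor()
cursor.execute("SELECT TOP 5 name FROM sys.tables")
for row in cursor.fetchall():
    print(row.name)
conn.close()
```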

Tags: asp.net core, identity core 2
24256 views · 0 likes · 3 years ago

The identity system in ASP.NET has evolved over time. If you are using ASP.NET Core, you have probably found that the User property in Controllers or Razor views is an instance of ClaimsPrincipal. Thus, to retrieve user information, you need to work with the claims.

Tags: sqlite, .net core, entity-framework
28098 views · 2 likes · 3 years ago

SQLite is a self-contained, embedded SQL database engine. In .NET Core, Entity Framework Core provides APIs to work with SQLite. This page provides sample code to create a SQLite database using the package Microsoft.EntityFrameworkCore.Sqlite. Create sample project ...

Tags: linux, WSL, ubuntu, big-data-on-wsl
9324 views · 4 likes · 2 years ago

This page shows how to manually install the Windows Subsystem for Linux (WSL) on a non-system drive. Enable the Windows Subsystem for Linux feature: open PowerShell as Administrator and run the following command to enable the WSL feature: Enable-WindowsOptionalFea...

Tags: hadoop, hive, big-data-on-windows-10
22852 views · 5 likes · 2 years ago

In this article, I'm going to demonstrate how to install Hive 3.0.0 on Windows 10. Prerequisites: before installing Apache Hive, please ensure you have Hadoop available in your Windows environment, as we cannot run Hive without Hadoop. ...

Tags: pyspark, spark-2-x, spark, python, spark-dataframe
5150 views · 0 likes · 10 months ago

This article shows how to convert a Python dictionary list to a DataFrame in Spark using Python. Example dictionary list: data = [{"Category": 'Category A', "ID": 1, "Value": 12.40}, {"Category": 'Category B', "ID": 2, "Value": 30.10}, {"Category": 'Category C', "...
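
A minimal sketch of the conversion (the third dictionary is truncated above, so only the first two appear here):

```python
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.appName("dict-to-df").getOrCreate()

data = [{"Category": "Category A", "ID": 1, "Value": 12.40},
        {"Category": "Category B", "ID": 2, "Value": 30.10}]

# Passing dicts directly to createDataFrame works but is deprecated in
# newer Spark releases; converting to Row objects first avoids the warning.
df = spark.createDataFrame([Row(**d) for d in data])
df.show()
```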

Install Hadoop 3.0.0 on Windows (Single Node)

Tags: hadoop, yarn, hdfs, big-data-on-windows-10
35571 views · 3 likes · 3 years ago

This page summarizes the steps to install Hadoop 3.0.0 on your Windows environment. Reference page: https://wiki.apache.org/hadoop/Hadoop2OnWindows ...

Tags: hadoop, linux, WSL, big-data-on-wsl
17330 views · 9 likes · 2 years ago

In my previous post, I showed how to configure a single-node Hadoop instance on Windows 10. The steps are not too difficult to follow if you have a Java programming backgr...

Featured sites

Apache Spark installation guides, performance tuning tips, general tutorials, etc.

*Spark logo is a registered trademark of Apache Spark.

Articles about Apache Hadoop installation, performance tuning and general tutorials.

*The yellow elephant logo is a registered trademark of Apache Hadoop.

Code snippets for various programming languages/frameworks.

Articles about ASP.NET Core 1.x, 2.x and 3.x.

Tutorials and information about Teradata.

Latest articles

Tags: .NET, .net core, C#
6 views · 0 likes · 2 hours ago

Language-Integrated Query (LINQ) is a set of technologies based on the integration of query capabilities directly into the C# or VB language in .NET. It allows intuitive queries against SQL databases, XML, object lists, etc. This article shows how to return top N records randomly. ...
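
The article itself covers LINQ in C#; purely as an illustration of the idea, the same "top N random records" selection in Python looks like this (the data is made up):

```python
import random

# Illustrative records; random.sample picks N distinct items uniformly,
# analogous to ordering by a random key and taking the first N.
records = [{"id": i} for i in range(100)]
top_n = random.sample(records, 5)
print(top_n)
```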

Tags: r-lang
7 views · 0 likes · 23 hours ago

R is an implementation of the S programming language (Bell Labs). It was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand. R is named partly after the first names of the first two R authors and partly as a play on the name of S. It is currently developed by th...

Tags: SQL Server, t-sql
3 views · 0 likes · 1 day ago

SQL Server has a built-in function HASHBYTES that can be used to calculate hash values. The supported hash algorithms include MD2, MD4, MD5, SHA, SHA1, SHA2_256 and SHA2_512. Function HASHBYTES: the signature of this function is: HASHBYTES ( '<a...

Tags: teradata, SQL
3 views · 0 likes · 1 day ago

Teradata has no built-in MD5 function, thus a custom function needs to be implemented to calculate MD5 hashes. This article shows you how to do that using the MD5 message digest UDF provided on Teradata Downloads. Prerequisites: the CREATE FUNCTION permission is required to create UDFs in Ter...
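
Not the Teradata UDF itself, but since MD5 is deterministic, a small Python cross-check can verify the UDF's output for a given input string:

```python
import hashlib

# MD5 of the UTF-8 bytes of a test value; compare against the UDF's result.
value = "abc"
print(hashlib.md5(value.encode("utf-8")).hexdigest())
# -> 900150983cd24fb0d6963f7d28e17f72 (well-known MD5 test vector)
```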

Tags: teradata, SQL
7 views · 0 likes · 1 day ago

A procedure contains a number of SQL statements that are executed in sequence in Teradata. Procedures can be used to return result sets to the invoking clients. In many other databases, it is very easy to return result sets to the client in a procedure, as a SELECT statement can be directly used...

Tags: teradata, SQL
5 views · 0 likes · 1 day ago

Current timestamp: the function CURRENT_TIMESTAMP can be used to retrieve the current timestamp: SELECT CURRENT_TIMESTAMP; Sample output: 20/09/2020 20:55:35.390000-04:00 Convert TimeStamp to Date: Function C...

Tags: teradata, SQL
8 views · 0 likes · 2 days ago

The RANDOM function in Teradata returns a random integer for each row of the result table. It is a Teradata extension to the ANSI SQL:2011 standard. Function syntax: RANDOM(lower_bound, upper_bound). The limits for ...
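
A hedged sketch of calling the function from Python via the teradatasql driver; the host, credentials and table name are placeholders:

```python
import teradatasql

# Requires the teradatasql package; connection details are placeholders.
with teradatasql.connect(host="myhost", user="dbuser", password="dbpassword") as con:
    cur = con.cursor()
    # One random integer between 1 and 100 for each sampled row.
    cur.execute("SELECT RANDOM(1, 100) FROM my_db.my_table SAMPLE 5")
    for row in cur.fetchall():
        print(row[0])
```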

Tags: teradata, SQL
9 views · 0 likes · 2 days ago

This article demonstrates how to create a volatile table in a Teradata procedure, perform DML actions (INSERT, DELETE, UPDATE) against it, and then return the result set dynamically from the temporary table in the procedure.

Load Microsoft 365 SharePoint List Data in Python

Tags: Azure, python
9 views · 0 likes · 7 days ago

A Microsoft SharePoint list is a collection of data that can be shared with team members or people you give access to. It is commonly used to capture master data maintained through manual inputs. This article summarizes the steps to create a SharePoint list...
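
A hedged sketch of reading list items with the Office365-REST-Python-Client package (the site URL, credentials and list title are placeholders; the article's exact approach may differ):

```python
from office365.runtime.auth.user_credential import UserCredential
from office365.sharepoint.client_context import ClientContext

# Placeholders: point these at a real site and list you can access.
site_url = "https://contoso.sharepoint.com/sites/team"
ctx = ClientContext(site_url).with_credentials(
    UserCredential("user@contoso.com", "password"))

# Fetch all items from the list and print their field values.
items = ctx.web.lists.get_by_title("My List").items.get().execute_query()
for item in items:
    print(item.properties)
```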

Tags: teradata, teradata-utilities
6 views · 0 likes · 8 days ago

BTEQ is a Teradata utility tool that can be used to run Teradata SQL statements, incl. DDL, DML, etc. It can also be used to import data from text files into Teradata databases, and it works with XML and JSON files too. Like TPT and FASTLOAD, it can run in both batch and interactive modes. T...

Tags: teradata, teradata-utilities
12 views · 0 likes · 8 days ago

BTEQ is a Teradata utility tool that can be used to run Teradata SQL statements, incl. DDL, DML, etc. It can also be used to import data from text files into Teradata databases. Like TPT and FASTLOAD, it can run in both batch and interactive modes. This article demonstrates how to load XML fi...

Tags: teradata, teradata-utilities
18 views · 0 likes · 8 days ago

Teradata Parallel Transporter (TPT) provides rich functions to load data into Teradata and to export data. The article Load CSV into Teradata via TPT shows how to load CSV files into Teradata....

Tags: teradata, teradata-utilities
8 views · 0 likes · 8 days ago

Teradata FastExport is a command-line utility tool that can transfer large amounts of data from a Teradata database to a file. One commonly used scenario is to export data from a table or view to a text file and then load the exported file into a different server. ...

Tags: teradata, fastload, teradata-utilities
11 views · 0 likes · 8 days ago

The article Teradata FastLoad - Load CSV File shows how to load a CSV file into Teradata; the input file there is a very basic CSV file. This article expands on t...

Tags: teradata, fastload, teradata-utilities
10 views · 0 likes · 8 days ago

Teradata FastLoad can be used to load CSV/TSV or other delimited files into the database. Refer to the article Teradata FastLoad - Load CSV File for more details about how to load CSV into Ter...