Columns

Tags: python, spark, pyspark

Views: 4989 | Likes: 3 | 12 months ago

Data partitioning is critical to data processing performance, especially when processing large volumes of data in Spark. Partitions in Spark won’t span across nodes, though one node can contain more than one partition. When processing, Spark assigns one task for each partition and each worker threa...

Tags: python, spark, pyspark

Views: 6597 | Likes: 0 | 8 months ago

In Spark, the SparkContext.parallelize function can be used to convert a Python list to an RDD, which can then be converted to a DataFrame object. The following sample code is based on Spark 2.x. On this page, I am going to show you how to convert the following list to a data frame: data = [(...

Tags: python, spark, pyspark, hive

Views: 9051 | Likes: 1 | 12 months ago

From Spark 2.0, you can easily read data from a Hive data warehouse and also write/append new data to Hive tables. This page shows how to work with Hive in Spark, including: creating a DataFrame from an existing Hive table; saving a DataFrame to a new Hive table; appending data ...

Tags: python, spark

Views: 13259 | Likes: 0 | 2 years ago

This post shows how to derive a new column in a Spark data frame from a JSON array string column. I am running the code in Spark 2.2.1, though it is compatible with Spark 1.6.0 (with fewer JSON SQL functions). Prerequisites: refer to the following post to install Spark on Windows. ...

Tags: SQL Server, python, spark, pyspark

Views: 8947 | Likes: 1 | 12 months ago

Spark is an analytics engine for big data processing. There are various ways to connect to a database in Spark. This page summarizes some of the common approaches to connecting to SQL Server using Python as the programming language. ...

Tags: hadoop, hive

Views: 12150 | Likes: 2 | 12 months ago

If you have been following my website, you would know I’ve published a number of articles about installing big data tools/framewo...

Tags: .net core, entity-framework

Views: 14904 | Likes: 0 | 2 years ago

SQLite is a self-contained, embedded SQL database engine. In .NET Core, Entity Framework Core provides APIs to work with SQLite. This page provides sample code to create a SQLite database using the Microsoft.EntityFrameworkCore.Sqlite package. Create sample project ...

Tags: asp.net core, identity core 2

Views: 13430 | Likes: 0 | 3 years ago

The identity system in ASP.NET has evolved over time. If you are using ASP.NET Core, you have probably found that the User property is an instance of ClaimsPrincipal in controllers and Razor views. Thus, to retrieve user information, you need to use the claims.

Tags: hadoop, yarn, hdfs

Views: 27449 | Likes: 2 | 3 years ago

This page summarizes the steps to install Hadoop 3.0.0 in your Windows environment. Reference page: https://wiki.apache.org/hadoop/Hadoop2OnWindows ...

Tags: hive, hdfs

Views: 29 | Likes: 0 | 4 days ago

In Hive, two types of tables can be created: internal and external. Internal tables are also called managed tables. Different features are available for each type. This article lists some of the common differences. Internal table: By default, Hive creates ...

Spark Read from SQL Server Source using Windows/Kerberos Authentication

Tags: pyspark, SQL Server, spark-2-x

Views: 41 | Likes: 0 | 23 days ago

In this article, I am going to show you how to use JDBC Kerberos authentication to connect to SQL Server sources in Spark (PySpark). I will use a Kerberos connection with principal names and password directly that requires ...

Schema Merging (Evolution) with Parquet in Spark and Hive

Tags: parquet, pyspark, spark-2-x, hive, hdfs

Views: 65 | Likes: 0 | 24 days ago

Schema evolution is supported by many frameworks and data serialization systems, such as Avro, ORC, Protocol Buffers, and Parquet. With schema evolution, one set of data can be stored in multiple files with different but compatible schemas. In Spark, the Parquet data source can detect and merge schema ...

Kontext release v0.x

Tags: kontext

Views: 18 | Likes: 0 | 28 days ago

Kontext release information before v1.0.

Tags: windows10, hadoop, hdfs

Views: 84 | Likes: 0 | 2 months ago

Issue: When installing Hadoop 3.2.1 on Windows 10, you may encounter the following error when trying to format the HDFS namenode: ERROR namenode.NameNode: Failed to start namenode. The error happens when running the following comm...

Compile and Build Hadoop 3.2.1 on Windows 10 Guide

Tags: windows10, hadoop

Views: 170 | Likes: 1 | 2 months ago

This article provides detailed steps for compiling and building Hadoop (including native libraries) on Windows 10. The following guide is based on Hadoop release 3.2.1. ...

Install Latest Hadoop 3.2.1 on Windows 10 Step by Step Guide

Tags: windows10, hadoop, yarn

Views: 332 | Likes: 1 | 2 months ago

This detailed step-by-step guide shows you how to install the latest Hadoop (v3.2.1) on Windows 10. It also provides a temporary fix for bug HDFS-14084 (java.lang.UnsupportedOperationException INFO).

Kontext release v0.6.7

Tags: kontext

Views: 27 | Likes: 0 | 2 months ago

Kontext v0.6.7 is now released with a few changes/enhancements. Changes: The following sections list the new features/changes in release v0.6.7. Multi-la...

Kontext release v0.6.6

Tags: kontext

Views: 20 | Likes: 0 | 2 months ago

Kontext v0.6.6 is now released with a few changes/enhancements. Changes: SEO enhancements: Added a number of Facebook and Twitter meta tags to the head section of each page. Robots.txt is updated to make it simple. ...

Featured columns

Apache Spark installation guides, performance tuning tips, general tutorials, etc.

Articles about Apache Hadoop installation, performance tuning and general tutorials.

Articles about ASP.NET Core 1.x, 2.x and 3.x.

Code snippets for various programming languages/frameworks.

Tutorials and information about Teradata.

PowerShell, Bash, ksh, sh, Perl, etc.

All columns

ML.NET is an open source and cross-platform machine learning framework. With ML.NET, you can create custom ML models using C# or F# without having to leave the .NET ecosystem. This column publishes articles about ML.NET.

Code snippets for various programming languages/frameworks.

AspNetCore.XmlRpc - an XML Remote Procedure Call library for ASP.NET Core.

Data analytics with Google Cloud Platform.

Data analytics with the Microsoft Azure cloud platform.

Posts about Apache Sqoop, a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.

Apache Spark installation guides, performance tuning tips, general tutorials, etc.

