
Columns

Tags: hadoop, yarn, hdfs

26535 views · 30 comments · 2 likes · posted 2 years ago

This page summarizes the steps to install Hadoop 3.0.0 on your Windows environment. Reference page: https://wiki.apache.org/hadoop/Hadoop2OnWindows ...

Tags: hadoop, linux, WSL

9082 views · 18 comments · 5 likes · posted 9 months ago

In my previous post, I showed how to configure a single-node Hadoop instance on Windows 10. The steps are not too difficult to follow if you have Java programming backgr...

Tags: spark, linux, WSL

2599 views · 4 comments · 0 likes · posted 9 months ago

This page summarizes the steps to install the latest version (2.4.3) of Apache Spark on Windows 10 via Windows Subsystem for Linux (WSL). Prerequisites: follow either of the following pages to install WSL on a system or non-system drive of your Windows 10 machine. ...

Tags: .net core, entity-framework

13973 views · 4 comments · 0 likes · posted 2 years ago

SQLite is a self-contained, embedded SQL database engine. In .NET Core, Entity Framework Core provides APIs to work with SQLite. This page provides sample code to create a SQLite database using the Microsoft.EntityFrameworkCore.Sqlite package. Create sample project ...

Tags: python, spark, pyspark, hive

8174 views · 0 comments · 0 likes · posted 10 months ago

From Spark 2.0, you can easily read data from a Hive data warehouse and also write or append new data to Hive tables. This page shows how to work with Hive in Spark, including: creating a DataFrame from an existing Hive table, saving a DataFrame to a new Hive table, appending data ...

Tags: python, spark, pyspark

5448 views · 0 comments · 0 likes · posted 7 months ago

In Spark, the SparkContext.parallelize function can be used to convert a Python list to an RDD, and the RDD can then be converted to a DataFrame object. The following sample code is based on Spark 2.x. In this page, I am going to show you how to convert the following list to a data frame: data = [(...

Tags: hadoop, yarn, hdfs

7704 views · 0 comments · 0 likes · posted 2 years ago

This page summarizes the default ports used by Hadoop services. It is useful when configuring network interfaces in a cluster. Hadoop 3.1.0 HDFS: the secondary namenode HTTP/HTTPS server address and port. ...
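
As a rough illustration of checking such ports while configuring a cluster, the sketch below probes a host with a plain TCP connect using Python's standard `socket` module. It assumes the Hadoop 3.x defaults for the secondary namenode from `hdfs-default.xml` (9868 for HTTP, 9869 for HTTPS); the `is_port_open` helper and the `localhost` target are hypothetical, so substitute your own hosts and any ports your cluster overrides.

```python
import socket

# Hadoop 3.x default ports for the HDFS secondary namenode
# (dfs.namenode.secondary.http-address / https-address in hdfs-default.xml).
# Assumption: your cluster has not overridden them in hdfs-site.xml.
SECONDARY_NAMENODE_PORTS = {
    "dfs.namenode.secondary.http-address": 9868,
    "dfs.namenode.secondary.https-address": 9869,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # Hypothetical target host; replace with your secondary namenode host.
    for prop, port in SECONDARY_NAMENODE_PORTS.items():
        state = "open" if is_port_open("localhost", port) else "closed"
        print(f"{prop} ({port}): {state}")
```

A successful connect only shows that something is listening on the port, not that it is the expected Hadoop daemon.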

Tags: lite-log, linux, WSL, ubuntu

1372 views · 0 comments · 0 likes · posted 9 months ago

This page shows how to manually install Windows Subsystem for Linux (WSL) on a non-system drive. Enable the Windows Subsystem for Linux feature: open PowerShell as Administrator and run the following command to enable the WSL feature: Enable-WindowsOptionalFea...

Latest Hadoop 3.2.1 Installation on Windows 10 Step by Step Guide

Tags: hadoop, yarn

4 views · 0 comments · 0 likes · posted 3 hours ago

This detailed step-by-step guide shows you how to install the latest Hadoop (v3.2.1) on Windows 10. It also provides a temporary fix for bug HDFS-14084 (java.lang.UnsupportedOperationException INFO).

Kontext release v0.6.7

Tags: kontext

14 views · 0 comments · 0 likes · posted 7 days ago

Kontext v0.6.7 is now released with a few changes/enhancements. Changes: the following sections list the new features/changes i...

Kontext release v0.6.6

Tags: kontext

11 views · 0 comments · 0 likes · posted 14 days ago

Kontext v0.6.6 is now released with a few changes/enhancements. Changes: SEO enhancements. Added a number of Facebook and Twitter m...

Machine Learning with .NET in Jupyter Notebooks

Tags: machine-learning, jupyter-notebook, C#, dotnet core

103 views · 0 comments · 0 likes · posted 16 days ago

In this article, I'm going to show you how to install Jupyter on Windows and then install the .NET kernel for Jupyter notebooks. It also shows a machine learning example using ML.NET. The target audience is .NET developers who want to expand their skills in the data engineering and science domain...

Kontext release v0.6.5

Tags: kontext

13 views · 0 comments · 0 likes · posted 18 days ago

Kontext v0.6.5 is now released with a few changes/enhancements. Changes: RSS changes. RSS subscriptions are now increased to 200 items and the d...

Tags: pyspark, spark-2-x, python

33 views · 0 comments · 0 likes · posted 18 days ago

This article shows you how to convert a Python dictionary list to a Spark DataFrame. The code snippets run on Spark 2.x environments. Input: the input data (a dictionary list) looks like the following: data = [{"Category": 'Category A', 'ItemID': 1, 'Amount': 12.40}, ...

Improve PySpark Performance using Pandas UDF with Apache Arrow

Tags: pyspark, spark, spark-2-x, pandas

120 views · 0 comments · 2 likes · posted 20 days ago

Apache Arrow is an in-memory columnar data format that can be used in Spark to efficiently transfer data between JVM and Python processes. This is currently most beneficial to Python users who work with Pandas/NumPy data. In this article, ...

Tags: pyspark, spark-2-x, spark

9 views · 0 comments · 0 likes · posted 23 days ago

This article shows you how to read and write XML files in Spark. Sample XML file: create a sample XML file named test.xml with the following content: <?xml version="1.0"?> <data> <record id="1"> <rid>1</rid> <nam...

Tags: python, pandas

8 views · 0 comments · 0 likes · posted 23 days ago

Pickle files are commonly used in Python data-related projects. This article shows how to create and load pickle files using Pandas. Create pickle file: import pandas as pd import numpy as np file_name="data/test.pkl" data = np.random.randn(1000, 2) # pd.set_option('displ...
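
A minimal sketch of the round trip the excerpt describes, assuming pandas and NumPy are installed: it writes a DataFrame to a pickle file with `DataFrame.to_pickle` and reads it back with `pd.read_pickle`. The column names and the temporary-directory path are illustrative assumptions rather than the article's own choices (the article writes to `data/test.pkl`).

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Illustrative path in the system temp directory (assumption; the article
# itself writes to "data/test.pkl").
file_name = os.path.join(tempfile.gettempdir(), "test.pkl")

# 1000 rows x 2 columns of standard-normal random values, as in the excerpt;
# the column names "a" and "b" are hypothetical.
df = pd.DataFrame(np.random.randn(1000, 2), columns=["a", "b"])

# Serialize the DataFrame to disk in pickle format.
df.to_pickle(file_name)

# Load it back; the round trip preserves values, dtypes and the index.
loaded = pd.read_pickle(file_name)
print(loaded.head())
```

Only load pickle files from trusted sources, because unpickling can execute arbitrary code.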

Featured columns

Apache Spark installation guides, performance tuning tips, general tutorials, etc.

Articles about Apache Hadoop installation, performance tuning and general tutorials.

Tutorials and information about Teradata.

PowerShell, Bash, ksh, sh, Perl, etc.

Posts about Apache Sqoop, a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.

Code snippets for various programming languages/frameworks.

All columns

ML.NET is an open-source, cross-platform machine learning framework. With ML.NET, you can create custom ML models using C# or F# without having to leave the .NET ecosystem. This column publishes articles about ML.NET.

Code snippets for various programming languages/frameworks.

AspNetCore.XmlRpc - an XML Remote Procedure Call library for ASP.NET Core.
