python spark pyspark

Implement SCD Type 2 Full Merge via Spark Data Frames

72 views   0 comments last modified about 19 days ago

Overview: SQL developers who are familiar with SCD (slowly changing dimension) and merge statements may wonder how to implement the same in big data platforms, considering that databases and storage in Hadoop are not designed/optimised for record-level updates and inserts. In this post, I’m going to demons...
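The post itself works with Spark data frames, which this excerpt does not show. As an illustration of the merge logic only, here is a minimal plain-Python sketch (not Spark) of an SCD Type 2 full merge. It assumes a hypothetical dimension with one business key and one tracked attribute, and it assumes "full merge" means the source is a complete snapshot, so keys missing from it are expired as deletions. The names `DimRow`, `scd2_full_merge` and `HIGH_DATE` are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List

# Hypothetical "open-ended" end date for the current version of a record.
HIGH_DATE = date(9999, 12, 31)


@dataclass(frozen=True)
class DimRow:
    key: int          # business key
    attr: str         # tracked attribute (a change creates a new version)
    start_date: date
    end_date: date
    is_current: bool


def scd2_full_merge(dim: List[DimRow], source: Dict[int, str], as_of: date) -> List[DimRow]:
    """Merge a full source snapshot {key: attr} into an SCD Type 2 dimension."""
    out: List[DimRow] = []
    current = {r.key: r for r in dim if r.is_current}

    # 1) Carry history forward; expire current rows that changed or disappeared.
    for r in dim:
        if not r.is_current:
            out.append(r)  # historical versions are never modified
        elif r.key not in source or source[r.key] != r.attr:
            # Changed, or deleted (absent from the full snapshot): close it off.
            out.append(replace(r, end_date=as_of, is_current=False))
        else:
            out.append(r)  # unchanged current row

    # 2) Insert a new current version for new keys and changed keys.
    for key, attr in source.items():
        cur = current.get(key)
        if cur is None or cur.attr != attr:
            out.append(DimRow(key, attr, as_of, HIGH_DATE, True))
    return out
```

In Spark the same decisions are usually expressed with a full outer join between the current dimension rows and the source, followed by conditional columns; this sketch lets the expired version end on the same day the new one starts, a convention you may prefer to change (for example, to `as_of` minus one day).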

lite-log hadoop sqoop

Password Security Solution for Sqoop

21 views   0 comments last modified about 2 months ago

In Sqoop, there are multiple approaches to passing passwords for RDBMS connections. Options: Option 1 - clear-text password through the --password argument: sqoop [subcommand] --username user --password pwd. This is the weakest approach, as the password is exposed directly...
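The excerpt stops after Option 1. Besides --password, Sqoop also accepts -P (prompt for the password) and --password-file (read it from a file). The following is a minimal Python sketch, assuming you want to script a sqoop import with --password-file: it writes the password to an owner-read-only file without a trailing newline (Sqoop reads the whole file content as the password) and builds the argument list. The connection string, user, table and helper names are hypothetical. Depending on your Hadoop configuration, a path without a scheme may be resolved against HDFS, so this sketch passes an explicit file:// URI.

```python
import os
import pathlib
from typing import List


def write_password_file(path: str, password: str) -> str:
    """Create an owner-read-only password file and return its file:// URI.

    The password is written WITHOUT a trailing newline, because any trailing
    whitespace would become part of the password Sqoop reads.
    """
    # O_EXCL: fail instead of overwriting an existing file; 0o400 = owner read-only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o400)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(password)
    return pathlib.Path(path).resolve().as_uri()


def sqoop_import_args(connect: str, username: str, password_uri: str, table: str) -> List[str]:
    """Build a sqoop import argv that references the password file, not the password."""
    return [
        "sqoop", "import",
        "--connect", connect,
        "--username", username,
        "--password-file", password_uri,
        "--table", table,
    ]
```

The resulting list can be passed to `subprocess.run(args, check=True)`; because the password never appears on the command line, it does not show up in the process list or shell history.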

python spark

PySpark: Convert JSON String Column to Array of Object (StructType) in Data Frame

145 views   0 comments last modified about 2 months ago

This post shows how to derive a new column in a Spark data frame from a JSON array string column. I am running the code in Spark 2.2.1, though it is compatible with Spark 1.6.0 (which has fewer JSON SQL functions). Prerequisites: refer to the following post to install Spark in Windows. ...
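In Spark this derivation is typically done with JSON SQL functions such as `from_json` and an array-of-struct schema. To show what happens to each row, here is a minimal plain-Python sketch (not Spark) that parses a JSON array string column into a list of typed objects. The element schema (`key`/`value`) and all names are hypothetical, and mapping malformed JSON to `None` is a choice made by this sketch.

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Attr:
    """Hypothetical element schema: each array item has 'key' and 'value'."""
    key: str
    value: str


def parse_attrs(raw: Optional[str]) -> Optional[List[Attr]]:
    """Turn a JSON array string into a list of Attr, or None if it can't be parsed."""
    if raw is None:
        return None
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return None  # this sketch maps malformed JSON to None (a null cell)
    if not isinstance(items, list):
        return None
    return [
        Attr(key=str(item.get("key")), value=str(item.get("value")))
        for item in items
        if isinstance(item, dict)
    ]


def with_attrs_column(
    rows: List[Tuple[int, Optional[str]]]
) -> List[Tuple[int, Optional[str], Optional[List[Attr]]]]:
    """Append the derived (parsed) column to each (id, json_string) row."""
    return [(row_id, raw, parse_attrs(raw)) for row_id, raw in rows]
```

Keeping the original string next to the parsed value mirrors adding a new column with `withColumn` rather than replacing the source column.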

angular lite-log

ng is not recognized as an internal or external command (Windows 10)

431 views   0 comments last modified about 4 months ago

Problem: when you follow the Angular CLI installation guide in Windows, you may encounter the following error: ng is not recognized as an internal or external command. The resolutions are available in the following link: ...

java bigquery gcp dataflow gcs

Load CSV File from Google Cloud Storage to BigQuery Using Dataflow

1463 views   0 comments last modified about 7 months ago

This page documents the detailed steps to load a CSV file from GCS into BigQuery using Dataflow, as a demo of creating a simple data flow with Dataflow Tools for Eclipse. However, this doesn’t necessarily mean it is the right use case for Dataflow. Alternatively ...

azure power-bi

Advanced analytics on big data with Azure - Tutorial

418 views   0 comments last modified about 7 months ago

Microsoft Azure provides a number of data analytics-related products and services. It allows users to tailor solutions to different requirements, for example, a modern data warehouse architecture, advanced analytics with big data, or real-time analytics. The following diagram sho...

power-bi bigquery

Use Google Cloud BigQuery as Data Source in Power BI

912 views   0 comments last modified about 8 months ago

BigQuery is Google’s serverless data warehouse in Google Cloud. Power BI can consume data from various sources, including RDBMS, NoSQL, Cloud, Services, etc. It is also easy to get data from BigQuery in Power BI. In this article, I am going to demonstrate how to connect to BigQuery to create...

dotnet core lite-log

Set AttachDbFilename as Relative Path in .NET Core

749 views   0 comments last modified about 8 months ago

In .NET Framework, you can use |DataDirectory| to configure the connection string when connecting to a SQL Server database file via attach mode: AttachDbFilename=|DataDirectory|\dbname.mdf. In .NET Core, you cannot directly set SQL Server Express connec...

dotnet core lite-log

Instantiate a Service in ConfigureServices Method in .NET Core

74 views   0 comments last modified about 8 months ago

.NET Core has built-in dependency injection. The ConfigureServices method in the Startup class is usually used to register services in the container. The signature of the method looks like the following: public void ConfigureServices(IServiceC...

kontext docu

Kontext Project is now Open Source as Docu Project

185 views   0 comments last modified about 8 months ago

Kontext project is now open source as the Docu project, hosted on GitHub. At the moment, only SQLite is supported; other databases will be added once version 1.0.0 is ready. SQLite is used for easy setup and testing. Prerequisites: .NET Core SDK 2.1.3, .NET Core Runtime 2.1 ...

.net core lite-log

ASP.NET Core 2.1 Error - 'Cyrillic' is not a supported encoding name

578 views   0 comments last modified about 8 months ago

After upgrading to ASP.NET Core 2.1 (.NET Core SDK 2.1.301), you may encounter the following error about encoding: System.ArgumentException, HResult=0x80070057, Message='Cyrillic' is not a supported encoding name. For information on defining a custo...

.net core entity-framework

SQLite in .NET Core with Entity Framework Core

993 views   0 comments last modified about 8 months ago

SQLite is a self-contained and embedded SQL database engine. In .NET Core, Entity Framework Core provides APIs to work with SQLite. This page provides sample code to create a SQLite database using the package Microsoft.EntityFrameworkCore.Sqlite. Create sample project ...

core 2 .net core

Graphics Programming and Image Processing in .NET Core 2.x

454 views   0 comments last modified about 9 months ago

In .NET Core 2.x, Windows Forms and WPF are not implemented, since they are based on GDI+ and DirectX respectively in Windows. In .NET Core 3.0, there is a plan to add Desktop Packs, which include UWP, WPF and Windows Forms. However, they will still be Windows-only. In .NET Core applications, you may...

lite-log power-bi

Data Analysis Expressions to Create Static Tables in Power BI

132 views   0 comments last modified about 9 months ago

DATATABLE: StaticTable1 = DATATABLE("IntCol",INTEGER,"StringCol",STRING,{{1,"User1"},{2,"User2"}}). The above expression generates a table with two columns, IntCol and StringCol: ...

power-bi google-analytics

Power Analytics with Power BI and Google Analytics

417 views   0 comments last modified about 9 months ago

Power BI is my favourite BI and visualization tool, as it is very simple yet powerful. It supports not only traditional data sources like databases, CSV, JSON and XML, but also emerging sources such as HDFS, Spark, R, Salesforce, Google Analytics and cloud platforms...


Power BI Analytics - Connect to DBMS and Card

134 views   0 comments last modified about 9 months ago

Power BI supports connecting to most DBMS databases, such as SQL Server, Oracle, Teradata, MySQL, DB2, Sybase, Snowflake, Google BigQuery and Impala. This page summarizes the steps to connect to SQL Azure and to create the following part of the sample dashboard of this series: ...


Querying Teradata and SQL Server - Tutorial 1: The SELECT Statement

32962 views   7 comments last modified about 4 years ago

SELECT is one of the most commonly used statements. In this tutorial, I will cover the following items: two of the principal query clauses (FROM and SELECT), data types, built-in functions, and CASE expressions and variations like ISNULL and COALESCE. * The functio...


Install Teradata Express by Using VMware Player 6.0 in Windows

13656 views   23 comments last modified about 5 years ago

In this article, I am going to show how to install Teradata Express in a virtual machine in Windows. Download software: 1) Download VMware Player (version 6.0) for Windows 32-bit or 64-bit from the following link: ...


Working with SQL Server Compact 4.0 using Entity Framework 6 and ADO.NET

11753 views   0 comments last modified about 5 years ago

SQL Server Compact 4.0 (CE 4.0) is a free SQL Server embedded database ideal for building standalone and occasionally connected applications for mobile devices, desktops, Web clients and others. In one of my projects, I used it as the database for logging errors, which assumes the errors will onl...

hadoop yarn hdfs

Install Hadoop 3.0.0 in Windows (Single Node)

11136 views   14 comments last modified about 13 months ago

This page summarizes the steps to install Hadoop 3.0.0 in your Windows environment. Reference page: ...


Create ETL Project with Teradata through SSIS

10093 views   2 comments last modified about 4 years ago

InfoSphere DataStage is adopted as the ETL (Extract, Transform, Load) tool in many Teradata-based data warehousing projects. With the Teradata ODBC and .NET data providers, you can also use Microsoft BI tools, e.g. SSIS. In my previous post, I demonstrated how to install Teradata Tool...


Generate Formatted Excel Destination (Output) in SSIS Data Flow Task

10047 views   0 comments last modified about 5 years ago

SSIS (SQL Server Integration Services) provides a number of convenient tasks to enable data integration. Exporting data from a database to an Excel file is a common task in ETL (Extract, Transform, Load) projects. Users/customers frequently raise format requests regarding the Excel extract. To g...

core 2

Server.MapPath Equivalent in ASP.NET Core 2

9306 views   0 comments last modified about 2 years ago

In traditional ASP.NET applications, Server.MapPath is commonly used to generate an absolute path on the web server. However, it has been removed from ASP.NET Core. So what is the equivalent way of doing it?

dotnet core angular core 2

Issue - Unable to get property 'apply' of undefined or null reference occurred in Angular 4.*, VS2017 15.3, ASP.NET Core 2.0

7636 views   10 comments last modified about 2 years ago

Issue context: after installing Visual Studio 2017 15.3 preview and the .NET Core 2.0 preview SDK, I upgraded one of my existing Core projects to 2.0. The project was created using the ‘dotnet new angular’ SPA template. I also upgraded all the client app packages to the latest versions. For exa...

java kerberos

Java Kerberos Authentication Configuration Sample & SQL Server Connection Practice

6891 views   2 comments last modified about 3 years ago

Overview: recently, I have been working on an ETL framework to load various source data (e.g. files, SQL Server, Oracle and Teradata) into Teradata. Due to some limitations, Java was chosen as the implementation language, though IBM InfoSphere DataStage is available. DataStage has p...


Connect to Teradata Virtual Machine Guest from Windows Host

6665 views   16 comments last modified about 4 years ago

In my previous posts about Querying Teradata and SQL Server, I logged into the virtual machine's graphical interface to manage the database. However, I found it resource-intensive, as my laptop has only 4GB of memory. Instead, I will use text mode to start the virtual machine and co...

spark scala parquet

Write and Read Parquet Files in Spark/Scala

6407 views   2 comments last modified about 12 months ago

On this page, I’m going to demonstrate how to write and read parquet files in Spark/Scala by using the Spark SQLContext class. Reference: What is the parquet format? Go to the following project site to understand more about parquet. ...

core identity core 2

Retrieve Identity username, email and other information in ASP.NET Core

5705 views   0 comments last modified about 2 years ago

The identity system in ASP.NET has evolved over time. If you are using ASP.NET Core, you have probably found that the User property is an instance of ClaimsPrincipal in controllers and Razor views. Thus, to retrieve the information, you need to use the claims.


[C#] Connect to Teradata Database via .NET Data Provider

5111 views   2 comments last modified about 4 years ago

In this post, I will demonstrate how to connect to a Teradata database via the .NET Data Provider for Teradata using C#. Prerequisites: install the .NET Data Provider for Teradata from the following link: ...


Resolve the Issues in Upgrading Entity Framework to Version 6.1

4456 views   0 comments last modified about 5 years ago

When upgrading Entity Framework from version 5.0 to 6.1 (EF6), you may encounter a number of issues. I have summarized all the issues I’ve encountered and their resolutions for your reference. Upgrade to EF6: Microsoft has provided a summary about upgrading to E...


Create and Debug C/C++ Programs with Eclipse and Cygwin in Windows

4385 views   0 comments last modified about 3 years ago

In this post, I am going to demonstrate how to use Eclipse to create and debug C/C++ programs for Unix/Linux in Windows. I am going to use Cygwin GCC as the toolchain. Cygwin GDB will also be installed for debugging. I am using Windows 10 and JRE 1.8 in the following steps. Install E...

teradata python

Connect to Teradata database through Python

4191 views   0 comments last modified about 2 years ago

Teradata published an official Python module which can be used in DevOps projects. More details can be found at the following GitHub site: Install Teradata module ...


about 9 months ago

I can get it to work by using the following approach:

1) Create an IIS website (http://localhost/Test/) with one page index.html:

        <h1>Test iframe</h1>
        <iframe src="http://localhost:8080/#/notebook/2D7J63CN7" width="600px" height="400px" style="border:1px solid #000000"></iframe>

2) Then open the website in the browser: http://localhost/Test/

If you open the page via a file URL instead, the content cannot be displayed for security reasons.


When you publish your website, your Zeppelin site should also be deployed to a server that your users can access.
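The reply above hosts the test page in IIS so that it is opened over HTTP rather than via a file URL. If IIS is not at hand, a minimal sketch of the same test using Python's built-in `http.server` (suitable for local testing only, not production) could look like this. The `site` directory, port 8000 and the function names are assumptions; the notebook URL is the one from the reply.

```python
import functools
import html
import http.server


def build_page(notebook_url: str, width: str = "600px", height: str = "400px") -> str:
    """Return the test page HTML with the Zeppelin notebook embedded in an iframe."""
    src = html.escape(notebook_url, quote=True)
    return (
        "<h1>Test iframe</h1>\n"
        f'<iframe src="{src}" width="{width}" height="{height}" '
        'style="border:1px solid #000000"></iframe>\n'
    )


def serve(directory: str, port: int = 8000) -> None:
    """Serve `directory` at http://localhost:<port>/ until interrupted (Ctrl+C)."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    with http.server.ThreadingHTTPServer(("localhost", port), handler) as httpd:
        httpd.serve_forever()
```

For example, write `build_page("http://localhost:8080/#/notebook/2D7J63CN7")` to `site/index.html`, call `serve("site")`, and open http://localhost:8000/ in the browser.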

about 9 months ago

Hi Raymond Tang. I'm back because I tried but didn't succeed in embedding a Zeppelin notebook as an iframe in my website. I have something like this:

        <div id="interactivForm">
            <iframe id="MyInterpreter" src="http://localhost:8085/#/notebook/2DHDGTVNU"></iframe>
        </div>

in my website, but it doesn't work. However, I can access http://localhost:8085/#/notebook/2DHDGTVNU without any problem. Do you know how to do that?
Thank you.

about 9 months ago

You can use the HTML <iframe> element to embed Zeppelin into your website.

This also means that your Zeppelin website (*:8080 by default) needs to be exposed to all your users (i.e. reachable from their networks).

about 9 months ago

Hi Raymond. For now, I don't really want any authentication. I only want to let anonymous users execute Spark in my website using Zeppelin. Do you know how to do that, or do you have a step-by-step tutorial? Thank you very much.

about 9 months ago

You can embed Zeppelin into a website.

However, you need to decide how to pass user credentials from your website to the Zeppelin website, depending on the authentication type.

For the authentication part, you can refer to the following website:

Based on my current understanding, I don't think you can directly implement automatic logon without extending Zeppelin (but I might be wrong).

about 9 months ago

Hi everyone,

Do you know if it is possible to embed a Zeppelin notebook in a webpage, as an iframe or by another method, so that users can come and execute their own code? Does anyone have an idea how to do that? Thank you!

about 9 months ago

No worries, I'm glad it worked. :)

about 9 months ago

I am sorry. It works!

about 9 months ago

Sorry, it's me again. When I run the command %HADOOP_HOME%\bin\hdfs dfs -put file:///G:/DataAnalytics/test.txt / I get an error: put: `G:/DataAnalytics/test.txt': No such file or directory. However, I followed the configuration step by step, and my DataAnalytics folder is in G:.

about 9 months ago

Hi, thank you for your answer. I found a solution to my problem. In case it helps someone: the problem was that my system username contains a space. To fix it, edit /etc/hadoop/hadoop-env.cmd; at the end of this file you will find set HADOOP_IDENT_STRING=%USERNAME%. Change it to any string without spaces, for example set HADOOP_IDENT_STRING=myuser, and the problem will be fixed.

about 9 months ago

Did you follow all the steps in this post? For example, you need to ensure the winutils tool is installed:

Overwrite your bin folder (%HADOOP_HOME%\bin) with the files from this link:

The currently available version is 3.0.0, and I am not sure whether it can fix the issue for 3.0.1, but it is worth a try.

about 9 months ago

Good morning, I am trying to install Hadoop 3.0.1 on Windows, but when I test my configuration it gives me this error: Error: Can not find or load the main class. Can someone help me, please?

Thank you

about 9 months ago


Yes, you can.

For example, the following code is used to read parquet files from a Hadoop cluster.

        import org.apache.spark.sql.SQLContext

        def readParquet(sqlContext: SQLContext) = {
          // read back parquet to DF
          val newDataDF = sqlContext.read.parquet("hdfs://hdp-master:19000/user/hadoop/sqoop_test/blogs")
          // show contents
          newDataDF.show()
        }

The cluster was set up by following this post:

Configure Hadoop 3.1.0 in a Multi Node Cluster

Of course, hdp-master:19000 needs to be accessible from the server that runs the Spark/Scala code.
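Before running the Spark job, it can help to confirm that the NameNode address in the hdfs:// URI is reachable from that server. The following is a minimal plain-Python sketch (not part of Spark or Hadoop) that extracts host and port from the URI used above and attempts a TCP connection; it does not authenticate or read any data. The function names and the 3-second timeout are assumptions, and the sketch requires an explicit port rather than guessing a default.

```python
import socket
from typing import Tuple
from urllib.parse import urlparse


def namenode_address(uri: str) -> Tuple[str, int]:
    """Extract (host, port) from an hdfs://host:port/path URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs" or not parsed.hostname or parsed.port is None:
        raise ValueError(f"expected hdfs://host:port/..., got {uri!r}")
    return parsed.hostname, parsed.port


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False
```

For example, `is_reachable(*namenode_address("hdfs://hdp-master:19000/user/hadoop/sqoop_test/blogs"))`. A successful check only covers the NameNode; reading file contents also requires the DataNodes to be reachable, since HDFS clients fetch blocks from them directly.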

At the moment, my HDFS is set as readable for all servers/users in the LAN. In a production environment, you may need to manage the permissions too.

Furthermore, you can also run Spark apps in a Spark cluster instead of on a standalone or local machine. I will cover more about this in a future post.

about 10 months ago

Can we connect to and read a remotely located HDFS Parquet file by using the above code?

