Analytics & BI

Data Analytics, Big Data, Data Storage and Business Intelligence.

lite-log hadoop

Hadoop on Windows - UNHEALTHY Data Nodes Fix

36   0   about 3 months ago

Solution to fix the issue: If you have been running Hadoop on Windows machines, you may encounter unhealthy data nodes. This usually happens when there is not enough disk space on your local drive. For example, if I start the HDFS and YARN daemons under the context...
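
The disk-space condition described in the excerpt can be checked with a few lines of Python. This is only a sketch: the 90% threshold assumes YARN's default for `yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage`, the helper names are hypothetical, and the post's own fix is not visible in the truncated excerpt.

```python
import shutil

# Assumption: 90.0 is YARN's default value for
# yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage.
DEFAULT_MAX_UTILIZATION_PCT = 90.0


def utilization_pct(used_bytes: int, total_bytes: int) -> float:
    """Return used space as a percentage of total space."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def disk_is_too_full(path: str, max_pct: float = DEFAULT_MAX_UTILIZATION_PCT) -> bool:
    """True when the drive holding `path` is above the utilization threshold."""
    usage = shutil.disk_usage(path)
    return utilization_pct(usage.used, usage.total) > max_pct


if __name__ == "__main__":
    # "." is a placeholder; point it at the drive that holds the Hadoop local directories.
    print(disk_is_too_full("."))
```

If the check returns True, freeing space on that drive is the first thing to try before changing any configuration.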

lite-log hadoop hdfs

Hadoop datanode issue and resolution - ‘Incompatible clusterIDs’

276   0   about 3 months ago

Issue: After installing Hadoop 3.0.0 on Windows (see Install Hadoop 3.0.0 in Windows (Single Node)), I got the following error after I formatted the name node several ti...
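
An "Incompatible clusterIDs" error generally means that the `clusterID` recorded in the data node's `current/VERSION` file no longer matches the one in the name node's `current/VERSION` file, which can happen after the name node is reformatted. The sketch below compares the two values from the file contents. The function names and the sample IDs are hypothetical, and the post's own resolution is not visible in the truncated excerpt.

```python
from typing import Optional


def read_cluster_id(version_text: str) -> Optional[str]:
    """Extract the clusterID value from the text of a Hadoop VERSION file."""
    for line in version_text.splitlines():
        line = line.strip()
        # Skip comment lines and anything that is not a key=value pair.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "clusterID":
            return value.strip()
    return None


def cluster_ids_match(namenode_version: str, datanode_version: str) -> bool:
    """True only when both files carry the same, non-missing clusterID."""
    nn_id = read_cluster_id(namenode_version)
    dn_id = read_cluster_id(datanode_version)
    return nn_id is not None and nn_id == dn_id


if __name__ == "__main__":
    # Hypothetical file contents for illustration only.
    namenode = "#comment\nclusterID=CID-aaaa\nstorageType=NAME_NODE\n"
    datanode = "#comment\nclusterID=CID-bbbb\nstorageType=DATA_NODE\n"
    print(cluster_ids_match(namenode, datanode))
```

In practice the two strings would be read from the `VERSION` files under the configured name node and data node directories.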

sql server python spark pyspark

Connect to SQL Server in Spark (PySpark)

496   0   about 3 months ago

Spark is an analytics engine for big data processing. There are various ways to connect to a database in Spark. This page summarizes some common approaches to connecting to SQL Server using Python as the programming language. ...
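
One common approach is Spark's built-in JDBC data source. The sketch below only builds the option dictionary, so it runs without a Spark installation. The helper name and all connection values are hypothetical, and it assumes the Microsoft JDBC driver for SQL Server is available on the Spark classpath. Which approaches the post itself covers is not visible in the truncated excerpt.

```python
def sqlserver_jdbc_options(host: str, port: int, database: str,
                           table: str, user: str, password: str) -> dict:
    """Build the options for Spark's JDBC data source against SQL Server."""
    return {
        # JDBC URL format used by the Microsoft SQL Server driver.
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    opts = sqlserver_jdbc_options("localhost", 1433, "testdb",
                                  "dbo.Employees", "user", "pwd")
    print(opts["url"])
    # With an existing SparkSession named `spark`, the options would be used as:
    #   df = spark.read.format("jdbc").options(**opts).load()
```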

teradata python

Connect to Teradata database through Python

5,969   3   about 2 years ago

Teradata published an official Python module which can be used in DevOps projects. More details can be found at the following GitHub site: https://github.com/Teradata/PyTd Install Teradata module ...

python lite-log spark pyspark

Debug PySpark Code in Visual Studio Code

364   0   about 4 months ago

This page summarizes the steps required to run and debug PySpark (Spark for Python) in Visual Studio Code. Install Python and pip: Install Python from the official website: https://...

python spark pyspark

Implement SCD Type 2 Full Merge via Spark Data Frames

1,404   0   about 5 months ago

Overview: SQL developers who are familiar with SCD and merge statements may wonder how to implement the same on big data platforms, considering that databases and storage in Hadoop are not designed/optimised for record-level updates and inserts. In this post, I’m going to demons...
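
The logic of an SCD Type 2 full merge can be sketched in plain Python over lists of dictionaries. This is not the post's Spark data frame implementation, which is not visible in the truncated excerpt. The column names `start_date`, `end_date` and `is_current`, the open-ended `HIGH_DATE`, and the function name are all assumptions made for illustration.

```python
from datetime import date

# Assumption: open-ended records carry this far-future end date.
HIGH_DATE = date(9999, 12, 31)


def scd2_full_merge(target, source, key, attrs, load_date):
    """Merge a full source snapshot into an SCD Type 2 target.

    Unchanged current rows are kept, changed or deleted rows are expired,
    and changed or new keys get a fresh current version.
    """
    def new_version(src):
        row = {key: src[key]}
        row.update({a: src[a] for a in attrs})
        row.update(start_date=load_date, end_date=HIGH_DATE, is_current=True)
        return row

    src_by_key = {row[key]: row for row in source}
    seen = set()
    result = []
    for row in target:
        if not row["is_current"]:
            result.append(dict(row))  # historical versions stay untouched
            continue
        seen.add(row[key])
        src = src_by_key.get(row[key])
        if src is not None and all(row[a] == src[a] for a in attrs):
            result.append(dict(row))  # unchanged
            continue
        # Changed or missing from the snapshot: close the current version.
        result.append({**row, "is_current": False, "end_date": load_date})
        if src is not None:
            result.append(new_version(src))
    # Keys that appear only in the source are brand-new records.
    for k, src in src_by_key.items():
        if k not in seen:
            result.append(new_version(src))
    return result
```

Because record-level updates are expensive on Hadoop storage, a merge like this typically produces a complete new version of the table that replaces the old one, rather than updating rows in place.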

lite-log hadoop sqoop

Password Security Solution for Sqoop

65   0   about 6 months ago

In Sqoop, there are multiple ways to pass in passwords for an RDBMS.

Options

Option 1 - clear-text password through the --password argument

sqoop [subcommand] --username user --password pwd

This is the weakest approach, as the password is exposed directly...
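
The difference between passing the password on the command line and keeping it off the command line can be sketched as a small argument builder. Sqoop also accepts `--password-file` (read the password from a file) and `-P` (prompt on the console). Whether the post covers these is not visible in the truncated excerpt, and the helper names and file path below are hypothetical.

```python
from typing import List, Optional


def sqoop_args(subcommand: str, username: str,
               password: Optional[str] = None,
               password_file: Optional[str] = None) -> List[str]:
    """Build a Sqoop argument list.

    --connect and other required arguments are omitted, as in the excerpt.
    """
    args = ["sqoop", subcommand, "--username", username]
    if password_file is not None:
        # The password is read from a file, so it stays off the command line.
        args += ["--password-file", password_file]
    elif password is not None:
        # Option 1 from the excerpt: the password appears in clear text.
        args += ["--password", password]
    else:
        # Prompt for the password on the console.
        args.append("-P")
    return args


def exposes_password(args: List[str]) -> bool:
    """True when the clear-text --password argument is present."""
    return "--password" in args


if __name__ == "__main__":
    print(exposes_password(sqoop_args("import", "user", password="pwd")))
    print(exposes_password(sqoop_args("import", "user",
                                      password_file="/user/hypothetical/.pwd")))
```

A clear-text `--password` value can end up in shell history and process listings, which is why the excerpt calls it the weakest approach.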

python spark

PySpark: Convert JSON String Column to Array of Object (StructType) in Data Frame

2,816   0   about 6 months ago

This post shows how to derive a new column in a Spark data frame from a JSON array string column. I am running the code in Spark 2.2.1, though it is compatible with Spark 1.6.0 (with fewer JSON SQL functions). Prerequisites: Refer to the following post to install Spark in Windows. ...

java bigquery gcp dataflow gcs

Load CSV File from Google Cloud Storage to BigQuery Using Dataflow

2,649   0   about 10 months ago

This page documents the detailed steps to load a CSV file from GCS into BigQuery using Dataflow, as a demo of creating a simple data flow with Dataflow Tools for Eclipse. However, it doesn’t necessarily mean this is the right use case for Dataflow. Alternatively ...

azure power-bi

Advanced analytics on big data with Azure - Tutorial

694   0   about 11 months ago

Microsoft Azure provides a number of products and services related to data analytics. It allows users to tailor solutions to different requirements, for example, an architecture for a modern data warehouse, advanced analytics with big data, or real-time analytics. The following diagram sho...
