Pandas DataFrame Plot - Scatter and Hexbin Chart


In this article I'll show some examples of plotting scatter and hexbin charts from a Pandas DataFrame. I'm using Jupyter Notebook as the code execution environment.

A hexbin chart is a 2-D histogram drawn with hexagonal cells, where each cell is colored according to the number of points that fall inside it. When many points overlap, it can be more informative than a scatter chart.
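To make the "2-D histogram" idea concrete, here is a minimal sketch that counts points per cell with NumPy. It uses square cells rather than hexagons, and the seed, sample size and 10 x 10 grid are arbitrary choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for a repeatable illustration
x = rng.standard_normal(1000)
y = rng.standard_normal(1000)

# Count how many points fall into each cell of a 10 x 10 grid of square bins.
counts, x_edges, y_edges = np.histogram2d(x, y, bins=10)

print(counts.shape)       # one count per cell
print(int(counts.sum()))  # every point lands in exactly one cell
```

A hexbin chart does the same kind of counting, but on a hexagonal grid, and then maps each count to a color.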

Prepare the data

Use the following code snippet to create a Pandas DataFrame object in memory:

import numpy as np
import pandas as pd

n = 10000  # number of sample points
# x is standard normal; y is a linear function of x plus Gaussian noise
x = np.random.standard_normal(n)
y = 2.0 + 3.0 * x + 4.0 * np.random.standard_normal(n)
df = pd.DataFrame({'x': x, 'y': y})
df

The code above produces a DataFrame like the following table (the values are random, so yours will differ):

              x          y
0     -0.326429  -2.236740
1      0.454832  -2.747080
2      0.132723   4.515384
3     -0.437708   3.494672
4     -0.264059  -3.256577
...         ...        ...
9995   0.068648   4.059994
9996   0.993274   2.318345
9997  -0.895868  -4.447368
9998   0.422794   6.256481
9999   0.441044   7.309338

10000 rows × 2 columns
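If you want the same numbers on every run, one option is to draw the samples from a seeded generator. The sketch below is a variation on the data preparation step, not part of the original code: the seed value 42 is an arbitrary assumption, and the line fit at the end is only a quick sanity check that the data follows y = 2 + 3x plus noise.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed (42 is arbitrary) so the data is identical on every run.
rng = np.random.default_rng(42)

n = 10000
x = rng.standard_normal(n)
y = 2.0 + 3.0 * x + 4.0 * rng.standard_normal(n)
df = pd.DataFrame({'x': x, 'y': y})

# Fit a straight line; slope and intercept should land close to 3 and 2.
slope, intercept = np.polyfit(df['x'].to_numpy(), df['y'].to_numpy(), deg=1)
print(df.shape)
print(round(slope, 2), round(intercept, 2))
```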

Scatter chart

# df.plot returns a matplotlib Axes object, so name it ax rather than plt
ax = df.plot(kind='scatter', x='x', y='y')

The above code snippet plots the following chart:
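With 10,000 points, the markers in the middle of a scatter chart tend to overlap into a solid blob. A common workaround, sketched here as an optional variation, is to shrink the markers and make them semi-transparent. The values s=2 and alpha=0.2 and the seed are arbitrary assumptions; the data is regenerated so the block runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit this line in Jupyter
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed
n = 10000
x = rng.standard_normal(n)
df = pd.DataFrame({'x': x, 'y': 2.0 + 3.0 * x + 4.0 * rng.standard_normal(n)})

# s sets the marker size and alpha the opacity; both are passed on to matplotlib.
ax = df.plot(kind='scatter', x='x', y='y', s=2, alpha=0.2)
ax.set_title('Scatter chart with small, semi-transparent markers')
```

Dense regions now appear darker because overlapping markers add up, which is roughly the effect a hexbin chart gives you directly.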

Hexbin

# df.plot returns a matplotlib Axes object, so name it ax rather than plt
ax = df.plot(kind='hexbin', x='x', y='y', bins='log')

Setting bins='log' puts the hexagon colors on a logarithmic scale, so differences between sparse and dense cells remain visible. The chart looks like the following screenshot:

The darker the color of a hexagon, the more points fall inside it.
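The size of the hexagons can be tuned as well. The sketch below is an optional variation that passes gridsize (the number of hexagons along the x axis, 100 by default in pandas) and a colormap. The values gridsize=30 and cmap='Blues' and the seed are arbitrary assumptions; the data is regenerated so the block runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit this line in Jupyter
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed
n = 10000
x = rng.standard_normal(n)
df = pd.DataFrame({'x': x, 'y': 2.0 + 3.0 * x + 4.0 * rng.standard_normal(n)})

# Fewer, larger hexagons: each cell collects more points, giving a smoother picture.
ax = df.plot(kind='hexbin', x='x', y='y', bins='log', gridsize=30, cmap='Blues')
ax.set_title('Hexbin chart with gridsize=30')
```

A smaller gridsize smooths out noise but hides fine detail, while a larger one does the opposite, so it is worth trying a few values on your own data.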
