python

97 items tagged with "python"

95 Articles
2 Diagrams

Articles

FastAPI Streaming Response: Error: Did not receive done or success response in stream

2024-07-03
Python Programming

Geoanalytics with geopandas and ABS Public Data in Python

2024-04-06
Python Programming

GenAI: Generate SQL from Text via Ollama Python Library on Local

2024-03-24
Python Programming

Danger: Bypass SSL Interception Proxy Server Certificate Validation in .NET or Python HTTP Requests

2024-01-05
.NET Programming

Pandas DataFrame aggregate a list column to a set

This code snippet shows you how to group a pandas DataFrame and then aggregate a list- or array-typed column into a set (with duplicates removed) or a list. To implement it, we can first explode the column (list type), then use groupby to create a grouped DataFrame, and finally aggregate using set, list, or a combination of both.

Input:

```
  category      users
0        A     [1, 2]
1        B     [3, 4]
2        C  [5, 6, 7]
3        A  [1, 8, 1]
4        B  [1, 6, 9]
```

Output:

```
  category        users_set       users_list
0        A        {8, 1, 2}        [8, 1, 2]
1        B  {1, 3, 4, 6, 9}  [1, 3, 4, 6, 9]
2        C        {5, 6, 7}        [5, 6, 7]
```
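A minimal runnable sketch of this explode-then-aggregate pattern (the sample data mirrors the input above; the output column names are illustrative):

```python
import pandas as pd

# Sample DataFrame with a list-typed column
df = pd.DataFrame({
    "category": ["A", "B", "C", "A", "B"],
    "users": [[1, 2], [3, 4], [5, 6, 7], [1, 8, 1], [1, 6, 9]],
})

# Explode the list column so each element gets its own row
exploded = df.explode("users")

# Group by category; aggregate into a deduplicated set and a sorted list
result = exploded.groupby("category")["users"].agg(
    users_set=set,
    users_list=lambda s: sorted(set(s)),
).reset_index()
print(result)
```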

2023-08-10
Python Programming

Union two pandas DataFrame

This code snippet shows you how to union two pandas DataFrames in Python using the concat method in the pandas namespace. If the schemas are different, pandas will automatically merge them.

Output:

```
  category  value  user
0        A      0   2.0
1        B      1   3.0
2        C      2   2.0
3        D      3   1.0
4        E      4   1.0
0        A      0   NaN
1        B      1   NaN
2        C      2   NaN
3        D      3   NaN
4        E      4   NaN
```

For the second DataFrame, column user doesn't exist. pandas uses `NaN` to mark it.
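A short sketch of the union with mismatched schemas (sample data is illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({
    "category": list("ABCDE"),
    "value": range(5),
    "user": [2.0, 3.0, 2.0, 1.0, 1.0],
})
# Second DataFrame has no 'user' column
df2 = pd.DataFrame({"category": list("ABCDE"), "value": range(5)})

# concat unions the rows; missing columns are filled with NaN
unioned = pd.concat([df1, df2])
print(unioned)
```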

2023-08-09
Code Snippets & Tips

Pandas DataFrame groupBy and then COUNT DISTINCT

This code snippet shows you how to group a pandas DataFrame by a certain column and then do a distinct count of unique values in another column. It is similar to the COUNT DISTINCT aggregation function in SQL. It also sorts the values by the group-by column.

Example output:

```
   category  value  user  group-count
0         A      0     5            7
80        A     80     4            7
70        A     70     7            7
60        A     60    10            7
50        A     50     9            7
..      ...    ...   ...          ...
29        J     29     9            7
19        J     19     9            7
9         J      9     9            7
89        J     89     8            7
99        J     99     7            7

[100 rows x 4 columns]
```
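A COUNT DISTINCT style aggregation can be sketched with `nunique` (column names here are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["A", "A", "A", "B", "B"],
    "user": [1, 1, 2, 3, 3],
})

# Distinct count of 'user' per category, like COUNT(DISTINCT user) in SQL;
# groupby sorts by the group-by column by default
distinct_counts = (
    df.groupby("category")["user"].nunique().reset_index(name="group-count")
)
print(distinct_counts)
```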

2023-08-09
Code Snippets & Tips

PySpark split and explode example

This code snippet shows you how to define a function to split a string column into an array of strings using the Python built-in split function. It then explodes each array element into its own row using the PySpark built-in explode function.

Sample output:

```
+----------+-----------------+--------------------+-----+
|  category|            users|         users_array| user|
+----------+-----------------+--------------------+-----+
|Category A|user1,user2,user3|[user1, user2, us...|user1|
|Category A|user1,user2,user3|[user1, user2, us...|user2|
|Category A|user1,user2,user3|[user1, user2, us...|user3|
|Category B|      user3,user4|      [user3, user4]|user3|
|Category B|      user3,user4|      [user3, user4]|user4|
+----------+-----------------+--------------------+-----+
```
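The same split-and-explode pattern can be sketched in pandas (not PySpark) as a runnable illustration of the idea:

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["Category A", "Category B"],
    "users": ["user1,user2,user3", "user3,user4"],
})

# Split the comma-delimited string into an array (list) column
df["users_array"] = df["users"].str.split(",")

# Explode the array column so each user gets its own row
exploded = df.explode("users_array").rename(columns={"users_array": "user"})
print(exploded)
```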

2023-08-06
Spark & PySpark

Azure App Service IIS Log Analytics using Pandas

2022-09-05
Python Programming

Flatten Pandas DataFrame after Aggregate

In code snippet Pandas DataFrame Group by one Column and Aggregate using MAX, MIN, MEAN and MEDIAN, it shows how to do aggregations in a pandas DataFrame. This code snippet shows you how to flatten the DataFrame (MultiIndex) after aggregations.

Sample output:

```
  category  value_max  value_min  value_mean  value_median
0        A         90          0          45            45
1        B         91          1          46            46
2        C         92          2          47            47
3        D         93          3          48            48
4        E         94          4          49            49
5        F         95          5          50            50
6        G         96          6          51            51
7        H         97          7          52            52
8        I         98          8          53            53
9        J         99          9          54            54
```
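Flattening the MultiIndex columns can be sketched like this; joining the level names with an underscore is one common choice (the sample data covers only two categories):

```python
import pandas as pd

df = pd.DataFrame({"category": ["A", "B", "A", "B"],
                   "value": [0, 1, 90, 91]})

# Aggregate first, producing MultiIndex columns like ('value', 'max')
agg = df.groupby("category").agg({"value": ["max", "min", "mean", "median"]})

# Flatten ('value', 'max') -> 'value_max', then turn the index into a column
agg.columns = ["_".join(col) for col in agg.columns]
flat = agg.reset_index()
print(flat)
```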

2022-09-04
Code Snippets & Tips

Pandas DataFrame Group by one Column and Aggregate using MAX, MIN, MEAN and MEDIAN

This code snippet provides one example of grouping a pandas DataFrame by one column and then aggregating multiple columns using different functions including max, min, mean and median. We pass in a dictionary for each column that needs to be aggregated: the key is the column name, and the value is a list of aggregation functions supported by pandas DataFrame. The result DataFrame will have multiple column levels, as the following output shows.

Sample output:

```
         value
           max min mean median
category
A           90   0   45     45
B           91   1   46     46
C           92   2   47     47
D           93   3   48     48
E           94   4   49     49
F           95   5   50     50
G           96   6   51     51
H           97   7   52     52
I           98   8   53     53
J           99   9   54     54
```
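The dictionary-based aggregation can be sketched as follows (two categories only, for brevity):

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["A", "A", "B", "B"],
    "value": [0, 90, 1, 91],
})

# Key = column to aggregate, value = list of aggregation functions.
# The result has MultiIndex columns: ('value', 'max'), ('value', 'min'), ...
agg = df.groupby("category").agg({"value": ["max", "min", "mean", "median"]})
print(agg)
```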

2022-09-04
Code Snippets & Tips

Get First Top N Rows in a Pandas DataFrame

Method pandas.DataFrame.head can be used to retrieve top N records from a DataFrame object. It has one optional parameter for the number of rows to return; the default value is 5 if not specified.

Syntax:

```
DataFrame.head(n=5)
```

Sample output:

```
          A          B
0  0.000000   0.000000
1  1.010101  10.101010
2  2.020202  20.202020
3  3.030303  30.303030
4  4.040404  40.404040
5  5.050505  50.505051
6  6.060606  60.606061
7  7.070707  70.707071
8  8.080808  80.808081
9  9.090909  90.909091
```
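A quick sketch of `head` with and without the optional `n` parameter:

```python
import pandas as pd

df = pd.DataFrame({"A": range(10), "B": range(0, 100, 10)})

print(df.head())   # default: first 5 rows
print(df.head(3))  # first 3 rows
```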

2022-09-04
Code Snippets & Tips

Check Column Data Types in a Pandas DataFrame

This code snippet provides examples of checking column data types in a DataFrame using dtype and dtypes.

Sample output:

```
>>> print(df.dtypes)
a     int64
b    object
dtype: object
>>> print(df.a.dtype)
int64
```
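A minimal sketch of both properties:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

print(df.dtypes)      # dtypes of all columns; strings show as 'object'
print(df["a"].dtype)  # dtype of a single column
```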

2022-09-04
Code Snippets & Tips

Rename Columns in Pandas DataFrame

This code snippet provides a simple approach to rename column names in a pandas DataFrame. It directly sets the DataFrame.columns property.
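Setting the `columns` property directly can be sketched as (the old and new names here are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Replace all column names at once by assigning the columns property
df.columns = ["id", "value"]
print(df.columns.tolist())
```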

2022-09-04
Code Snippets & Tips

Python - Get a List of Files in a Directory

This code snippet provides an example of listing files in a directory in the file system using Python. Module os is used. There are multiple approaches to implement this:

- os.listdir - returns both files and directories in a directory. We can check whether the returned items are files using the os.path.isfile method.
- os.walk - generates two lists for each directory it traverses: files and directories. walk can be used to retrieve all the subfolders recursively.

Sample output:

```
['F:\\Projects\\kontext-logs\\RawLogs-0\\c716eb-202209020059.log',
 'F:\\Projects\\kontext-logs\\RawLogs-0\\c716eb-202209020429.log',
 'F:\\Projects\\kontext-logs\\RawLogs-0\\c716eb-202209020710.log',
 'F:\\Projects\\kontext-logs\\RawLogs-0\\c716eb-202209020921.log',
 'F:\\Projects\\kontext-logs\\RawLogs-0\\c716eb-202209021147.log',
 'F:\\Projects\\kontext-logs\\RawLogs-0\\c716eb-202209021357.log',
 'F:\\Projects\\kontext-logs\\RawLogs-0\\c716eb-202209021637.log',
 'F:\\Projects\\kontext-logs\\RawLogs-0\\c716eb-202209022007.log',
 'F:\\Projects\\kontext-logs\\RawLogs-0\\c716eb-202209030218.log',
 'F:\\Projects\\kontext-logs\\RawLogs-0\\c716eb-202209030818.log']
```

Reference: os — Miscellaneous operating system interfaces
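Both approaches can be sketched as follows; a temporary directory with hypothetical file names stands in for the real log folder:

```python
import os
import tempfile

# Create a temporary directory with a few files and a subdirectory
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.log", "b.log"):
        open(os.path.join(folder, name), "w").close()
    os.mkdir(os.path.join(folder, "sub"))

    # os.listdir returns files AND directories; filter with os.path.isfile
    files = [f for f in os.listdir(folder)
             if os.path.isfile(os.path.join(folder, f))]
    print(sorted(files))

    # os.walk yields (dirpath, dirnames, filenames) recursively
    for dirpath, dirnames, filenames in os.walk(folder):
        print(dirpath, dirnames, filenames)
```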

2022-09-04
Code Snippets & Tips

Python Convert Relative to Absolute File Path

This code snippet shows you how to convert a relative path to an absolute path using the os module in Python, and vice versa. We utilize the os.path.abspath and os.path.relpath functions. For the latter, it accepts a second parameter as the base path, which is used to calculate the relative path.

Output:

```
C:\Users\test.csv
..\test.csv
```

*The current directory when running this code is C:\Users\userid.
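A small sketch of both directions (the example paths are illustrative; abspath resolves against the current working directory):

```python
import os

# Relative -> absolute, resolved against the current working directory
print(os.path.abspath("test.csv"))

# Absolute -> relative; the second parameter is the base path
rel = os.path.relpath(os.path.join("home", "user", "test.csv"),
                      start=os.path.join("home", "user", "docs"))
print(rel)  # e.g. '../test.csv' on POSIX systems
```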

2022-09-04
Code Snippets & Tips

Use sort() and orderBy() with PySpark DataFrame

In Spark DataFrame, two APIs are provided to sort the rows in a DataFrame based on the provided column or columns: sort and orderBy. orderBy is just an alias for the sort API.

Syntax:

```
DataFrame.sort(*cols, **kwargs)
```

For *cols, we can use it to specify a column name, a Column object (pyspark.sql.Column), or a list of column names or Column objects. For **kwargs, we can use it to specify additional arguments. For PySpark, we can specify a parameter named ascending; by default the value is True. It can also be a list of boolean values, one for each column used to sort the records. The code snippet provides examples of sorting a DataFrame.

Sample outputs:

```
+---+----+
| id|col1|
+---+----+
|  2|   E|
|  4|   E|
|  6|   E|
|  8|   E|
|  1|   O|
|  3|   O|
|  5|   O|
|  7|   O|
|  9|   O|
+---+----+

+---+----+
| id|col1|
+---+----+
|  2|   E|
|  4|   E|
|  6|   E|
|  8|   E|
|  1|   O|
|  3|   O|
|  5|   O|
|  7|   O|
|  9|   O|
+---+----+

+---+----+
| id|col1|
+---+----+
|  8|   E|
|  6|   E|
|  4|   E|
|  2|   E|
|  9|   O|
|  7|   O|
|  5|   O|
|  3|   O|
|  1|   O|
+---+----+
```

2022-09-03
Code Snippets & Tips

PySpark DataFrame - Add or Subtract Milliseconds from Timestamp Column

This code snippet shows you how to add or subtract milliseconds (or microseconds) and seconds from a timestamp column in a Spark DataFrame. It first creates a DataFrame in memory and then adds and subtracts milliseconds/seconds from the timestamp column ts using Spark SQL internals.

Output:

```
+---+--------------------------+--------------------------+--------------------------+--------------------------+
|id |ts                        |ts1                       |ts2                       |ts3                       |
+---+--------------------------+--------------------------+--------------------------+--------------------------+
|1  |2022-09-01 12:05:37.227916|2022-09-01 12:05:37.226916|2022-09-01 12:05:37.228916|2022-09-01 12:05:38.227916|
|2  |2022-09-01 12:05:37.227916|2022-09-01 12:05:37.226916|2022-09-01 12:05:37.228916|2022-09-01 12:05:38.227916|
|3  |2022-09-01 12:05:37.227916|2022-09-01 12:05:37.226916|2022-09-01 12:05:37.228916|2022-09-01 12:05:38.227916|
|4  |2022-09-01 12:05:37.227916|2022-09-01 12:05:37.226916|2022-09-01 12:05:37.228916|2022-09-01 12:05:38.227916|
+---+--------------------------+--------------------------+--------------------------+--------------------------+
```

*Note - the code assumes a SparkSession object already exists via the variable name `spark`.
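The same millisecond arithmetic can be sketched in plain Python with `datetime.timedelta` (this is not the Spark SQL interval approach the snippet uses, just a runnable illustration of the computed values):

```python
from datetime import datetime, timedelta

ts = datetime(2022, 9, 1, 12, 5, 37, 227916)

ts1 = ts - timedelta(milliseconds=1)  # subtract one millisecond
ts2 = ts + timedelta(milliseconds=1)  # add one millisecond
ts3 = ts + timedelta(seconds=1)       # add one second

print(ts1, ts2, ts3)
```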

2022-09-01
Code Snippets & Tips

Subscribe to Multiple Topics in Kafka Consumer

When fetching records from Kafka bootstrap servers, we can configure one consumer to subscribe to more than one topic. This can be done in the constructor by passing a list of topics:

```
topics = ['kontext-events', 'other-events']
bootstrap_servers = 'localhost:9092'
consumer = KafkaConsumer(
    *topics,
    client_id='local-test',
    bootstrap_servers=bootstrap_servers,
    auto_offset_reset='earliest'
)
```

Alternatively, we can use the subscribe() function to subscribe to multiple topics. We can also use regular expressions to match topics via patterns:

```
subscribe(topics=(), pattern=None, listener=None)
```

2022-08-23
Code Snippets & Tips

Kafka Consumer - Fetch Records from Beginning

This code snippet reads data from a topic's beginning offset by utilizing the seek_to_beginning() API. This sets each partition subscribed by the consumer to fetch records from the earliest available offsets.

Example output:

```
polling...
Kontext message 1
Kontext message 2
Kontext message 3
Kontext message 4
Kontext message 5
New kontext events~!!
New events!
polling...
polling...
```

2022-08-22
Code Snippets & Tips

Retrieve Kafka Messages (Records) via Timestamp

2022-08-22
Streaming Analytics & Kafka

Convert Timestamp to Milliseconds since Epoch in Python

This code snippet provides different approaches to convert a Python timestamp object to an integer that denotes the total milliseconds since Epoch (midnight 1st Jan 1970 UTC, UNIX time).

Output:

```
1661169251815.902
1661133251815.902
```

*Your result will differ depending on the time when you run the code. The result is the same no matter whether we use `utcnow()` or `now()`, as they both represent the same time. However, if your input timestamp format is a string, you need to be careful about time zone information when converting it to a timestamp, i.e. specify the right time zone accordingly.
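One way to sketch the conversion with the standard library (using a timezone-aware `datetime`; the fixed example timestamp is illustrative):

```python
from datetime import datetime, timezone

# Timezone-aware current time; .timestamp() returns seconds since Epoch
now = datetime.now(timezone.utc)
millis = now.timestamp() * 1000
print(millis)

# For a naive timestamp known to be UTC, attach tzinfo before converting
naive_utc = datetime(2022, 8, 22, 12, 0, 0)
millis2 = naive_utc.replace(tzinfo=timezone.utc).timestamp() * 1000
print(millis2)
```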

2022-08-22
Code Snippets & Tips

Python - Read Messages in Kafka Topic

This code snippet provides an example of reading messages from Kafka topics. In common practice, one Kafka consumer is used to read from one partition of a topic; if there are multiple partitions for the same topic, each consumer can run in a different server or container. The consumers for different partitions are also configured within one consumer group. In this example, all partitions are assigned to the same consumer. This code snippet utilizes the Python package kafka-python. It can be installed via the following command if you have pip installed:

```
pip install kafka-python
```

Or we can also use the built-in pip in Python 3:

```
python -m pip install kafka-python
```

References: Apache Kafka Python Producer and Consumer Clients Introduction; Install and Run Kafka 3.2.0 On WSL; Install and Run Kafka 2.6.0 On Windows 10

2022-08-22
Code Snippets & Tips

Remove Special Characters from Column in PySpark DataFrame

Spark SQL function regexp_replace can be used to remove special characters from a string column in a Spark DataFrame. Depending on the definition of special characters, the regular expressions can vary. For instance, [^0-9a-zA-Z\-\_]+ can be used to match characters that are not alphanumeric and are not hyphen (-) or underscore (_); the regular expression [@\+\#\$\%\^\!]+ can match these defined special characters. This code snippet replaces special characters with an empty string.

Output:

```
+---+--------------------------+
|id |str                       |
+---+--------------------------+
|1  |ABCDEDF!@#$%%^123456qwerty|
|2  |ABCDE!!!                  |
+---+--------------------------+

+---+-------------------+
| id|       replaced_str|
+---+-------------------+
|  1|ABCDEDF123456qwerty|
|  2|              ABCDE|
+---+-------------------+
```
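The regular-expression idea can be sketched with Python's `re` module (the Spark version applies the same pattern via its regexp replace function on a column):

```python
import re

# Keep only alphanumeric characters, hyphens and underscores
pattern = r"[^0-9a-zA-Z\-\_]+"

print(re.sub(pattern, "", "ABCDEDF!@#$%%^123456qwerty"))
print(re.sub(pattern, "", "ABCDE!!!"))
```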

2022-08-19
Code Snippets & Tips

PySpark DataFrame - Select Columns using select Function

In PySpark, we can use the select function to select a subset of columns or all columns from a DataFrame.

Syntax:

```
DataFrame.select(*cols)
```

This function returns a new DataFrame object based on the projection expression list. This code snippet prints out the following output:

```
+---+----------------+-------+---+
| id|customer_profile|   name|age|
+---+----------------+-------+---+
|  1|    {Kontext, 3}|Kontext|  3|
|  2|      {Tech, 10}|   Tech| 10|
+---+----------------+-------+---+
```

2022-08-11
Code Snippets & Tips

Extract Value from XML Column in PySpark DataFrame

2022-07-15
Spark & PySpark

Python Format with Dictionary Object

2022-07-15
Code Snippets & Tips

Access the Index in Python 'for' Loops

2022-07-15
Code Snippets & Tips

Python: Read Data from MS Access Database via ODBC

2022-07-09
Python Programming

PySpark - Read and Parse Apache Access Log Text Files

2022-07-09
Spark & PySpark

Pandas DataFrame - Get Row and Column Count

2022-07-07
Code Snippets & Tips

Install Miniconda and Anaconda on WSL 2 or Linux

2022-06-19
Tools & Systems

PySpark - Read Data from MariaDB Database

2022-06-18
Spark & PySpark

PySpark - Read Data from Oracle Database

2022-06-18
Spark & PySpark

Pandas DataFrame - Iterate over Rows

2022-06-14
Code Snippets & Tips

Python: Read Data from Oracle Database

2022-06-05
Python Programming

Connect to HBase in Python via HappyBase

2022-03-22
Hadoop, Hive & HBase

Install Python 3.9.1 on WSL

2022-01-08
Tools & Systems

Install Airflow on Windows via Windows Subsystem for Linux (WSL)

2022-01-08
The Data Engineering

Torchvision error: Could not find module image.pyd

2021-12-30
Python Programming

Create Environments with Different Python Versions in Anaconda

2021-12-30
Python Programming

Call SQL Server Stored Procedure in Python

2021-12-23
Python Programming

Read Email from Microsoft 365 via Microsoft Graph API

In article Python: Send Email via Microsoft Graph API, I provided detailed steps to send email through msal package. In this article, I am going to show you how to read emails from Microsoft 365 via Microsoft Graph API.

2021-10-18
Python Programming

Spark - Save DataFrame as a Hive Table

2021-10-13
Spark 中文

Spark (PySpark) - Read Data from SQL Server Database

2021-10-13
Spark 中文

PySpark: Convert JSON String Column in a DataFrame to an Array

2021-10-13
Spark 中文

PySpark - Convert a Python Array or List to a Spark DataFrame

2021-10-13
Spark 中文

Python: Send Email via Microsoft Graph API

2021-09-11
Python Programming

Set Spark Python Versions via PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON

2021-09-05
Spark & PySpark

Load XML File into BigQuery

2021-05-16
Google Cloud Platform

Spark - Read from BigQuery Table

2021-03-21
Google Cloud Platform

Python: Read Data from BigQuery

2021-03-17
Google Cloud Platform

Pandas - Save DataFrame to BigQuery

2021-03-17
Google Cloud Platform

Load JSON File into BigQuery

2021-03-14
Google Cloud Platform

Connect to PostgreSQL in Spark (PySpark)

2021-02-14
Spark & PySpark

Python: Load Data from MySQL

2021-01-23
Python Programming

Python: Load Data from Hive

2021-01-06
Hadoop, Hive & HBase

Load Microsoft 365 SharePoint List Data in Python

2020-09-14
Python Programming

Read and Write XML Files with Python

2020-09-08
Python Programming

Kafka Topic Partitions Walkthrough via Python

2020-09-06
Streaming Analytics & Kafka

Apache Kafka Python Producer and Consumer Clients Introduction

2020-09-05
Streaming Analytics & Kafka

Python: Save Pandas DataFrame to Teradata

2020-05-03
Spark & PySpark

Python: Load / Read Multiline CSV File

2020-05-02
Python Programming

Create, Insert, Delete, Update Operations on Teradata via JDBC in Python

2020-04-22
Python Programming

Python: Read Data from SQLite via JDBC

2020-04-10
Python Programming

Data Operations with SQLite Database via Python

2020-04-10
Python Programming

AttributeError: type object 'java.sql.Types' has no attribute '__javaclass__'

2020-04-10
Python Programming

Pandas DataFrame Plot - Scatter and Hexbin Chart

2020-04-04
Code Snippets & Tips

Pandas DataFrame Plot - Area Chart

2020-04-04
Code Snippets & Tips

Pandas DataFrame Plot - Pie Chart

2020-04-04
Code Snippets & Tips

Convert String to Date using Python

2020-04-04
Code Snippets & Tips

Pandas DataFrame Plot - Line Chart

2020-04-04
Code Snippets & Tips

Pandas DataFrame Plot - Bar Chart

2020-04-04
Code Snippets & Tips

PySpark Read Multiline (Multiple Lines) from CSV File

2020-03-31
Spark & PySpark

Connect to SQL Server via JayDeBeApi in Python

The Python JayDeBeApi module allows you to connect from Python to databases using Java JDBC drivers.

2020-02-29
Code Snippets & Tips

PySpark: Convert Python Dictionary List to Spark DataFrame

2019-12-31
Spark & PySpark

Create and Read Pickle Files in Python

2019-12-26
Code Snippets & Tips

Convert Python Dictionary List to PySpark DataFrame

2019-12-25
Spark & PySpark

Check installed packages in Python

Different programming languages have different package management tools.

2019-11-18
Code Snippets & Tips

Read and parse JSON in Python

2019-11-18
Code Snippets & Tips

Read JSON file as Spark DataFrame in Python / Spark

Spark has easy fluent APIs that can be used to read data from a JSON file as a DataFrame object.

2019-11-18
Code Snippets & Tips

Write and read parquet files in Python / Spark

Parquet is a columnar storage format published by Apache. It's commonly used in the Hadoop ecosystem. Many programming language APIs have been implemented to support writing and reading parquet files.

2019-11-18
Code Snippets & Tips

Convert string to date in Python / Spark

This code snippet shows how to convert a string to a date.

2019-11-18
Code Snippets & Tips

Convert List to Spark Data Frame in Python / Spark

In Spark, the SparkContext.parallelize function can be used to convert a list of objects to an RDD, and the RDD can then be converted to a DataFrame object through SparkSession.

2019-11-18
Code Snippets & Tips

Convert PySpark Row List to Pandas Data Frame

2019-08-22
Spark & PySpark

PySpark: Convert Python Array/List to Spark Data Frame

2019-07-10
Spark & PySpark

Read Hadoop Credential in PySpark

2019-07-06
Spark & PySpark

Data Partition in Spark (PySpark) In-depth Walkthrough

2019-03-30
Spark & PySpark

PySpark - Fix PermissionError: [WinError 5] Access is denied

2019-03-27
Spark & PySpark

Spark - Save DataFrame to Hive Table

2019-03-27
Spark & PySpark

Connect to SQL Server in Spark (PySpark)

2019-03-23
Spark & PySpark

Debug PySpark Code in Visual Studio Code

2019-03-03
Spark & PySpark

Implement SCD Type 2 Full Merge via Spark Data Frames

2019-02-03
Spark & PySpark

PySpark: Convert JSON String Column to Array of Object (StructType) in Data Frame

2019-01-05
Spark & PySpark

Connect to Teradata database through Python

2018-01-10
Teradata

Diagrams