PySpark - count() Function


count() is an aggregate function that returns the number of non-null values in the given column of a PySpark DataFrame.

We have to import the count() function from pyspark.sql.functions.

Syntax:

dataframe.select(count("column_name"))
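
Note that count() counts only the non-null values in a column. The following is a minimal sketch of that behavior; the DataFrame df below, with one missing mark, is hypothetical and separate from the example that follows:

from pyspark.sql import SparkSession
from pyspark.sql.functions import count

spark = SparkSession.builder.appName('kontext').getOrCreate()
# rollno 2 has no marks value, so it is skipped by count("marks")
df = spark.createDataFrame([(1, 90), (2, None), (3, 75)], ['rollno', 'marks'])
print(df.select(count("marks")).collect())   # [Row(count(marks)=2)]
print(df.select(count("rollno")).collect())  # [Row(count(rollno)=3)]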

Example:

  • Get the count of rows in the marks and rollno columns of the PySpark DataFrame
# import the required modules

import pyspark
from pyspark.sql import SparkSession
# create a SparkSession
spark = SparkSession.builder.appName('kontext').getOrCreate()
# create a list of data
values = [{'rollno': 1, 'student name': 'Gottumukkala Sravan kumar','marks': 98},
        {'rollno': 2, 'student name': 'Gottumukkala Bobby','marks': 89},
        {'rollno': 3, 'student name': 'Lavu Ojaswi','marks': 90},
        {'rollno': 4, 'student name': 'Lavu Gnanesh','marks': 78},
        {'rollno': 5, 'student name': 'Chennupati Rohith','marks': 100}]
# create the dataframe from the values
data = spark.createDataFrame(values)
# import the count function
from pyspark.sql.functions import count
# display the count of values in the marks column
print(data.select(count("marks")).collect())
# display the count of values in the rollno column
print(data.select(count("rollno")).collect())

Output:

[Row(count(marks)=5)]

[Row(count(rollno)=5)]
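
count() can also be applied to several columns in a single select. A minimal sketch, assuming the same data DataFrame created above:

from pyspark.sql.functions import count

# count the non-null values of both columns in one pass
print(data.select(count("marks"), count("rollno")).collect())

This would return a single row containing both counts, e.g. [Row(count(marks)=5, count(rollno)=5)] for the data above.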
