PySpark - stddev() Function

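stddev() is an aggregate function that returns the sample standard deviation of the values in a given column of a PySpark DataFrame. It is an alias for stddev_samp(), which is why the result column in the output below is named stddev_samp(marks).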

Import the stddev() function from pyspark.sql.functions before using it.

Syntax:

dataframe.select(stddev("column_name"))

Example:

  • Get the standard deviation of the marks column of the PySpark DataFrame
# import the required modules
from pyspark.sql import SparkSession
from pyspark.sql.functions import stddev

# create an app
spark = SparkSession.builder.appName('kontext').getOrCreate()

# create a list of data
values = [{'rollno': 1, 'student name': 'Gottumukkala Sravan kumar', 'marks': 98},
          {'rollno': 2, 'student name': 'Gottumukkala Bobby', 'marks': 89},
          {'rollno': 3, 'student name': 'Lavu Ojaswi', 'marks': 90},
          {'rollno': 4, 'student name': 'Lavu Gnanesh', 'marks': 78},
          {'rollno': 5, 'student name': 'Chennupati Rohith', 'marks': 100}]

# create the dataframe from the values
data = spark.createDataFrame(values)

# display the standard deviation of marks
print(data.select(stddev("marks")).collect())

Output:

[Row(stddev_samp(marks)=8.717797887081348)]
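
As a cross-check, the figure above can be reproduced without Spark. The sketch below is not part of the original example: it uses Python's standard statistics module and assumes, as the stddev_samp column name suggests, that stddev() divides by n - 1 (sample standard deviation) rather than by n (population standard deviation). The variable names are illustrative only.

```python
import math
import statistics

# Marks copied from the example DataFrame above.
marks = [98, 89, 90, 78, 100]

# Sample standard deviation: squared deviations are divided by n - 1.
sample_sd = statistics.stdev(marks)

# Population standard deviation: squared deviations are divided by n.
population_sd = statistics.pstdev(marks)

# The sample figure again, computed by hand from the definition.
mean = sum(marks) / len(marks)
manual_sd = math.sqrt(sum((m - mean) ** 2 for m in marks) / (len(marks) - 1))

print(sample_sd)      # about 8.7178, in line with the Spark output
print(population_sd)  # about 7.7974
print(manual_sd)
```

The sample value agrees with the Spark result, while the population value is smaller. If the population figure is what you need, pyspark.sql.functions also provides stddev_pop().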
