PySpark - Select columns by datatype in DataFrame


Steps:

  1. Install PySpark module
  2. Create a DataFrame with schema fields
  3. Filter the schema fields by data type to select the matching columns
  4. Display the data
pip install pyspark

Code:

from pyspark.sql import SparkSession

from pyspark.sql.types import StringType, IntegerType, StructType, StructField, FloatType


spark = SparkSession.builder.appName('kontexttech').getOrCreate()


values = [(1, "Gottumukkala Sravan Kumar",4500.00), (2, "Bobby",93445.000), (3, "Gnanesh",88900.000)]


schema = StructType([
    StructField("rollno", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("fee", FloatType(), True),
])


data = spark.createDataFrame(values, schema)


# Select only the integer columns
print(data.select([f.name for f in data.schema.fields if isinstance(f.dataType, IntegerType)]).collect())


# Select only the string columns
print(data.select([f.name for f in data.schema.fields if isinstance(f.dataType, StringType)]).collect())


# Select only the float columns
print(data.select([f.name for f in data.schema.fields if isinstance(f.dataType, FloatType)]).collect())

Output:

[Row(rollno=1), Row(rollno=2), Row(rollno=3)]

[Row(name='Gottumukkala Sravan Kumar'), Row(name='Bobby'), Row(name='Gnanesh')]

[Row(fee=4500.0), Row(fee=93445.0), Row(fee=88900.0)]

Last modified by Gottumukkala Sravan Kumar.