PySpark DataFrame - Select Columns using select Function
Code description
In PySpark, we can use the select function to select a subset of the columns, or all columns, from a DataFrame.
Syntax
DataFrame.select(*cols)
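This function returns a new DataFrame object based on the projection expression list. Each entry in cols can be a column name (string) or a Column expression, and '*' selects every column. The original DataFrame is not modified.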
The final select call in the code snippet prints out the following output:
+---+----------------+-------+---+
| id|customer_profile|   name|age|
+---+----------------+-------+---+
|  1|    {Kontext, 3}|Kontext|  3|
|  2|      {Tech, 10}|   Tech| 10|
+---+----------------+-------+---+
Code snippet
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

appName = "PySpark Example - select"
master = "local"

# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .getOrCreate()

spark.sparkContext.setLogLevel("WARN")

data = [{"id": 1, "customer_profile": {"name": "Kontext", "age": 3}},
        {"id": 2, "customer_profile": {"name": "Tech", "age": 10}}]

customer_schema = StructType([
    StructField('name', StringType(), True),
    StructField('age', IntegerType(), True),
])
df_schema = StructType([StructField("id", IntegerType(), True), StructField(
    "customer_profile", customer_schema, False)])

df = spark.createDataFrame(data, df_schema)
print(df.schema)
df.show()

# select certain columns
df.select('*', "customer_profile.name", "customer_profile.age").show()