scala
22 items tagged with "scala"
Articles
Find Number of Rows of Hive Table via Scala
To find the number of rows/records in a Hive table, we can use the Spark SQL count aggregate function: Hive SQL - Aggregate Functions Overview with Examples. This code snippet provides an example of Scala code that implements the same. spark-shell is used directly for simplicity. The code snippet can also be run in Jupyter Notebooks or Zeppelin with a Spark kernel. Alternatively, it can be compiled to a jar file and then submitted as a job via spark-submit.
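A minimal sketch of the approach, assuming a Hive-enabled session in spark-shell; the database and table names (test_db.test_table) are hypothetical placeholders:

```scala
import org.apache.spark.sql.SparkSession

// Build a session with Hive support (spark-shell usually provides `spark` already).
val spark = SparkSession.builder()
  .appName("hive-row-count")
  .enableHiveSupport()
  .getOrCreate()

// Option 1: SQL COUNT aggregation against the Hive table.
spark.sql("SELECT COUNT(*) AS cnt FROM test_db.test_table").show()

// Option 2: the DataFrame API's count action.
val rowCount = spark.table("test_db.test_table").count()
println(s"Row count: $rowCount")
```

Both options trigger a full scan unless Hive statistics can be used, so on large tables the count is a distributed job rather than a metadata lookup.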
Spark submit --num-executors --executor-cores --executor-memory
Spark Dataset and DataFrame
Spark Scala: Load Data from MySQL
Spark Scala: Load Data from SQL Server
Spark Scala: Read XML File as DataFrame
Scala: Read JSON file as Spark DataFrame
Scala: Read CSV File as Spark DataFrame
Scala: Parse JSON String as Spark DataFrame
Scala: Change Column Type in Spark Data Frame
Scala: Filter Spark DataFrame Columns with None or Null Values
This article shows you how to filter out NULL/None values from a Spark data frame using Scala. The DataFrame.filter or DataFrame.where functions can be used to filter out null values.
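A short sketch of both functions, assuming spark-shell where the session's implicits are in scope; the sample data and column names are hypothetical:

```scala
import org.apache.spark.sql.functions.col
import spark.implicits._  // `spark` is the SparkSession provided by spark-shell

// Hypothetical data with a nullable "name" column.
val df = Seq((1, Some("Alice")), (2, None), (3, Some("Bob"))).toDF("id", "name")

// filter with a Column expression:
df.filter(col("name").isNotNull).show()

// where with a SQL string predicate (equivalent result):
df.where("name IS NOT NULL").show()
```

filter and where are aliases; choose whichever reads better, and prefer isNotNull/isNull over comparing against null with ===, which returns null rather than a boolean.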
Scala - Add Constant Column to Spark Data Frame
Scala: Remove Columns from Spark Data Frame
Scala: Change Data Frame Column Names in Spark
Scala: Convert List to Spark Data Frame
Write and read parquet files in Scala / Spark
Parquet is a columnar storage format published by Apache. It is commonly used in the Hadoop ecosystem, and APIs for writing and reading Parquet files have been implemented in many programming languages.
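A minimal round-trip sketch, assuming spark-shell; the output path /tmp/example.parquet is a hypothetical local path:

```scala
import spark.implicits._  // `spark` is the SparkSession provided by spark-shell

// Hypothetical sample data.
val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

// Write as Parquet, overwriting any previous run.
df.write.mode("overwrite").parquet("/tmp/example.parquet")

// Read it back; the schema is recovered from the Parquet metadata.
val readBack = spark.read.parquet("/tmp/example.parquet")
readBack.printSchema()
readBack.show()
```

Because Parquet stores the schema with the data, no schema needs to be supplied on read, unlike CSV or JSON sources.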
Convert string to date in Scala / Spark
This code snippet shows how to convert a string to a date.
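A small sketch using the built-in to_date function, assuming spark-shell; the column name and format pattern are illustrative:

```scala
import org.apache.spark.sql.functions.{col, to_date}
import spark.implicits._  // `spark` is the SparkSession provided by spark-shell

val df = Seq("2022-08-23", "2022-12-01").toDF("date_str")

// Parse the string column into a DateType column using an explicit pattern.
df.select(to_date(col("date_str"), "yyyy-MM-dd").alias("date_col")).show()
```

Passing the pattern explicitly avoids surprises from the session's default date format, and malformed strings parse to null rather than raising an error.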
Read JSON file as Spark DataFrame in Scala / Spark
Spark provides fluent APIs that make it easy to read data from a JSON file as a DataFrame object.
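A minimal sketch, assuming spark-shell; the input path /tmp/example.json is a hypothetical file of JSON records:

```scala
// By default Spark expects one JSON object per line (JSON Lines);
// set multiLine to true for a file containing a single pretty-printed document or array.
val df = spark.read
  .option("multiLine", "true")
  .json("/tmp/example.json")

df.printSchema()  // schema is inferred from the data
df.show()
```

Schema inference requires an extra pass over the data; for large or production jobs, supplying a schema via .schema(...) is faster and safer.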
Convert List to Spark Data Frame in Scala / Spark
In Spark, the SparkContext.parallelize function can be used to convert a list of objects to an RDD, and the RDD can then be converted to a DataFrame object through SparkSession.
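A sketch of that two-step path, assuming spark-shell; the column names and sample rows are hypothetical:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Step 1: distribute the local list as an RDD of Rows.
val data = List(Row(1, "a"), Row(2, "b"))
val rdd = spark.sparkContext.parallelize(data)

// Step 2: attach an explicit schema and build the DataFrame.
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("value", StringType, nullable = true)
))
val df = spark.createDataFrame(rdd, schema)
df.show()
```

For lists of tuples or case classes, the shorter Seq(...).toDF("id", "value") via spark.implicits._ achieves the same result without an explicit schema.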