Skip to main content

Python Spark SQL 01

Spark SQL

Description

07-sql-dataframe-picture.png

Spark lets you use the programmatic API, the SQL API, or a combination of both. This flexibility makes Spark accessible to a variety of users and powerfully expressive.

Prerequisites

  • Python
  • Spark

Go to spark/my-examples and create:

python-spark-sql-01.py

Let’s persist the DataFrame in a named Parquet table that is easily accessible via the SQL API. (df1.write.saveAsTable("some_people") - python-spark-sql-01.py)

print("Create a Spark DataFrame")
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("demo").getOrCreate()

from pyspark.sql.functions import col, when

df = spark.createDataFrame(
[
("sue", 32),
("li", 3),
("bob", 75),
("heo", 13),
],
["first_name", "age"],
)

df1 = df.withColumn(
"life_stage",
when(col("age") < 13, "child")
.when(col("age").between(13, 19), "teenager")
.otherwise("adult"),
)
df1.show()

print("Let’s persist the DataFrame")
df1.write.saveAsTable("some_people")
print("...\n...")

print("\n\nMake sure that the table is accessible via the table name 'select * from some_people':")
spark.sql("select * from some_people").show()

print("Now, let’s use SQL to insert a few more rows of data into the table:")
spark.sql("INSERT INTO some_people VALUES ('frank', 4, 'child')")
print("...\n...")

print("Inspect the table contents to confirm the row was inserted:")
spark.sql("select * from some_people").show()

print("Run a query that returns the teenagers:")
spark.sql("select * from some_people where life_stage='teenager'").show()

print("\n\nstop the spark session")
spark.stop()

Run

./bin/spark-submit --master local[4] ./my-examples/python-spark-sql-01.py

Output

Create a Spark DataFrame
+----------+---+----------+
|first_name|age|life_stage|
+----------+---+----------+
| sue| 32| adult|
| li| 3| child|
| bob| 75| adult|
| heo| 13| teenager|
+----------+---+----------+

Let’s persist the DataFrame
...
...


Make sure that the table is accessible via the table name 'select * from some_people':
+----------+---+----------+
|first_name|age|life_stage|
+----------+---+----------+
| heo| 13| teenager|
| bob| 75| adult|
| sue| 32| adult|
| li| 3| child|
+----------+---+----------+

Now, let’s use SQL to insert a few more rows of data into the table:
...
...
Inspect the table contents to confirm the row was inserted:
+----------+---+----------+
|first_name|age|life_stage|
+----------+---+----------+
| heo| 13| teenager|
| frank| 4| child|
| bob| 75| adult|
| sue| 32| adult|
| li| 3| child|
+----------+---+----------+

Run a query that returns the teenagers:
+----------+---+----------+
|first_name|age|life_stage|
+----------+---+----------+
| heo| 13| teenager|
+----------+---+----------+



stop the spark session