SparkContext
This is the core of Spark. It handles:
- connecting to the cluster (Standalone/YARN/Mesos/K8s),
- task scheduling,
- creating RDDs,
- managing executors.
Without SparkContext, Spark cannot run at all.
Example:
from pyspark import SparkContext

# Low-level entry point: create the context and build an RDD from it.
sc = SparkContext(appName="MyApp")
rdd = sc.parallelize([1, 2, 3])

SparkSession
Starting with Spark 2.0, SparkSession became the main interface.
SparkSession:
- unifies SQLContext, HiveContext, and SparkContext,
- provides DataFrame and Dataset APIs,
- is used for SQL, Hive tables, Delta Lake, and reading/writing data,
- contains a SparkContext internally.
Example:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("MyApp").getOrCreate()
df = spark.read.json("data.json")

Accessing SparkContext from SparkSession:
sc = spark.sparkContext

🧠 Why SparkSession was introduced
Before Spark 2.0:
- RDDs → SparkContext
- DataFrames → SQLContext
- Hive tables → HiveContext
This was messy. SparkSession unified all of these (a short before/after sketch follows the list):
✔ a single entry point
✔ unified configuration
✔ easier to use from Python and the JVM
✔ a consistent API
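For contrast, here is a minimal sketch of the two styles. The app names are made up, the pre-2.0 SQLContext path is shown only for illustration (it is deprecated in modern Spark), and data.json is the same placeholder file used above.

# Before Spark 2.0 (illustrative): RDDs via SparkContext, DataFrames via SQLContext.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="OldStyleApp")    # hypothetical app name
sqlContext = SQLContext(sc)                 # DataFrames/SQL needed a separate context
df_old = sqlContext.read.json("data.json")

# Since Spark 2.0: one SparkSession covers RDDs, DataFrames, and SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("UnifiedApp").getOrCreate()   # hypothetical app name
df_new = spark.read.json("data.json")
rdd = spark.sparkContext.parallelize([1, 2, 3])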
🆚 Main differences
| Feature | SparkContext | SparkSession |
|---|---|---|
| Entry point | Yes, low-level | Yes, unified and high-level |
| RDD API | ✔ | ✔ via spark.sparkContext |
| DataFrame/SQL/Dataset | ✘ | ✔ |
| Read/write data (DataSource API) | Limited | ✔ |
| Hive support | ✘ | ✔ |
| Wraps the other internally | ✘ | ✔ (contains a SparkContext) |
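To make the table concrete, here is a rough sketch covering its rows. The app name "HiveDemo", the temp view "people", and the sample rows are made up, and enableHiveSupport() assumes a Hive metastore is available.

from pyspark.sql import SparkSession

# One session covering the rows above.
spark = (
    SparkSession.builder
    .appName("HiveDemo")
    .enableHiveSupport()    # Hive support is exposed on SparkSession, not SparkContext
    .getOrCreate()
)

# DataFrame / SQL / DataSource API
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
df.createOrReplaceTempView("people")
spark.sql("SELECT COUNT(*) AS n FROM people").show()

# RDD API, reached through the SparkContext that the session wraps
rdd = spark.sparkContext.parallelize([1, 2, 3])
print(rdd.sum())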
📝 Summary
SparkSession is the modern, unified way to work with Spark. It is a high-level API that:
- offers DataFrames, SQL, and Datasets,
- handles reading/writing data,
- gives access to Hive,
- internally wraps SparkContext.
SparkContext is the engine. SparkSession is the steering wheel.