
Spark Module 4: Streaming and Structured Streaming

Featuring Structured Streaming and Apache Kafka
  • Learn how to use Apache Spark to process big data streams in real time!
  • Both DStreams and Structured Streaming are covered
  • Use Apache Kafka to build a near-continuous, real-time big data pipeline


We'll assume you're already familiar with Spark and SparkSQL; modules 1 and 2 in this series cover the basics.

Contents

The course is over 3 hours long and, with practical work, should take a day or two to complete.


Having problems? Check the errata for this course.


Introduction and DStreams

55 m 53 s
DStreams is an older API but it is still in use, so we'll establish the basics of Streaming with this API. We'll use a simple socket server to simulate a stream of data.
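
To give a feel for what "a simple socket server to simulate a stream of data" can look like, here is a minimal sketch using only the Java standard library. It is not the course's own code: the class name EventSocketServer, port 9999, the one-second delay and the sample events are all assumptions made for illustration. It writes newline-delimited text, which is the format Spark's socket source (for example socketTextStream on a JavaStreamingContext) reads.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not taken from the course material.
class EventSocketServer {

    // Waits for one client, then sends each event as a single line of text,
    // pausing between lines so the data arrives as a stream rather than a burst.
    static void serve(ServerSocket server, List<String> events, long delayMillis)
            throws IOException, InterruptedException {
        try (Socket client = server.accept();
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8))) {
            for (String event : events) {
                out.print(event + "\n"); // one event per line
                out.flush();             // push it to the client immediately
                Thread.sleep(delayMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Port 9999 and the events below are illustrative assumptions.
        List<String> events = List.of("INFO,login", "WARN,slow-response", "ERROR,timeout");
        try (ServerSocket server = new ServerSocket(9999)) {
            System.out.println("Waiting for a client on port 9999...");
            serve(server, events, 1000L);
        }
    }
}
```

Once the server is running, any line-oriented client pointed at localhost:9999 receives the events one at a time, which is enough to exercise a streaming job on a single machine without a real data source.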


Integrating with Apache Kafka

77 m 52 s
Apache Kafka is a highly performant, distributed event log and is perfect for use in streaming applications. Here we use it as a repository for holding a real-time stream of events. We integrate it with Spark Streaming using Spark's Kafka module.


Structured Streaming

66 m 45 s
This newer API builds on the SparkSQL/DataFrame API and is a much more elegant system. In this chapter we rebuild our previous work with Structured Streaming and see how it can be used to build a streaming pipeline.
