Spark Module 4: Streaming and Structured Streaming

Featuring Structured Streaming and Apache Kafka
  • Learn how to use Apache Spark for real-time streaming of big data!
  • Both DStreams and Structured Streaming are covered
  • Use Apache Kafka to build a near-continuous, real-time big data pipeline


We'll assume you're already familiar with Spark and SparkSQL; Modules 1 and 2 in this series cover the basics.

Contents

The course is over 3 hours long and, with practical work, should take a day or two to complete.


Having problems? Check the errata for this course.


Introduction and DStreams

55m 53s
DStreams is an older API, but it is still in use, so we'll establish the basics of streaming with it. We'll use a simple socket server to simulate a stream of data.


Integrating with Apache Kafka

77m 52s
Apache Kafka is a high-performance, distributed event log and is well suited to streaming applications. Here we use it as a repository for holding a real-time stream of events, and we integrate it with Spark Streaming using Spark's Kafka integration module.


Structured Streaming

66m 45s
This newer API builds on the SparkSQL/DataFrame API and is a much more elegant system. In this chapter we rebuild our previous work and discover how Structured Streaming can be used to build a streaming pipeline.
