Spark Module 3 Machine Learning SparkML

Machine Learning for Big Data
  • Learn the basics of Machine Learning and how to apply them to big data with SparkML
  • Supervised vs Unsupervised Learning
  • Linear Regressions
  • Logistic Regressions
  • Decision Trees
  • K-Means Clusters
  • Random Forests
  • Recommender Systems


We assume you're already familiar with Spark Core from modules 1 and 2.

Contents - The course will take on average 3 days to complete, including practical work


Having problems? Check the errata for this course.



24 m 2 s
What is Machine Learning, Supervised vs Unsupervised Learning and the Model Building Process


Building a Linear Regression

30 m 40 s
Assembling vectors of features and Model Fitting


Training Data

26 m 33 s
Training vs Test and Holdout Data, Using data from Kaggle, RMSE and R² tests


Model Fitting Parameters

25 m 41 s
Setting Linear Regression Parameters


Feature Selection

36 m 23 s
Correlation of features, Identifying duplicate features, data preparation


Non Numeric Data

25 m 48 s
Using OneHotEncoding and Vectors



19 m 42 s
How to build a pipeline in SparkML


Case Study

34 m 51 s
A full practical exercise


Logistic Regression

26 m 12 s
True and False Negatives and Positives, Coding a Logistic Regression Model


Decision Trees

46 m 21 s
Building a decision tree model, Interpreting a tree and Random Forests


Unsupervised Learning: K-Means Clustering

10 m 49 s
K-Means Clustering and how to implement it in SparkML


Recommender Systems

29 m 7 s
Matrix Factorisation and how to build a model in SparkML
