
Spark Module 3: Machine Learning with SparkML

Machine Learning for Big Data
  • Learn the basics of Machine Learning and how to apply it to big data with SparkML
  • Supervised vs Unsupervised Learning
  • Linear Regression
  • Logistic Regression
  • Decision Trees
  • K-Means Clustering
  • Random Forests
  • Recommender Systems

Prerequisites

We assume you're already familiar with Spark Core from modules 1 and 2.

Contents

The course takes 3 days on average to complete, including practical work.

Having problems? Check the errata for this course.

1. Introduction (24 m 2 s)
What is Machine Learning, Supervised vs Unsupervised Learning and the Model Building Process

2. Building a Linear Regression (30 m 40 s)
Assembling vectors of features and Model Fitting
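In SparkML, features are typically combined into a single vector column with `VectorAssembler` before being passed to `LinearRegression`. To show what model fitting computes, here is a minimal plain-Java sketch of ordinary least squares for a single feature. The class `SimpleLinearRegression` is hypothetical and not part of Spark; Spark's `LinearRegression` handles many features and optional regularisation, so treat this only as an illustration of the idea.

```java
// Hypothetical plain-Java illustration; not part of the SparkML API.
final class SimpleLinearRegression {
    final double intercept;
    final double slope;

    private SimpleLinearRegression(double intercept, double slope) {
        this.intercept = intercept;
        this.slope = slope;
    }

    /** Ordinary least squares for one feature: minimises the sum of squared residuals. */
    static SimpleLinearRegression fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two paired observations");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Slope = covariance(x, y) / variance(x); the shared 1/n factors cancel.
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0) {
            throw new IllegalArgumentException("feature has zero variance");
        }
        double slope = sxy / sxx;
        // The fitted line always passes through (meanX, meanY).
        return new SimpleLinearRegression(meanY - slope * meanX, slope);
    }

    double predict(double x) {
        return intercept + slope * x;
    }

    public static void main(String[] args) {
        // Toy data lying exactly on y = 1 + 2x.
        SimpleLinearRegression model = fit(new double[] {1, 2, 3, 4}, new double[] {3, 5, 7, 9});
        System.out.println("intercept=" + model.intercept + " slope=" + model.slope); // intercept=1.0 slope=2.0
    }
}
```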

3. Training Data (26 m 33 s)
Training vs Test and Holdout Data, Using data from Kaggle, RMSE and R² tests
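SparkML's `RegressionEvaluator` can report both of these metrics (metric names `"rmse"` and `"r2"`), and the training summary of a fitted `LinearRegressionModel` exposes them as well. As a hedged illustration of what the numbers mean, the plain-Java sketch below computes RMSE as the square root of the mean squared residual and R² as 1 - SS_res / SS_tot. `RegressionMetrics` is a hypothetical helper name, and the sample values are made up. In practice these would be computed on held-out test data rather than on the data the model was trained on.

```java
// Hypothetical helper for illustration; not part of the SparkML API.
final class RegressionMetrics {

    /** Root mean squared error: sqrt(mean((actual - predicted)^2)). */
    static double rmse(double[] actual, double[] predicted) {
        checkLengths(actual, predicted);
        double sumSq = 0;
        for (int i = 0; i < actual.length; i++) {
            double residual = actual[i] - predicted[i];
            sumSq += residual * residual;
        }
        return Math.sqrt(sumSq / actual.length);
    }

    /** Coefficient of determination: 1 - SS_res / SS_tot. */
    static double r2(double[] actual, double[] predicted) {
        checkLengths(actual, predicted);
        double mean = 0;
        for (double a : actual) {
            mean += a;
        }
        mean /= actual.length;

        double ssRes = 0, ssTot = 0;
        for (int i = 0; i < actual.length; i++) {
            double residual = actual[i] - predicted[i];
            double deviation = actual[i] - mean;
            ssRes += residual * residual;
            ssTot += deviation * deviation;
        }
        if (ssTot == 0) {
            throw new IllegalArgumentException("actual values have zero variance");
        }
        return 1 - ssRes / ssTot;
    }

    private static void checkLengths(double[] actual, double[] predicted) {
        if (actual.length != predicted.length || actual.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and the same length");
        }
    }

    public static void main(String[] args) {
        // Toy test-set labels and model predictions (illustrative values only).
        double[] actual = {3, 5, 7, 9};
        double[] predicted = {2.5, 5, 7.5, 9};
        System.out.println("RMSE = " + rmse(actual, predicted));
        System.out.println("R2   = " + r2(actual, predicted));
    }
}
```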

4. Model Fitting Parameters (25 m 41 s)
Setting Linear Regression Parameters

5. Feature Selection (36 m 23 s)
Correlation of features, Identifying duplicate features, data preparation
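Spark can compute the Pearson correlation between two numeric columns with `dataset.stat().corr("colA", "colB")`. The plain-Java sketch below shows the underlying formula, assuming two equal-length arrays; `Correlation.pearson` is a hypothetical name, and the floor-area values are invented for illustration. A coefficient close to +1 or -1 between two features suggests they carry largely the same information, which is one way to spot a duplicate feature.

```java
// Hypothetical helper for illustration; not part of the Spark API.
final class Correlation {

    /** Pearson correlation coefficient between two equal-length samples. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two paired values");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Sums of co-deviations and squared deviations; the 1/n factors cancel in the ratio.
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        if (varX == 0 || varY == 0) {
            throw new IllegalArgumentException("correlation is undefined for a constant column");
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Illustrative values: the same floor area recorded in square metres and square feet.
        double[] areaSqM = {50, 80, 120, 200};
        double[] areaSqFt = new double[areaSqM.length];
        for (int i = 0; i < areaSqM.length; i++) {
            areaSqFt[i] = areaSqM[i] * 10.7639;
        }
        // A value of (almost exactly) 1.0 flags the second column as a duplicate of the first.
        System.out.println(pearson(areaSqM, areaSqFt));
    }
}
```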

6. Non-Numeric Data (25 m 48 s)
Using One-Hot Encoding and Vectors
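In SparkML, non-numeric columns are usually converted with `StringIndexer` (category to numeric index) followed by `OneHotEncoder` (index to vector). The plain-Java sketch below shows the idea with hypothetical `OneHot.index` and `OneHot.encode` helpers. Two simplifications are worth flagging: indices here are assigned in order of first appearance, whereas `StringIndexer` orders by label frequency by default, and the sketch keeps a slot for every category, whereas `OneHotEncoder` drops the last category by default (its `dropLast` parameter).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helpers for illustration; SparkML uses StringIndexer + OneHotEncoder instead.
final class OneHot {

    /** Assigns each distinct category an index, in order of first appearance (a simplification). */
    static Map<String, Integer> index(List<String> values) {
        Map<String, Integer> indices = new LinkedHashMap<>();
        for (String v : values) {
            indices.putIfAbsent(v, indices.size());
        }
        return indices;
    }

    /** Returns a vector with 1.0 at the category's index and 0.0 everywhere else. */
    static double[] encode(String value, Map<String, Integer> indices) {
        Integer i = indices.get(value);
        if (i == null) {
            throw new IllegalArgumentException("unseen category: " + value);
        }
        double[] vector = new double[indices.size()];
        vector[i] = 1.0;
        return vector;
    }

    public static void main(String[] args) {
        List<String> colours = Arrays.asList("red", "green", "red", "blue");
        Map<String, Integer> indices = index(colours);
        for (String c : colours) {
            System.out.println(c + " -> " + Arrays.toString(encode(c, indices)));
        }
        // red -> [1.0, 0.0, 0.0]
        // green -> [0.0, 1.0, 0.0]
        // red -> [1.0, 0.0, 0.0]
        // blue -> [0.0, 0.0, 1.0]
    }
}
```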

7. Pipelines (19 m 42 s)
How to build a pipeline in SparkML

8. Case Study (34 m 51 s)
A full practical exercise

9. Logistic Regression (26 m 12 s)
True and False Negatives and Positives, Coding a Logistic Regression Model
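A logistic regression outputs a probability that is thresholded into a 0/1 prediction (SparkML's `LogisticRegression` uses a default threshold of 0.5). Comparing those predictions with the true labels gives the four counts named above. This is a plain-Java sketch that assumes labels and predictions are encoded as 0 and 1; `ConfusionMatrix` is a hypothetical class name and the sample arrays are made up.

```java
// Hypothetical class for illustration; assumes labels and predictions are encoded as 0 or 1.
final class ConfusionMatrix {
    final int truePositives, falsePositives, trueNegatives, falseNegatives;

    private ConfusionMatrix(int tp, int fp, int tn, int fn) {
        truePositives = tp;
        falsePositives = fp;
        trueNegatives = tn;
        falseNegatives = fn;
    }

    static ConfusionMatrix of(int[] actual, int[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("arrays must be the same length");
        }
        int tp = 0, fp = 0, tn = 0, fn = 0;
        for (int i = 0; i < actual.length; i++) {
            boolean isPositive = actual[i] == 1;
            boolean predictedPositive = predicted[i] == 1;
            if (isPositive && predictedPositive) tp++;
            else if (!isPositive && predictedPositive) fp++;
            else if (!isPositive) tn++;   // negative, predicted negative
            else fn++;                    // positive, predicted negative
        }
        return new ConfusionMatrix(tp, fp, tn, fn);
    }

    /** Fraction of all predictions that were correct. */
    double accuracy() {
        int total = truePositives + falsePositives + trueNegatives + falseNegatives;
        return total == 0 ? Double.NaN : (double) (truePositives + trueNegatives) / total;
    }

    /** Of the cases predicted positive, the fraction that really are positive. */
    double precision() {
        int predictedPositive = truePositives + falsePositives;
        return predictedPositive == 0 ? Double.NaN : (double) truePositives / predictedPositive;
    }

    /** Of the cases that really are positive, the fraction the model caught. */
    double recall() {
        int actualPositive = truePositives + falseNegatives;
        return actualPositive == 0 ? Double.NaN : (double) truePositives / actualPositive;
    }

    public static void main(String[] args) {
        int[] actual = {1, 1, 0, 0, 1};
        int[] predicted = {1, 0, 0, 1, 1};
        ConfusionMatrix m = of(actual, predicted);
        System.out.printf("TP=%d FP=%d TN=%d FN=%d%n",
                m.truePositives, m.falsePositives, m.trueNegatives, m.falseNegatives); // TP=2 FP=1 TN=1 FN=1
        System.out.println("accuracy = " + m.accuracy()); // accuracy = 0.6
    }
}
```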

10. Decision Trees (46 m 21 s)
Building a decision tree model, Interpreting a tree and Random Forests
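Decision trees choose splits that make the resulting groups as pure as possible, and SparkML's `DecisionTreeClassifier` accepts `"gini"` or `"entropy"` as its impurity measure via `setImpurity`. The plain-Java sketch below computes Gini impurity, 1 - Σ p_i², and the size-weighted impurity of one candidate binary split. `Gini` and its methods are hypothetical names; a real tree learner would evaluate many candidate splits and keep the one with the lowest weighted impurity.

```java
// Hypothetical helper for illustration; not part of the SparkML API.
final class Gini {

    /** Gini impurity of a node given its per-class counts: 1 - sum(p_i^2). */
    static double impurity(int[] classCounts) {
        int total = sum(classCounts);
        if (total == 0) {
            return 0.0;
        }
        double sumSquares = 0;
        for (int c : classCounts) {
            double p = (double) c / total;
            sumSquares += p * p;
        }
        return 1.0 - sumSquares;
    }

    /** Impurity of a binary split, weighting each child by its share of the rows. */
    static double splitImpurity(int[] left, int[] right) {
        int nLeft = sum(left);
        int nRight = sum(right);
        int n = nLeft + nRight;
        if (n == 0) {
            return 0.0;
        }
        return (nLeft * impurity(left) + nRight * impurity(right)) / n;
    }

    private static int sum(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        // A node with 5 rows of each class is maximally mixed for two classes.
        System.out.println(impurity(new int[] {5, 5})); // 0.5
        // Candidate split: left child is pure, right child is mostly class 1.
        System.out.println(splitImpurity(new int[] {4, 0}, new int[] {1, 5}));
    }
}
```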

11. Unsupervised Learning: K-Means Clustering (10 m 49 s)
K-Means Clustering and how to implement it in SparkML
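SparkML's `KMeans` estimator groups feature vectors into `k` clusters (set with `setK`). The core of K-Means is Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until nothing changes. The plain-Java sketch below assumes the caller supplies the initial centroids, whereas Spark chooses them itself (by default with the k-means|| method). `KMeansSketch` is a hypothetical name and the points are made up.

```java
import java.util.Arrays;

// Hypothetical sketch of Lloyd's algorithm; not the SparkML implementation.
final class KMeansSketch {

    /** Refines the given initial centroids for at most maxIter rounds. */
    static double[][] fit(double[][] points, double[][] initial, int maxIter) {
        double[][] centroids = new double[initial.length][];
        for (int c = 0; c < initial.length; c++) {
            centroids[c] = initial[c].clone();
        }
        int k = centroids.length;
        int dim = centroids[0].length;

        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: accumulate each point into its nearest centroid's running sum.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] p : points) {
                int c = nearest(p, centroids);
                counts[c]++;
                for (int d = 0; d < dim; d++) {
                    sums[c][d] += p[d];
                }
            }
            // Update step: move each non-empty cluster's centroid to the mean of its points.
            boolean changed = false;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int d = 0; d < dim; d++) {
                    double mean = sums[c][d] / counts[c];
                    if (mean != centroids[c][d]) {
                        changed = true;
                    }
                    centroids[c][d] = mean;
                }
            }
            if (!changed) {
                break; // converged
            }
        }
        return centroids;
    }

    /** Index of the centroid with the smallest squared Euclidean distance to p. */
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0;
            for (int d = 0; d < p.length; d++) {
                double diff = p[d] - centroids[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        double[][] initial = {{0, 0}, {10, 10}};
        System.out.println(Arrays.deepToString(fit(points, initial, 10))); // [[0.0, 0.5], [10.0, 10.5]]
    }
}
```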

12. Recommender Systems (29 m 7 s)
Matrix Factorisation and how to build a model in SparkML
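SparkML's recommender uses `ALS` (alternating least squares) to factorise the user-item ratings matrix into a latent-factor vector for each user and each item. Once those factors are learned, a predicted rating is the dot product of the user's and the item's vectors. The sketch below shows only that prediction step in plain Java; the factor values are invented for illustration and `LatentFactorSketch` is a hypothetical name.

```java
// Hypothetical illustration; in SparkML the factors would come from a fitted ALS model.
final class LatentFactorSketch {

    /** Predicted rating = dot product of a user's and an item's latent-factor vectors. */
    static double predictRating(double[] userFactors, double[] itemFactors) {
        if (userFactors.length != itemFactors.length) {
            throw new IllegalArgumentException("factor vectors must have the same rank");
        }
        double rating = 0;
        for (int f = 0; f < userFactors.length; f++) {
            rating += userFactors[f] * itemFactors[f];
        }
        return rating;
    }

    public static void main(String[] args) {
        // Made-up two-factor vectors, purely for illustration.
        double[] user = {1.0, 0.5};
        double[] item = {2.0, 4.0};
        System.out.println(predictRating(user, item)); // 4.0
    }
}
```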
