Course curriculum

  • 1

    Introduction

    • Course Introduction

    • Course Structure

    • Is this Course Right for You?

  • 2

    Foundations

    • Introduction to Data Preparation

    • The Machine Learning Process

    • Data Preparation Defined

    • Choosing a Data Preparation Approach

    • What is Data?

    • What is Raw Data?

    • Machine Learning is Mostly Data Preparation

    • Common Data Preparation Tasks - Data Cleansing

    • Common Data Preparation Tasks - Feature Selection

    • Common Data Preparation Tasks - Data Transforms

    • Common Data Preparation Tasks - Feature Engineering

    • Common Data Preparation Tasks - Dimensionality Reduction

    • Data Leakage

    • Problem with Naïve Data Preparation

    • Case Study: Data Leakage: Train/Test Split Naïve Approach

    • Case Study: Data Leakage: Train/Test Split Correct Approach

    • Case Study: Data Leakage: K-Fold Naïve Approach

    • Case Study: Data Leakage: K-Fold Correct Approach

  • 3

    Data Cleansing

    • Data Cleansing Overview

    • Identify Columns That Contain a Single Value

    • Identify Columns with Few Values

    • Remove Columns with Low Variance

    • Identify and Remove Rows That Contain Duplicate Data

    • Defining Outliers

    • Remove Outliers - The Standard Deviation Approach

    • Remove Outliers - The IQR Approach

    • Automatic Outlier Detection

    • Mark Missing Values

    • Remove Rows with Missing Values

    • Statistical Imputation

    • Mean Value Imputation

    • Simple Imputer with Model Evaluation

    • Compare Different Statistical Imputation Strategies

    • K-Nearest Neighbors Imputation

    • KNNImputer and Model Evaluation

    • Iterative Imputation

    • IterativeImputer and Model Evaluation

    • IterativeImputer and Different Imputation Order

  • 4

    Feature Selection

    • Feature Selection Introduction

    • Feature Selection Defined

    • Statistics for Feature Selection

    • Loading a Categorical Dataset

    • Encode the Dataset for Modeling

    • Chi-Squared

    • Mutual Information

    • Modeling with Selected Categorical Features

    • Feature Selection with ANOVA on Numerical Input

    • Feature Selection with Mutual Information

    • Modeling with Selected Numerical Features

    • Tuning Number of Selected Features

    • Select Features for Numerical Output

    • Linear Correlation with Correlation Statistics

    • Linear Correlation with Mutual Information

    • Baseline and Model Built Using Correlation

    • Model Built Using Mutual Information Features

    • Tuning Number of Selected Features

    • Recursive Feature Elimination

    • RFE for Classification

    • RFE for Regression

    • RFE Hyperparameters

    • Feature Ranking for RFE

    • Feature Importance Scores Defined

    • Feature Importance Scores: Linear Regression

    • Feature Importance Scores: Logistic Regression and CART

    • Feature Importance Scores: Random Forests

    • Permutation Feature Importance

    • Feature Selection with Importance

  • 5

    Data Transforms

    • Scale Numerical Data

    • Diabetes Dataset for Scaling

    • MinMaxScaler Transform

    • StandardScaler Transform

    • Robust Scaling Data

    • Robust Scaler Applied to Dataset

    • Explore Robust Scaler Range

    • Nominal and Ordinal Variables

    • Ordinal Encoding

    • One-Hot Encoding Defined

    • One-Hot Encoding

    • Dummy Variable Encoding

    • OrdinalEncoder Transform on Breast Cancer Dataset

    • Make Distributions More Gaussian

    • Power Transform on Contrived Dataset

    • Power Transform on Sonar Dataset

    • Box-Cox on Sonar Dataset

    • Yeo-Johnson on Sonar Dataset

    • Polynomial Features

    • Effect of Polynomial Degrees

  • 6

    Advanced Transforms

    • Transforming Different Data Types

    • The ColumnTransformer

    • The ColumnTransformer on Abalone Dataset

    • Manually Transform Target Variable

    • Automatically Transform Target Variable

    • Challenge of Preparing New Data for a Model

    • Save Model and Data Scaler

    • Load and Apply Saved Scalers

  • 7

    Dimensionality Reduction

    • Curse of Dimensionality

    • Techniques for Dimensionality Reduction

    • Linear Discriminant Analysis

    • Linear Discriminant Analysis Demonstrated

    • Principal Component Analysis