Course curriculum

    1. Course Introduction

    2. Course Structure

    3. Is this Course Right for You?

    1. Introduction to Data Preparation

    2. The Machine Learning Process

    3. Data Preparation Defined

    4. Choosing a Data Preparation Approach

    5. What is Data?

    6. What is Raw Data?

    7. Machine Learning is Mostly Data Preparation

    8. Common Data Preparation Tasks - Data Cleansing

    9. Common Data Preparation Tasks - Feature Selection

    10. Common Data Preparation Tasks - Data Transforms

    11. Common Data Preparation Tasks - Feature Engineering

    12. Common Data Preparation Tasks - Dimensionality Reduction

    13. Data Leakage

    14. Problem With Naïve Data Preparation

    15. Case Study: Data Leakage: Train/Test Split Naïve Approach

    16. Case Study: Data Leakage: Train/Test Split Correct Approach

    17. Case Study: Data Leakage: K-Fold Naïve Approach

    18. Case Study: Data Leakage: K-Fold Correct Approach

    1. Data Cleansing Overview

    2. Identify Columns That Contain a Single Value

    3. Identify Columns with Few Values

    4. Remove Columns with Low Variance

    5. Identify and Remove Rows That Contain Duplicate Data

    6. Defining Outliers

    7. Remove Outliers - The Standard Deviation Approach

    8. Remove Outliers - The IQR Approach

    9. Automatic Outlier Detection

    10. Mark Missing Values

    11. Remove Rows with Missing Values

    12. Statistical Imputation

    13. Mean Value Imputation

    14. Simple Imputer with Model Evaluation

    15. Compare Different Statistical Imputation Strategies

    16. K-Nearest Neighbors Imputation

    17. KNNImputer and Model Evaluation

    18. Iterative Imputation

    19. IterativeImputer and Model Evaluation

    20. IterativeImputer and Different Imputation Order

    1. Feature Selection Introduction

    2. Feature Selection Defined

    3. Statistics for Feature Selection

    4. Loading a Categorical Dataset

    5. Encode the Dataset for Modeling

    6. Chi-Squared

    7. Mutual Information

    8. Modeling with Selected Categorical Features

    9. Feature Selection with ANOVA on Numerical Input

    10. Feature Selection with Mutual Information

    11. Modeling with Selected Numerical Features

    12. Tuning Number of Selected Features

    13. Select Features for Numerical Output

    14. Linear Correlation with Correlation Statistics

    15. Linear Correlation with Mutual Information

    16. Baseline and Model Built Using Correlation

    17. Model Built Using Mutual Information Features

    18. Tuning Number of Selected Features

    19. Recursive Feature Elimination

    20. RFE for Classification

    21. RFE for Regression

    22. RFE Hyperparameters

    23. Feature Ranking for RFE

    24. Feature Importance Scores Defined

    25. Feature Importance Scores: Linear Regression

    26. Feature Importance Scores: Logistic Regression and CART

    27. Feature Importance Scores: Random Forests

    28. Permutation Feature Importance

    29. Feature Selection with Importance

    1. Scale Numerical Data

    2. Diabetes Dataset for Scaling

    3. MinMaxScaler Transform

    4. StandardScaler Transform

    5. Robust Scaling Data

    6. Robust Scaler Applied to Dataset

    7. Explore Robust Scaler Range

    8. Nominal and Ordinal Variables

    9. Ordinal Encoding

    10. One-Hot Encoding Defined

    11. One-Hot Encoding

    12. Dummy Variable Encoding

    13. OrdinalEncoder Transform on Breast Cancer Dataset

    14. Make Distributions More Gaussian

    15. Power Transform on Contrived Dataset

    16. Power Transform on Sonar Dataset

    17. Box-Cox on Sonar Dataset

    18. Yeo-Johnson on Sonar Dataset

    19. Polynomial Features

    20. Effect of Polynomial Degrees

    1. Transforming Different Data Types

    2. The ColumnTransformer

    3. The ColumnTransformer on Abalone Dataset

    4. Manually Transform Target Variable

    5. Automatically Transform Target Variable

    6. Challenge of Preparing New Data for a Model

    7. Save Model and Data Scaler

    8. Load and Apply Saved Scalers

About this course

  • Free
  • 103 lessons
  • 3.5 hours of video content