The Google Certified Data Engineer
This course is a complete end-to-end solution for preparing for and passing the Google Certified Data Engineer exam.
What Do I Need to Take for the Data Engineering Certification?
An Introduction to Machine Learning
Modeling in Python
Data Cleansing
Basic Algorithm (Model) Types
The Perceptron
Simple Neural Network
Introduction
Is this Course for You?
Instructor Q & A
Cloud Platform Resource Hierarchy
The Exam Case Study
Big Data Services
Section Summary
Data Engineering Cheat Sheet
Creating an Account on GCP
Navigating the GCP Console
Creating a Project
Billing at a High Level
IAM
Cloud Shell
API Management
Installing GCP SDK
Section Summary
Compute Services High Level
Working with Compute Engine
Cloud Launcher
Compute Engine Resources
What's Docker?
App Engine
Deploy Docker Container
Summary
Key Storage Terms
Storage Classes
Buckets
Working with Objects in Our Buckets
An Introduction to gsutil
Summary
Cloud SQL Introduction
MySQL Client in Cloud Shell
Creating and Exporting Schema
Cloud SQL Backups
Creating a Cloud SQL Instance
Summary
What is BigQuery?
What is Bigtable?
Cloud Datastore
The Pub/Sub Demo
What is Cloud Datastore?
What is Pub/Sub?
BigQuery Demo
Cloud Dataproc Demo
Bigtable Demo
What is TensorFlow?
What is a Datalab?
Cloud Datalab Demo
Section Summary
Introduction
Is This Course Right for You?
The 4 Types of Data
Structured Versus Unstructured Data
Instructor Q&A
Section Summary
Why Use Cloud Dataproc?
On Premise Hadoop Buildout
Scaling Up or Out
Decouple Storage and Compute
Regions and Zones
Cloud Dataproc Architecture
Section Summary
The Cluster Creation Screen
Create Cluster Using Console
Create Cluster with Command Line
The 3 Cluster Options
Preemptible Worker Nodes
How Preemption Works
Image Versions
Custom Image
Custom Dataproc Cluster
Install Software on Dataproc Clusters
Add Initialization Actions
Cluster High Availability
Scaling Clusters
Section Summary
The Submit Jobs Screen
Submit Spark Job to Cluster
Submit PySpark Job via SSH
Hadoop Jobs to GCP in 3 Steps
Scala and Python Jobs to GCP
Whiteboarding: On-Premise to Cloud Dataproc
Whiteboarding: Moving Jobs to GCP
Whiteboarding: Data and Compute in the Same Zones
Whiteboarding: Defining Preemption
Whiteboarding: On-Premise Jobs to GCP Architecture
Whiteboarding: Adding Custom Software to Nodes
Section Summary
Introduction
Is this Course Right for You?
What is Streaming?
The Three Vs of Data
The Beam Pipeline
Section Summary
Definition and History
Beam Object Model
Pipeline Object Review
Pipeline Object Review Answer Key
Event Time and Processing Time
Windowing
Use Case: The Mobile App
Handling Data Tensions
MapReduce
FlumeJava and Batch Patterns
MillWheel
Event Skew
Section Summary
Cloud Dataflow: The SDK and the Runner
The 4 Core Questions of Dataflow
Lab: Building a Dataflow Pipeline
Dataflow Job Monitoring UI
Stackdriver and Dataflow
Simple Dashboard
Lab: Monitoring Dataflow
Section Summary
Course Introduction
Exam Preparation Tip
Is This Course Right for You?
Exam Tip
What's an Array?
What's a Tensor?
The "FLOW" in TensorFlow
Numbers Moving Through a Graph
Hello World in TensorFlow
Section Summary
Creating a Jupyter Notebook on GCP
Reconnect Datalab to Virtual Machine
Download and Upload Notebooks to Datalab
Up and Running with Cloud Datalab
Summary
The TensorFlow Code Base
Forward Feeding Graphs
Handling Iteration in TensorFlow Graphs
Steps in Every TensorFlow Program
Modeling Larger Computational Graphs
Resizing After High Utilization Warning
Simple End-to-End Example
Tensor Dimensions
Placeholders
Sessions
Node Life Cycle
Properties of a Tensor
Convert to Tensors
Enabling Logging with TensorFlow
Lab: Hello World in TensorFlow
Section Summary
NumPy vs TensorFlow
Data Scrubbing
Data Import and Exploration
Linear Regression in TensorFlow
The Mandelbrot Set
Overfitting and How to Correct It
Packaging Up Our Model
Creating a Serving Input Function
Lab: Linear Regression in TensorFlow
Linear Regression Lab Walk Through
Cloud Machine Learning at Scale
Section Summary