You will learn how to:
- Implement a 2-class classification neural network with a single hidden layer
- Use units with a non-linear activation function, such as tanh
- Compute the cross entropy loss
- Implement forward and backward propagation
Let’s first import all the packages that you will need during this assignment.
- numpy is the fundamental package for scientific computing with Python.
- sklearn provides simple and efficient tools for data mining and data analysis.
- matplotlib is a library for plotting graphs in Python.
- testCases provides some test examples to assess the correctness of your functions.
- planar_utils provides various useful functions used in this assignment.
# Package imports
First, let’s get the dataset you will work on. The following code will load a “flower” 2-class dataset into variables X and Y.
X, Y = load_planar_dataset()
Visualize the dataset using matplotlib. The data looks like a “flower” with some red (label y=0) and some blue (y=1) points. Your goal is to build a model to fit this data.
# Visualize the data:
How many training examples do you have? In addition, what is the shape of the variables X and Y?
### START CODE HERE ### (≈ 3 lines of code)
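As a sketch of what those three lines might look like, here is the shape check on synthetic stand-in arrays (the real X and Y come from load_planar_dataset()):

```python
import numpy as np

# Synthetic stand-ins for the flower data: X is (n_x, m), Y is (1, m)
X = np.random.randn(2, 400)
Y = (np.random.rand(1, 400) > 0.5).astype(int)

shape_X = X.shape        # (2, 400): 2 features per example
shape_Y = Y.shape        # (1, 400): one label per example
m = X.shape[1]           # 400 training examples

print(shape_X, shape_Y, m)
```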
Before building a full neural network, let’s first see how logistic regression performs on this problem. You can use sklearn’s built-in functions to do that.
# Train the logistic regression classifier
You can now plot the decision boundary of these models.
# Plot the decision boundary for logistic regression
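A sketch of the sklearn workflow, shown here with plain LogisticRegression on toy linearly separable data (the notebook may use a different estimator, and sklearn expects inputs of shape (m, n_features), hence the transposes):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy linearly separable data standing in for the flower dataset
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 400))
Y = (X[0:1, :] + X[1:2, :] > 0).astype(int)

# sklearn wants (n_samples, n_features), so transpose X and flatten Y
clf = LogisticRegression()
clf.fit(X.T, Y.ravel())
accuracy = clf.score(X.T, Y.ravel())
print(f"accuracy: {accuracy:.2f}")
```

On the flower dataset itself the accuracy will be far lower, because the data is not linearly separable.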
The dataset is not linearly separable, so logistic regression doesn’t perform well. Hopefully a neural network will do better. Let’s try this now!
Logistic regression did not work well on the “flower dataset”. You are going to train a Neural Network with a single hidden layer.
The general methodology to build a Neural Network is to:
- Define the neural network structure (# of input units, # of hidden units, etc.)
- Initialize the model’s parameters
- Implement forward propagation
- Compute loss
- Implement backward propagation to get the gradients
- Update parameters (gradient descent)
Define three variables:
- n_x: the size of the input layer
- n_h: the size of the hidden layer (set this to 4)
- n_y: the size of the output layer
# GRADED FUNCTION: layer_sizes
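A minimal sketch of what layer_sizes could look like; the exact signature the grader expects may differ, and n_h is hard-coded to 4 per the instructions:

```python
import numpy as np

def layer_sizes(X, Y):
    """Derive the layer sizes from the data shapes."""
    n_x = X.shape[0]   # size of the input layer
    n_h = 4            # size of the hidden layer, fixed by the instructions
    n_y = Y.shape[0]   # size of the output layer
    return (n_x, n_h, n_y)

X = np.zeros((2, 400))
Y = np.zeros((1, 400))
print(layer_sizes(X, Y))   # → (2, 4, 1)
```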
- Make sure your parameters’ sizes are right. Refer to the neural network figure above if needed.
- You will initialize the weights matrices with random values.
- Use: np.random.randn(a,b) * 0.01 to randomly initialize a matrix of shape (a,b).
- You will initialize the bias vectors as zeros.
- Use: np.zeros((a,b)) to initialize a matrix of shape (a,b) with zeros.
# GRADED FUNCTION: initialize_parameters
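Following the two rules above (small random weights, zero biases), a sketch might look like this:

```python
import numpy as np

def initialize_parameters(n_x, n_h, n_y):
    """Random weight matrices scaled by 0.01, zero bias vectors."""
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

params = initialize_parameters(2, 4, 1)
print(params["W1"].shape, params["b1"].shape,
      params["W2"].shape, params["b2"].shape)   # (4, 2) (4, 1) (1, 4) (1, 1)
```

The 0.01 scale keeps the tanh units out of their flat saturated regions at the start of training.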
- Look above at the mathematical representation of your classifier.
- You can use the function sigmoid(). It is built-in (imported) in the notebook.
- You can use the function np.tanh(). It is part of the numpy library.
- The steps you have to implement are:
- Retrieve each parameter from the dictionary “parameters” (which is the output of initialize_parameters()) by using parameters[“..”].
- Implement forward propagation. Compute Z1, A1, Z2 and A2 (A2 is the vector of all your predictions on all the examples in the training set).
- Values needed in the backpropagation are stored in “cache”. The cache will be given as an input to the backpropagation function.
# GRADED FUNCTION: forward_propagation
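A sketch of the forward pass for this architecture (tanh hidden layer, sigmoid output); sigmoid is redefined here so the example runs on its own, since the notebook imports it from its utilities:

```python
import numpy as np

def sigmoid(z):
    # Stand-in for the notebook's imported sigmoid()
    return 1 / (1 + np.exp(-z))

def forward_propagation(X, parameters):
    """tanh hidden layer followed by a sigmoid output unit."""
    Z1 = parameters["W1"] @ X + parameters["b1"]
    A1 = np.tanh(Z1)
    Z2 = parameters["W2"] @ A1 + parameters["b2"]
    A2 = sigmoid(Z2)                        # predictions for all m examples
    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    return A2, cache

np.random.seed(1)
X = np.random.randn(2, 3)
parameters = {"W1": np.random.randn(4, 2) * 0.01, "b1": np.zeros((4, 1)),
              "W2": np.random.randn(1, 4) * 0.01, "b2": np.zeros((1, 1))}
A2, cache = forward_propagation(X, parameters)
print(A2.shape)   # (1, 3)
```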
Implement compute_cost() to compute the value of the cost J.
# GRADED FUNCTION: compute_cost
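The cross-entropy cost over m examples is J = −(1/m) Σ [y log(a2) + (1−y) log(1−a2)]. A vectorized sketch:

```python
import numpy as np

def compute_cost(A2, Y):
    """Cross-entropy cost averaged over the m training examples."""
    m = Y.shape[1]
    logprobs = Y * np.log(A2) + (1 - Y) * np.log(1 - A2)
    cost = -np.sum(logprobs) / m
    return float(np.squeeze(cost))   # make sure we return a scalar

# Predicting 0.5 everywhere gives cost log(2) regardless of the labels
A2 = np.array([[0.5, 0.5]])
Y = np.array([[1, 0]])
print(compute_cost(A2, Y))   # → 0.6931... (log 2)
```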
Using the cache computed during forward propagation, you can now implement backward propagation.
Implement the function backward_propagation().
Backpropagation is usually the hardest (most mathematical) part in deep learning. To help you, here again is the slide from the lecture on backpropagation. You’ll want to use the six equations on the right of this slide, since you are building a vectorized implementation.
# GRADED FUNCTION: backward_propagation
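A sketch of the vectorized backward pass implied by those equations, assuming the tanh-hidden / sigmoid-output architecture above (note tanh′(Z1) = 1 − A1²):

```python
import numpy as np

def backward_propagation(parameters, cache, X, Y):
    """Vectorized gradients for the one-hidden-layer network."""
    m = X.shape[1]
    W2 = parameters["W2"]
    A1, A2 = cache["A1"], cache["A2"]
    dZ2 = A2 - Y                                   # sigmoid + cross-entropy
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)             # tanh'(Z1) = 1 - A1^2
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}

# Tiny example with n_x=2, n_h=4, n_y=1, m=3
np.random.seed(1)
X = np.random.randn(2, 3)
Y = np.array([[1, 0, 1]])
parameters = {"W1": np.random.randn(4, 2) * 0.01, "b1": np.zeros((4, 1)),
              "W2": np.random.randn(1, 4) * 0.01, "b2": np.zeros((1, 1))}
Z1 = parameters["W1"] @ X + parameters["b1"]
A1 = np.tanh(Z1)
Z2 = parameters["W2"] @ A1 + parameters["b2"]
A2 = 1 / (1 + np.exp(-Z2))
cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
grads = backward_propagation(parameters, cache, X, Y)
print({k: v.shape for k, v in grads.items()})
```

Each gradient has the same shape as the parameter it updates, which is worth asserting while debugging.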
Implement the update rule. Use gradient descent. You have to use (dW1, db1, dW2, db2) in order to update (W1, b1, W2, b2).
General gradient descent rule: θ = θ − α (∂J/∂θ), where α is the learning rate and θ represents a parameter.
# GRADED FUNCTION: update_parameters
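Applying that rule to each of the four parameters gives a sketch like the following (the default learning rate here is illustrative, not prescribed by the text):

```python
import numpy as np

def update_parameters(parameters, grads, learning_rate=1.2):
    """One gradient descent step: theta = theta - alpha * d_theta."""
    return {
        "W1": parameters["W1"] - learning_rate * grads["dW1"],
        "b1": parameters["b1"] - learning_rate * grads["db1"],
        "W2": parameters["W2"] - learning_rate * grads["dW2"],
        "b2": parameters["b2"] - learning_rate * grads["db2"],
    }

parameters = {"W1": np.ones((4, 2)), "b1": np.zeros((4, 1)),
              "W2": np.ones((1, 4)), "b2": np.zeros((1, 1))}
grads = {"dW1": np.ones((4, 2)), "db1": np.ones((4, 1)),
         "dW2": np.ones((1, 4)), "db2": np.ones((1, 1))}
updated = update_parameters(parameters, grads, learning_rate=0.5)
print(updated["W1"][0, 0])   # 1 - 0.5 * 1 = 0.5
```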
Build your neural network model in nn_model(). The neural network model has to use the previous functions in the right order.
# GRADED FUNCTION: nn_model
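As a self-contained sketch of the wiring, here is the whole pipeline condensed into one function and demonstrated on a tiny XOR-style dataset; the initialization scale, learning rate, and iteration count are illustrative choices for this toy demo, not the notebook's exact values:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def nn_model(X, Y, n_h, num_iterations=20000, learning_rate=0.5):
    """Initialize, then loop: forward prop, cost, backward prop, update."""
    np.random.seed(3)
    n_x, m, n_y = X.shape[0], X.shape[1], Y.shape[0]
    # Larger init scale than 0.01 so this tiny demo escapes its near-symmetric
    # start quickly
    W1 = np.random.randn(n_h, n_x) * 0.5
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.5
    b2 = np.zeros((n_y, 1))
    for _ in range(num_iterations):
        A1 = np.tanh(W1 @ X + b1)                 # forward propagation
        A2 = sigmoid(W2 @ A1 + b2)
        cost = -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m
        dZ2 = A2 - Y                              # backward propagation
        dW2 = dZ2 @ A1.T / m
        db2 = dZ2.sum(axis=1, keepdims=True) / m
        dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
        dW1 = dZ1 @ X.T / m
        db1 = dZ1.sum(axis=1, keepdims=True) / m
        W1 -= learning_rate * dW1                 # gradient descent step
        b1 -= learning_rate * db1
        W2 -= learning_rate * dW2
        b2 -= learning_rate * db2
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}, cost

# XOR-style toy data: not linearly separable, but learnable with a hidden layer
X = np.array([[0.0, 0.0, 1.0, 1.0], [0.0, 1.0, 0.0, 1.0]])
Y = np.array([[0, 1, 1, 0]])
parameters, final_cost = nn_model(X, Y, n_h=4)
print(f"final cost: {final_cost:.4f}")
```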
# GRADED FUNCTION: predict
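Prediction is a forward pass followed by thresholding the output activation at 0.5. A sketch, with the forward pass inlined and hand-built parameters so the result is easy to verify:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(parameters, X):
    """Forward propagate, then threshold the output activation at 0.5."""
    A1 = np.tanh(parameters["W1"] @ X + parameters["b1"])
    A2 = sigmoid(parameters["W2"] @ A1 + parameters["b2"])
    return (A2 > 0.5).astype(int)

# Hand-built parameters whose prediction is simply the sign of the first feature
parameters = {"W1": np.array([[10.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]),
              "b1": np.zeros((4, 1)),
              "W2": np.array([[10.0, 0.0, 0.0, 0.0]]),
              "b2": np.zeros((1, 1))}
X = np.array([[1.0, -1.0], [0.0, 0.0]])
print(predict(parameters, X))   # [[1 0]]
```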
It is time to run the model and see how it performs on a planar dataset. Run the following code to test your model with a single hidden layer of n_h hidden units.
# Build a model with a n_h-dimensional hidden layer
# Print accuracy
Accuracy is really high compared to Logistic Regression. The model has learnt the leaf patterns of the flower! Neural networks are able to learn even highly non-linear decision boundaries, unlike logistic regression.
Run the following code. It may take 1-2 minutes. You will observe different behaviors of the model for various hidden layer sizes.
# This may take about 2 minutes to run
If you want, you can rerun the whole notebook (minus the dataset part) for each of the following datasets.
You’ve learnt to:
- Build a complete neural network with a hidden layer
- Make good use of a non-linear unit
- Implement forward propagation and backpropagation, and train a neural network
- See the impact of varying the hidden layer size, including overfitting