Simple Neural Networks With Numpy

Introduction

In this post I will be implementing the tanh function in a simple 2-layer neural network. I will try to go in depth explaining what is happening within the neural network and why it seems to magically work. At the end I will show how to get the same results with Scikit-learn and compare our results. This post is inspired by Andrew Trask’s intuitive blog post about neural networks. While learning about machine learning and deep learning, Andrew Trask’s blog has been one of my most frequently used resources and I would recommend it to everyone trying to learn about this field.

What is a Neural Network?

Artificial neural networks are biologically inspired algorithms designed to work like the human brain. Similar to the brain's neurons, artificial neural networks consist of artificial neurons, or nodes, that take in information, apply an operation to it, and pass it on to the next neuron. These neurons are all connected in a way similar to the synapses in the brain. Each artificial synapse is weighted differently according to the strength of its influence on the next neuron.

The neural network example used in this blog post will start with randomized synaptic weights. Given time to learn, these weights will be adjusted to produce more accurate results. This is where backpropagation comes into play: backpropagation goes back through the layers of the neural network and adjusts the weights according to their contribution to the error of the outputs.

Neural networks require data to “learn” from, similar to the human brain. The more data there is to learn from, the better the results. I plan on writing another post soon to go into the specifics of neural networks.

Neural networks typically consist of an input layer, one or more hidden layers, and an output layer. The input layer takes in the data that the network will learn from. This data then feeds forward to the next layer, called the hidden layer. The learning occurs in the hidden layer(s): hidden layers apply a function to the input data (or to the previous hidden layer) and make the result usable to the output layer before passing it on. In this blog post, the hidden layer will be applying the tanh function to the data.
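
To make that flow concrete, here is a minimal sketch of a single forward pass, using made-up numbers rather than the weights we will train later in this post:

import numpy as np

inputs = np.array([0, 1, 1])              # one example entering the input layer
weights = np.array([0.5, -0.25, 0.1])     # made-up synaptic weights

weighted_sum = np.dot(inputs, weights)    # data flows through the weighted synapse
hidden_output = np.tanh(weighted_sum)     # the hidden layer applies tanh

print(weighted_sum, hidden_output)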

This diagram below shows the layers of the neural network and how the data passes through it. You can see that the data will start in the input layer, travel to the hidden layers through the weighted synapse, then travel to the output layer. The arrows represent the flow of the data.

Goal of The Neural Network

What is the neural network trying to accomplish? This example of a simple neural network will use a dataset made up of 0s and 1s. Each training example is a combination of three digits, each a 0 or 1, that corresponds to one output, also a 0 or 1.

Inputs     Outputs
0, 0, 1    0
0, 1, 1    0
1, 0, 1    1
1, 1, 1    1

Can you see the pattern? The output to the training example is the same as the first digit in the training example.
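
If you would rather confirm the pattern in code, a quick check (a small side snippet, using the same arrays we will define later) could look like this:

import numpy as np

input_layer = np.array([[0,0,1], [0,1,1], [1,0,1], [1,1,1]])
outputs = np.array([0, 0, 1, 1])

# Each output should equal the first digit of its training example.
print(all(row[0] == out for row, out in zip(input_layer, outputs)))   # True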

How It Works (Simplified)

(1) The neural network will first take in the inputs.

(2) The inputs will be multiplied by their weights.

(3) The products of the inputs and weights will be summed together.

(4) The sum of the products of the inputs and weights will be fed into the tanh function. This is the hidden layer's prediction.

(5) The error will be calculated by taking the difference between the correct output and the neural network's predicted output.

(6) The error will be multiplied by the slope (derivative) of the tanh function at the hidden layer's prediction. This gives us the hidden delta.

(7) The synaptic weights will then be updated based on the input data and the hidden delta.

I feel like playing with the code is the best way to learn what is really happening, so I encourage you to run and tweak the snippets yourself as you follow along. Here is the code we will be working with:

import numpy as np

def tanh(x, deriv = False):
    if deriv == True:
        return 1 - x**2    # x is already a tanh output here
    return (np.exp(2*x) - 1) / (np.exp(2*x) + 1)

input_layer = np.array([[0,0,1], [0,1,1], [1,0,1], [1,1,1]])            
outputs = np.array([[0, 0, 1, 1]]).T

np.random.seed(1)

synapse = 2 * np.random.random((3, 1)) - 1

for iter in range(10000):
    hidden_layer = tanh(np.dot(input_layer, synapse))
    hidden_error = outputs - hidden_layer
    hidden_delta = hidden_error * tanh(hidden_layer, True)
    synapse += np.dot(input_layer.T, hidden_delta)

How Does it Work?

I will post snippets of the code and break down what is happening and why. The first line is where we import Numpy, a numerical computing library whose arrays and linear algebra routines we will be using frequently throughout the code.

import numpy as np

Now we will have to build our tanh function, or hyperbolic tangent function. The hyperbolic tangent is similar to the normal tangent. Going back to trigonometry, we know that the tangent of x is equal to the sine of x over the cosine of x, written as:

$$\tan(x) = \frac{\sin(x)}{\cos(x)}$$
Similarly, the hyperbolic tangent (tanh) of x is equal to the hyperbolic sine (sinh) of x over the hyperbolic cosine (cosh) of x, or:

$$\tanh(x) = \frac{\sinh(x)}{\cosh(x)}$$
The cosh(x) is represented by this formula:

$$\cosh(x) = \frac{e^{x} + e^{-x}}{2}$$
And the sinh(x) is represented by this formula:

$$\sinh(x) = \frac{e^{x} - e^{-x}}{2}$$
This means that tanh is represented by either one of these two formulas:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

$$\tanh(x) = \frac{e^{2x} - 1}{e^{2x} + 1}$$
Another important part of our neural network is the derivative, or slope, of the tanh function, which can be written as:

$$\frac{d}{dx}\tanh(x) = 1 - \tanh^{2}(x)$$
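
As a quick numerical sanity check (a small side snippet, not part of the network code), both closed-form versions of tanh should match Numpy's built-in tanh, and 1 - tanh²(x) should match a finite-difference estimate of the slope:

import numpy as np

x = np.linspace(-2, 2, 5)

form_a = (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
form_b = (np.exp(2*x) - 1) / (np.exp(2*x) + 1)
print(np.allclose(form_a, np.tanh(x)), np.allclose(form_b, np.tanh(x)))   # True True

# Compare the analytic slope with a finite-difference estimate.
h = 1e-6
numeric_slope = (np.tanh(x + h) - np.tanh(x - h)) / (2 * h)
print(np.allclose(1 - np.tanh(x)**2, numeric_slope))                      # True
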
Why The Tanh Activation Function?

The tanh function, common among simpler neural networks, is a non-linear activation function, and that non-linearity is what lets the network fit patterns that a purely linear model could not. The tanh function is similar to the commonly used sigmoid function, otherwise known as the logistic function, and shares its sigmoidal "S" shape. Tanh ranges from -1 to 1, while the logistic function ranges from 0 to 1. This is part of the reason tanh is often preferred over the logistic function: its outputs are centered around zero, which tends to produce stronger gradients during training. Tanh can also output negative values, whereas the logistic function is cut off at zero and is more restricted in comparison.
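
To see the difference for yourself, here is a small comparison (a side snippet, with the logistic function written out by hand since it is not part of our network code):

import numpy as np

def logistic(x):
    return 1 / (1 + np.exp(-x))

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

# Outputs: tanh is centered around zero and spans (-1, 1);
# the logistic function is centered around 0.5 and spans (0, 1).
print(np.tanh(x))
print(logistic(x))

# Slopes: tanh's slope peaks at 1 at x = 0, the logistic's at only 0.25,
# which is one reason tanh tends to give stronger gradients.
print(1 - np.tanh(x)**2)
print(logistic(x) * (1 - logistic(x)))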

Define The Tanh Function

The equations for the tanh function and its derivative are above. Either tanh formula will work. In this example I will be using:

return (np.exp(2*x) - 1) / (np.exp(2*x) + 1)

But if you wanted to switch things up you could use the alternative equation. This would be written like this:

return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

If you want to shorten the code, you can also use Numpy's tanh function. While this is of course simpler, I will not be using it because I prefer the equations to be written out for this example. In this post I want to make understanding the inner workings of a neural network as intuitive as possible, and I feel like using numpy.tanh() could take away from that experience. Feel free to use whichever option feels best for you.

Like the tanh function that we wrote ourselves, Numpy’s tanh function takes in an input array (x) and returns the corresponding tanh values. Numpy’s tanh function can be used in the code like so:

return np.tanh(x)

Now, to put all those equations into code, we define our tanh function with two arguments. The first argument is "x", the input that takes the place of x in our equation. The second argument is deriv = False. We will use the derivative of the tanh function later on, so we want to be able to tell our function whether it should use the derivative equation or not. One important detail: when we ask for the derivative during training, we will pass in values that have already been run through tanh (the hidden layer's predictions). Because of that, the derivative branch can simply return 1 - x**2, which is the same as 1 - tanh² of the original input. If we call the function with deriv = True it returns that slope; if we do not, the argument defaults to False and the function returns the tanh value.

All together, the tanh function should look something like this:

def tanh(x, deriv = False):
    if deriv == True:
        # x is assumed to already be a tanh output here, so the slope is 1 - x**2
        return 1 - x**2
    return (np.exp(2*x) - 1) / (np.exp(2*x) + 1)
    # return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
    # return np.tanh(x)
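
To get a feel for the function, you can call it on a small array and compare the plain output with the derivative. (This assumes the numpy import and the tanh function defined above; note that the derivative is evaluated on the tanh output, just as the training loop will do.)

values = np.array([-2.0, 0.0, 2.0])

activated = tanh(values)           # tanh of the raw values
slopes = tanh(activated, True)     # slope of tanh at each of those points

print(activated)   # roughly [-0.964  0.     0.964]
print(slopes)      # roughly [ 0.071  1.     0.071]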

Create The Dataset

Now we will be creating our input and output data. Our input matrix will be called "input_layer" and our outputs will be appropriately named "outputs".

Remember: If the input data example starts with a zero, then the output will be a zero. If the input data example starts with a one, then the output will be a one as well.

We will be using numpy's arrays to help create our data. For the input data, each set of three numbers is a training example. Because there are three numbers in each training example, there are three input nodes in our neural network. Each training example has a corresponding output. The output is a single number (a 0 or a 1), therefore we have one output node in our neural network.

input_layer = np.array([[0,0,1], [0,1,1], [1,0,1], [1,1,1]])            
outputs = np.array([[0, 0, 1, 1]]).T

At the end of the line where we create the outputs there is a ".T". This is Numpy's transpose, which flips a matrix over its diagonal. In this case, np.array([[0, 0, 1, 1]]) creates a matrix with one row and four columns; once the transpose is applied, the outputs matrix becomes a one column, four row matrix. That matches our four training examples and one output node. Here is an example that prints a row matrix before and then after the transpose is applied (the untransposed array gets its own name here so the "before" really is untransposed). Hopefully this helps show what is going on.

# TRANSPOSE
outputs_row = np.array([[0, 0, 1, 1]])
print('Before Transpose Function: ', outputs_row)
print('After Transpose Function: \n', outputs_row.T)
Before Transpose Function:  [[0 0 1 1]]
After Transpose Function:
 [[0]
 [0]
 [1]
 [1]]

Random Seed

np.random.seed(1)

Seeding the random numbers means that even though the numbers are generated randomly, they will be generated in the same "random" way each time you run the code. To make this more intuitive (or more confusing), imagine that ".seed(1)" corresponds to a predetermined set of "random" numbers. ".seed(2)" corresponds to another, different set. Every time you run ".seed(1)", you get the same set of numbers back. This makes it easier to compare results after running the code a few times. You can seed with any number that you want, but keep it the same each time you run the code.
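
A quick way to convince yourself of this (a small side snippet, separate from the network code):

import numpy as np

# Seeding with the same number always produces the same "random" draws.
np.random.seed(1)
print(np.random.random(3))

np.random.seed(1)
print(np.random.random(3))   # identical to the first print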

Creating The Synaptic Weights

synapse = 2 * np.random.random((3, 1)) - 1

This is where we make the artificial synapse for this 2-layer neural network. It creates a randomized weight matrix, with values between -1 and 1, that holds our synaptic weights. The synapse connects the three input nodes to the single output node, so the weight matrix has a shape of (3, 1). Every time the code is run, the weights will be initialized with the same values, because we seeded our random numbers in the line of code above.

To view the random weights that the neural network will start with, you can simply print out the values in the synapse. Try playing around with the value in np.random.seed(1) to get different starting weights.
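
For example, you could run just the seeding and initialization lines on their own (a small side snippet, separate from the main script) and print the result:

import numpy as np

np.random.seed(1)
synapse = 2 * np.random.random((3, 1)) - 1

# The same seed gives the same starting weights every run;
# change the seed value to get a different starting point.
print(synapse)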

Training

for iter in range(10000):

    hidden_layer = tanh(np.dot(input_layer, synapse))

    hidden_error = outputs - hidden_layer

    hidden_delta = hidden_error * tanh(hidden_layer, True)

    synapse += np.dot(input_layer.T, hidden_delta)

for iter in range(10000):

The training begins here. This for loop runs through the training process as many times as we tell it to; in this case we want 10,000 training iterations.

hidden_layer = tanh(np.dot(input_layer, synapse))

The hidden layer will be calculated by taking the sum of the products of the inputs and their weights:

$$z = XW = \sum_{i} x_{i} w_{i}$$

Where X is the input and W is the weight.

The sum of the products of the inputs and weights then becomes the "x" value for the tanh function:

$$\text{hidden\_layer} = \tanh(z) = \tanh(XW)$$

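As a concrete, made-up example of what np.dot is doing for a single training example:

import numpy as np

example = np.array([1, 0, 1])                # one training example
weights = np.array([[0.2], [-0.4], [0.1]])   # made-up (3, 1) synaptic weights

# np.dot sums the element-wise products: 1*0.2 + 0*(-0.4) + 1*0.1 = 0.3
z = np.dot(example, weights)
print(z)            # [0.3]
print(np.tanh(z))   # tanh of that sum is the hidden layer's prediction
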
hidden_error = outputs - hidden_layer

The error will then be calculated by taking the difference between the correct outputs and the hidden layer's prediction.

hidden_delta = hidden_error * tanh(hidden_layer, True)

We will then do some more calculation for the "hidden_delta". First, notice that the deriv = False argument has now been set to "True". Because hidden_layer already holds tanh values, tanh(hidden_layer, True) returns 1 - hidden_layer**2, which is the slope of the tanh curve at each prediction. Multiplying the error by that slope scales the update: predictions the network is already confident about (close to -1 or 1, where the curve is nearly flat) are changed very little, while uncertain predictions (near 0, where the curve is steep) are corrected more aggressively. This is what drives the error down over future iterations.
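
Here is a small, made-up illustration of that scaling effect (not part of the network code):

# For a target of 1, the update term shrinks as the prediction
# becomes more confident (closer to 1).
for prediction in [0.0, 0.5, 0.99]:
    error = 1.0 - prediction
    slope = 1 - prediction ** 2      # tanh slope at this prediction
    print(prediction, error * slope)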

synapse += np.dot(input_layer.T, hidden_delta)

The final step in the training process is updating the synaptic weights. The weights are updated by applying Numpy's ".dot()" function to the transposed input layer and the hidden delta. The new synaptic weight is the old weight plus the product of the transposed inputs and the hidden delta, so each weight moves in proportion to how much its input contributed to the error.
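
If you want to watch the network learn, one option (a sketch that reuses the tanh function, data, and synapse defined above; it is not part of the original code) is to print the mean absolute error every so often inside the loop:

for iter in range(10000):
    hidden_layer = tanh(np.dot(input_layer, synapse))
    hidden_error = outputs - hidden_layer
    hidden_delta = hidden_error * tanh(hidden_layer, True)
    synapse += np.dot(input_layer.T, hidden_delta)

    # Every 2000 iterations, report how far off the predictions are.
    if iter % 2000 == 0:
        print('Mean absolute error:', np.mean(np.abs(hidden_error)))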

The neural network itself is now finished and is ready to learn.

Predicting

We can add one more step to test out the neural network that we have created. To predict the output to a new set of inputs we can create a “predict” function. This predict function will take in our new inputs and will spit out its predicted output based on its previous training.

def predict(new_input):
    return tanh(np.dot(new_input, synapse))

Remember: The first number in the training examples was what determined the output. Also, keep in mind that the training was done with 0s and 1s. Other numbers will obviously not work.

Next we can create our “test dataset”, or test_data, exactly like how we made our training data. Instead of writing the input data on one line, I wrote the test data on multiple lines to show the matrix it creates.

test_data = np.array([ [1,1,0],
                       [0,1,0],
                       [1,0,0] ])
# or --> test_data = np.array([ [1,1,0], [0,1,0], [1,0,0] ])

To test our neural network, we can loop through each test example and print out the predicted output. The neural network will not be 100% confident in its predictions, so it will not say flatly whether the output is 0 or 1. Instead, the neural network will spit out a number that is close to 1 or close to 0. The closer the number is to 1 or 0, the more confident the neural network is.

for test_example in test_data:
    print('Predicted Output For:', test_example, '=', predict(test_example), '\n')

All Together + Test Results

import numpy as np

# create the tanh function with an optional derivative
def tanh(x, deriv = False):
    if deriv == True:
        return 1 - x**2    # x is already a tanh output here
    return (np.exp(2*x) - 1) / (np.exp(2*x) + 1)

# create dataset
input_layer = np.array([[0,0,1], [0,1,1], [1,0,1], [1,1,1]])   
outputs = np.array([[0, 0, 1, 1]]).T

np.random.seed(1)

# initialize the random synaptic weights
synapse = 2 * np.random.random((3, 1)) - 1

# training process
for iter in range(10000):
    hidden_layer = tanh(np.dot(input_layer, synapse))

    hidden_error = outputs - hidden_layer

    hidden_delta = hidden_error * tanh(hidden_layer, True)

    synapse += np.dot(input_layer.T, hidden_delta)

# prediction for new inputs  
def predict(new_input):
    return tanh(np.dot(new_input, synapse))

# new test data (examples not used in training dataset)
test_data = np.array([ [1,1,0],
                       [0,1,0],
                       [1,0,0] ])

# print predicted results for each new test example
for test_example in test_data:
    print('Predicted Output For:', test_example, '=', predict(test_example), '\n')
Predicted Output For: [1 1 0] = [0.99999874]

Predicted Output For: [0 1 0] = [-0.01759575]

Predicted Output For: [1 0 0] = [0.99999878]

Try experimenting with the code below to help understand what is happening.

Neural network for the minimalist:

import numpy as np
def tanh(x,deriv=False):
    if deriv==True:
        return 1-x**2
    return (np.exp(2*x)-1)/(np.exp(2*x)+1)
input_layer = np.array([[0,0,1],[0,1,1],[1,0,1],[1,1,1]])            
outputs = np.array([[0,0,1,1]]).T
synapse = np.random.random((3,1))
for iter in range(10000):
    hidden_layer = tanh(np.dot(input_layer,synapse))
    hidden_delta = (outputs-hidden_layer)*tanh(hidden_layer,True)
    synapse += np.dot(input_layer.T,hidden_delta)

Using Scikit-learn For Fun

Below is just another way to get the same results with Scikit-learn. The neural network we created using only Numpy took up about thirteen lines, but if we use Scikit-learn we can bring that count down to about five lines of code, not counting the imports and the prediction printout. I think it's interesting to implement all the little details that make a neural network work and then compare it with a higher level machine learning library. Check out the results! I would say we did pretty well!

import numpy as np
from sklearn.neural_network import MLPClassifier

input_layer = np.array([[0,0,1], [0,1,1], [1,0,1], [1,1,1]])
outputs = np.ravel(np.array([[0, 0, 1, 1]]).T)

MLP = MLPClassifier(activation='tanh', solver='sgd', random_state=1, max_iter=10000)

MLP.fit(input_layer, outputs)

test_data = np.array([ [1,1,0], [0,1,0], [1,0,0] ])

for example in range(3):
    print('Prediction For: ', test_data[example],
          '=', MLP.predict(test_data)[example])
Prediction For:  [1 1 0] = 1
Prediction For:  [0 1 0] = 0
Prediction For:  [1 0 0] = 1
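
If you want confidence values from the Scikit-learn model that are more comparable to our raw tanh outputs, MLPClassifier also provides predict_proba, which returns the estimated probability of each class (this assumes the MLP and test_data defined above):

# Column 0 is the probability of class 0, column 1 the probability of class 1.
print(MLP.predict_proba(test_data))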