# Keras Feedforward Tutorial

## Code walk-through - Jan. 2016

This section will walk you through the code of `feedforward_keras_mnist.py`, which I suggest you have open while reading. This tutorial is based on several Keras examples and on its documentation:

If you are not yet familiar with what mnist is, please spend a couple of minutes there. It is basically a set of handwritten digit images of size $28 \times 28$ in greyscale (0-255). There are 60,000 training examples and 10,000 testing examples. The training examples could also be split into 50,000 training examples and 10,000 validation examples.

By the way, Keras’s documentation keeps getting better (and it’s already good), and the community is quick to answer questions or implementation problems.

#### Keras Documentation

#### Keras’s Github

# Recognizing handwritten digits with Keras


## General Organization

We start with importing everything we’ll need (no shit…). Then we define the `callback` class that will be used to store the loss history. Lastly we define functions to load the data, compile the model, train it and plot the losses.

The overall philosophy is modularity. We use default parameters in the `run_network` function so that you can feed it with already loaded data (and not re-load it each time you train a network) or a pre-trained network `model`.

Also, don’t forget Python’s `reload(package)` function, very useful to run updates from your code without quitting (I)python.

## Imports

- `time`, `numpy` and `matplotlib` I’ll assume you already know.
- `np_utils` is a set of helper functions; we will only use `to_categorical`, which I’ll describe later on.
- `callbacks` is quite transparent: it is a customizable class that triggers functions on events.
- `models` is the core of Keras’s neural networks implementation. It is the object that represents the network: it will have layers, activations and so on. It is the object that will be ‘trained’ and ‘tested’. `Sequential` means we will use a ‘layered’ model, not a graphical one.
- `layers` are the objects we stack on the `model`. There are a couple of ways of using them: either include the `dropout` and `activation` parameters in the `Dense` layer, or treat them as layers that will apply to the `model`’s last ‘real’ layer.
- `optimizers` are the optimization algorithms such as the classic Stochastic Gradient Descent. We will use `RMSprop` (see here G. Hinton’s explanatory video and there the slides).
- `datasets` (in our case) will download the mnist dataset if it is not already in `~/.keras/datasets/` and load it into `numpy` arrays.

## Callback

The new class `LossHistory` extends Keras’s `Callback` class. It basically relies on two events:

- `on_train_begin` -> the event is clear: when the training begins, the callback initiates a list `self.losses` that will store the training losses.
- `on_batch_end` -> when a batch is done propagating forward in the network: we get its loss and append it to `self.losses`.

This callback is pretty straightforward. But you could want to make it more complicated! Remember that callbacks are simply functions: you could do anything else within these. More on callbacks and available events there.
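To make the event logic concrete, here is a minimal stand-in sketch (plain Python, not the real `keras.callbacks.Callback` base class) showing what the two handlers do:

```python
class LossHistory:
    """Stand-in sketch of the callback; the real class extends keras.callbacks.Callback."""

    def on_train_begin(self, logs=None):
        # Fired once when training starts: initiate the storage list.
        self.losses = []

    def on_batch_end(self, batch, logs=None):
        # Fired after each batch: grab the batch loss and append it.
        self.losses.append((logs or {}).get("loss"))


history = LossHistory()
history.on_train_begin()
for batch, loss in enumerate([2.3, 1.1, 0.7]):  # made-up losses
    history.on_batch_end(batch, {"loss": loss})
print(history.losses)  # [2.3, 1.1, 0.7]
```

In real training, Keras fires these handlers for you; you only pass the instance in the `callbacks` list of `fit`.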

## Loading the Data

Keras makes it very easy to load the Mnist data. It is split between train and test data, between examples and targets.

Images in mnist are greyscale, so values are `int` between 0 and 255. We are going to rescale the inputs between 0 and 1, so we first need to change types from `int` to `float32`, or we’ll get 0 when dividing by 255.
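A quick numpy illustration of why the cast matters (the pixel values are made up):

```python
import numpy as np

pixels = np.array([0, 128, 255])         # greyscale ints, as loaded
print(pixels // 255)                     # integer division keeps only pure white: [0 0 1]
scaled = pixels.astype("float32") / 255  # cast first, then rescale into [0, 1]
print(scaled.dtype, scaled.max())
```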

Then we need to change the targets. `y_train` and `y_test` have shapes `(60000,)` and `(10000,)` with values from 0 to 9. We do not expect our network to output a value from 0 to 9; rather we will have 10 output neurons with `softmax` activations, attributing the class to the best firing neuron (`argmax` of activations). `np_utils.to_categorical` returns vectors of dimensions `(1,10)` with 0s and one 1 at the index of the transformed number: `[3] -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]`.
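If you want to see what `to_categorical` does without pulling in Keras, a hypothetical numpy re-implementation (`to_one_hot` is my name, not a Keras function) behaves like this:

```python
import numpy as np

def to_one_hot(y, nb_classes=10):
    # Hypothetical re-implementation of np_utils.to_categorical:
    # one row per target, all 0s except a 1 at the class index.
    out = np.zeros((len(y), nb_classes), dtype="float32")
    out[np.arange(len(y)), y] = 1.0
    return out

print(to_one_hot([3]))  # [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]
```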

Lastly we reshape the examples so that they are of shapes `(60000, 784)` and `(10000, 784)`, not `(60000, 28, 28)` and `(10000, 28, 28)`.
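The reshape itself is a one-liner; here is a sketch on a five-image fake batch instead of the full 60,000:

```python
import numpy as np

X = np.zeros((5, 28, 28), dtype="float32")  # five fake greyscale images
X = X.reshape(X.shape[0], 28 * 28)          # flatten each 28x28 image into a 784-vector
print(X.shape)  # (5, 784)
```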

## Creating the model

Here is the core of what makes your neural network: the `model`.

We begin by creating an instance of the `Sequential` model. Then we add a couple of hidden layers and an output layer. After that we instantiate the `rms` optimizer that will update the network’s parameters according to the RMSProp algorithm. Lastly we compile the model with the `categorical_crossentropy` cost / loss / objective function and the optimizer. We also state that we want to see the accuracy during fitting and testing.

Let’s get into the model’s details:

- The first hidden layer has 500 units, a rectified linear unit activation function and 40% dropout. It also needs the input dimension: by specifying `input_dim = 784` we tell this first layer that the virtual input layer will be of size 784.
- The second hidden layer has 300 units, a rectified linear unit activation function and 40% dropout.
- The output layer has 10 units (because we have 10 categories / labels in mnist), no dropout (of course…) and a softmax activation function to output a probability. `softmax` output + `categorical_crossentropy` is standard for multiclass classification.
- This 500-300-10 structure comes from Y. LeCun’s website citing G. Hinton’s unpublished work.
- Here I have kept the default initialization of weights and biases, but you can find here the list of possible initializations. Also, here are the possible activations.
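The softmax-then-argmax step described above is easy to sketch in plain numpy (the activations below are made up):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

# Made-up activations of the 10 output neurons for one image
z = np.array([0.1, 0.2, 0.1, 3.0, 0.1, 0.2, 0.1, 0.0, 0.1, 0.1])
p = softmax(z)
print(round(float(p.sum()), 6))  # 1.0 -- a valid probability distribution
print(int(np.argmax(p)))         # 3  -- the best firing neuron gives the class
```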

Remember I mentioned that Keras used Theano? Well, you just went through it. Creating the `model` and `optimizer` instances, as well as adding layers, is all about creating Theano variables and explaining how they depend on each other. The compilation step is then simply about declaring an undercover Theano function. This is why this step can take a little while: the more complex your model, the longer (captain here).

And yes, that’s it about Theano. Told you you did not need much!

## Running the network

The `try/except` is there so that you can stop the network’s training without losing it.

With Keras, training your network is a piece of cake: all you have to do is call `fit` on your model and provide the data.

So first we load the data, create the model and start the loss history. All there is to do then is fit the network to the data. Here are `fit`’s arguments:

- `X_train, y_train` are the training data.
- `nb_epoch` is perfectly transparent; `epochs` is defined when calling the `run_network` function.
- `batch_size`: same as `nb_epoch`. Keras does all the work for you regarding epochs and batch training.
- `callbacks` is a list of callbacks. Here we only provide `history`, but you could provide any number of callbacks.
- `validation_data` is, well, the validation data. Here we use the test data, but it could be different. You could also specify a `validation_split` float between 0 and 1 instead, splitting the training data for validation.
- `verbose = 2` so that Keras displays both the training and validation loss and accuracy.
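Conceptually, the epoch/batch loop that `fit` runs for you looks roughly like this (a plain-Python sketch of the bookkeeping, not Keras internals):

```python
import numpy as np

X = np.arange(10)            # tiny stand-in for the training examples
batch_size, nb_epoch = 4, 2
steps = 0
for epoch in range(nb_epoch):                   # one epoch = one full pass over the data
    for start in range(0, len(X), batch_size):  # mini-batches within the pass
        batch = X[start:start + batch_size]     # sizes 4, 4, 2 here
        steps += 1                              # a real step would update the weights
print(steps)  # 6 -> 3 batches per epoch x 2 epochs
```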

## Plot

Nothing much here; it is just helpful to monitor the loss during training, but of course you could provide any list here.

## Usage

Thanks to `run_network`’s default parameters, you can pass in already loaded data if you do not want to reload it every time.

Using an Intel i7 CPU at 3.5GHz and an NVidia GTX 970 GPU, we achieve 0.9847 accuracy (1.53% error) in 56.6 seconds of training using this implementation (including loading and compilation).