Recurrent Convolutions

Going beyond MNIST - Build a handwritten character recognizer for your language

Posted 07/25/2016

If you’ve played around with Neural Nets/Deep Learning, you’ve probably come across the MNIST dataset. It’s a dataset of handwritten digits from 0 to 9, and is one of the standard benchmarks for neural nets. Find out more about it here.

We’ll be using the same concepts used to solve the MNIST problem to build a handwritten character recognizer for our own language. The first step is to find a dataset! My mother tongue is Malayalam, which has a pretty complex script. It would have made a great project, but I wasn’t able to find any good public dataset. I did find out that one of the professors from my former university has written a paper on the topic; I need to get in touch with him and see if he can share the dataset.

But I’ve got no patience, and since India is such a language-rich country, there’s no need to wait! While searching for handwritten character datasets of other Indian languages, I stumbled on a beautifully compiled dataset for the Devanagari script, which is used for writing multiple languages including Hindi, Nepali and Marathi. The dataset has a training set of 78,200 examples and a test set of 13,800 examples, covering 46 characters.

Find it at http://cvresearchnepal.com/wordpress/dhcd/

Here’s what the dataset looks like:

[Sample images from the dataset: handwritten characters such as ka, da, na, naa and pa]

Now that we’ve got the data, we need to process and convert it into a format that our machine can understand. The preprocessing is done in the process_data.py file. Much of the processing code is from the first part of Google’s Deep Learning course on Udacity. After processing, we get three sets of data: the training set, the validation set and the test set. Each set contains the input images as a NumPy array along with the associated labels specifying the actual character in each image.
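The exact preprocessing lives in process_data.py, but to make the variable names used later concrete, here is a minimal sketch of what the final reformatting step could look like. The pickle file name and dictionary keys are assumptions for illustration; the 32x32 image size and the 46 classes come from the dataset itself.

 import pickle
 import numpy as np

 num_classes = 46
 image_size = 32

 # Hypothetical pickle produced by process_data.py; the file name and keys are illustrative
 with open('devanagari.pickle', 'rb') as f:
     data = pickle.load(f)

 def reformat(images, labels):
     # Keras with Theano-style dimension ordering expects (samples, channels, height, width)
     images = images.reshape(-1, 1, image_size, image_size).astype('float32')
     # One-hot encode the integer character labels into 46-dimensional vectors
     labels = (np.arange(num_classes) == labels[:, None]).astype('float32')
     return images, labels

 train_dataset, train_labels = reformat(data['train_dataset'], data['train_labels'])
 valid_dataset, valid_labels = reformat(data['valid_dataset'], data['valid_labels'])
 test_dataset, test_labels = reformat(data['test_dataset'], data['test_labels'])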

To build our model, we’re going to use the Keras Deep Learning library, which is a simple wrapper on top of Theano and TensorFlow and can use either one of these as its backend. Why Keras? Because it is the easiest to get started with and has a great philosophy! In later posts, we’ll reimplement the code in TensorFlow from scratch.
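A quick aside: you can see which backend Keras is using just by importing it, and you can switch backends by editing the "backend" field in the ~/.keras/keras.json config file.

 import keras  # prints a line like "Using TensorFlow backend." when imported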

The essence of Deep Learning lies in the architecture of the Neural Network, i.e. how many layers it has, how many units are in each layer, and so on. Let’s get started with the LeNet-5 model used for MNIST by Yann LeCun. LeNet-5 is a Convolutional Neural Network (CNN); if you don’t know about CNNs or want a quick refresher, read this great article. There are many versions of the LeNet architecture, and all of them share the following basic structure.

LeNet-5

As we can see in the picture, all Neural Network models have “layers” of various operations. The number of layers, their types, their sizes and so on are called hyperparameters, and selecting proper values for these is the essential art of building a Deep Learning model.
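Before we start adding layers, we need the relevant imports. The snippets in this post use the Keras 1.x API, so they look like this:

 from keras.models import Sequential
 from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense, Activation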

First, we initialize the Keras model with the Sequential() function.

 model = Sequential()

Keras makes it very simple to add layers, and if you have a visual understanding of an architecture (like the LeNet picture above), it is very intuitive to model in Keras once you get the hang of it. Let’s start with the first layer. If you look carefully at the network, all the first layer does is put the 32x32 image through a convolution and turn it into a set of feature maps (100 of them in our version, using 5x5 kernels). Here’s how it’s done in Keras:

 model.add(Convolution2D(100, 5, 5, border_mode='valid', input_shape=(1, 32, 32)))

Next comes the subsampling layer, which can be implemented using the MaxPooling2D or AveragePooling2D layers in Keras.

 model.add(MaxPooling2D(pool_size=(2, 2)))
 model.add(Activation('tanh'))

The next convolutional layer applies 10x10 kernels and outputs 250 feature maps; the 14x14 maps coming out of the pooling layer shrink to 5x5 after a 'valid' 10x10 convolution.

 model.add(Convolution2D(250, 10, 10, border_mode='valid'))

This is followed by another subsampling step, which brings each 5x5 feature map down to 2x2.

 model.add(MaxPooling2D(pool_size=(2, 2)))
 model.add(Activation('tanh'))

Finally, we flatten everything out and connect it to two fully connected layers. Flattening the 250 feature maps of 2x2 gives a vector of 1000 values, and the final layer has exactly as many units as the number of classes (unique characters) in the dataset, i.e. 46.

 model.add(Flatten())
 model.add(Dense(1000))
 model.add(Activation('relu'))
 model.add(Dense(46))
 model.add(Activation('softmax'))
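To double-check that the shapes line up as described (the feature maps go from 28x28 to 14x14 to 5x5 to 2x2, flattening into 1000 values), you can ask Keras to print a summary of the model:

 model.summary()  # prints each layer's output shape and parameter count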

There we have our model ready! Now, with just a few lines of code describing the optimization strategy, loss function and accuracy metric, we’re ready to train the model.

 # batch_size and nb_epoch are hyperparameters to set beforehand (nb_epoch = 12 was used for the results below)
 model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
 model.fit(train_dataset, train_labels, batch_size=batch_size, nb_epoch=nb_epoch,
          verbose=1, validation_data=(valid_dataset, valid_labels))
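Once training finishes, you can check the accuracy on the held-out test set with model.evaluate; here is a short sketch assuming the variable names from the preprocessing step above:

 test_loss, test_accuracy = model.evaluate(test_dataset, test_labels, verbose=0)
 print('Test accuracy: %.2f%%' % (test_accuracy * 100))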

With 12 epochs of training, I was able to get an accuracy of 98.43%, which is pretty good considering that no hyperparameter optimization was done! There is scope for improvement with better-tuned parameters. In the next post, we’ll look at hyperparameter optimization using the HyperOpt library.

For the complete code, go to: github.com/psbots/devanagari-character-recognition


Written by Praveen Sridhar, who lives in Kochi, India and works on Machine Learning projects in Python using the wonderful scikit-learn, TensorFlow and Keras libraries. Here's where to find him on Twitter & GitHub. I've been invited to the Deep Learning School! Seeking your support at http://contribute.recurrentconvolutions.com