Convolutional Neural Networks (also known as CNNs or ConvNets) have been very successful at tasks like image recognition.
This post shows a practical implementation of ConvNets using TensorFlow. We will not be reinventing the wheel by putting up another post explaining how they work. That job has been done very well by the following posts, so do check them out:
Now, if you already know what “Convolutional layers” and “Max-Pooling layers” mean, you’re good to read further. Otherwise, do make use of the excellent explanations linked above.
Some basic familiarity with TensorFlow computational graphs is assumed. Take a look at this excellent video by Siraj, which will get you up and running with TensorFlow in 5 minutes! You might want to subscribe to his channel, as he has some really good videos on a variety of topics in Deep Learning.
So, let’s get started by importing the required libraries.
import tensorflow as tf
We’ll be using the CIFAR10 dataset. CIFAR10 is a classic benchmark dataset for image classification, and many papers on ConvNets evaluate their models on it, amongst other datasets.
We’ll use the cifar10.py script taken from the tflearn project here. We first load the dataset and then do some basic preprocessing. Each image in the dataset belongs to one of 10 classes. The to_categorical function encodes each class label as a one-hot vector.
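To make the one-hot encoding concrete, here is a minimal NumPy sketch of what a function like to_categorical does. The helper name one_hot and the sample labels are made up for illustration; this is not tflearn's actual implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: turn integer class labels into one-hot rows."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    # Set a single 1 per row, in the column given by that row's label
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

sample_labels = [3, 0, 9]             # made-up CIFAR10-style labels
encoded = one_hot(sample_labels, 10)
print(encoded.shape)                  # (3, 10)
print(encoded[0])                     # the row for label 3
```

Each row contains exactly one 1, at the index of the class it represents, so taking the argmax of a row recovers the original label.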
Each image in the dataset is of the form 32(Height) x 32(Width) x 3(Channel), with one channel each for Red, Green and Blue (RGB). Using the numpy reshape method, the image is flattened to a one-dimensional array of size 32x32x3 = 3072.
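The number of training and testing examples is read from the first entry of each array's numpy shape attribute.
import cifar10
# shuffle and to_categorical come from tflearn's data utilities
from tflearn.data_utils import shuffle, to_categorical

# Load the train/test splits and shuffle the training examples
(X_train, Y_train), (X_test, Y_test) = cifar10.load_data()
X_train, Y_train = shuffle(X_train, Y_train)

# One-hot encode the labels (10 classes)
Y_train = to_categorical(Y_train, 10)
Y_test = to_categorical(Y_test, 10)

# Number of training and testing examples
ntrain = X_train.shape[0]
ntest = X_test.shape[0]

# Flatten each 32x32x3 image into a 3072-element vector
X_train = X_train.reshape(-1, 3072)
X_test = X_test.reshape(-1, 3072)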
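The counting and flattening steps can be tried without downloading the dataset. The sketch below uses a random array as a stand-in for CIFAR10; the array names and the batch of 5 images are assumptions for illustration only.

```python
import numpy as np

# Stand-in for a tiny batch of CIFAR10 images: 5 images, 32x32 pixels, 3 channels
rng = np.random.default_rng(0)
fake_images = rng.random((5, 32, 32, 3)).astype(np.float32)

# The first entry of shape is the number of examples
n_examples = fake_images.shape[0]

# -1 lets NumPy infer the number of rows; each row holds 32*32*3 = 3072 values
flat_images = fake_images.reshape(-1, 3072)

print(n_examples)         # 5
print(flat_images.shape)  # (5, 3072)
```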
Now comes the real action: let’s start defining the ConvNet. First we define the input and output dimensions of the network and create placeholders for the input images and their labels.
n_input = 3072  # dimension of the flattened image is the input to the network
n_output = 10   # total discrete classes, 10 for cifar10

x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_output])
Next, the input x is reshaped into 32 (height) x 32 (width) x 3 (channel) format using tf.reshape(). TensorFlow ConvNets accept input in the form [image sample number, height, width, channel], where the image sample number indexes the images in the batch being fed into the network. Passing -1 for that dimension lets TensorFlow infer the batch size. You might be wondering why we flattened the images in the first place if we want the 32x32x3 format back. Just follow along, because soon we’ll be coming up with a good way to structure our code, and this will be helpful.
x = tf.reshape(x, shape=[-1, 32, 32, 3])
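Flattening an image and reshaping it back loses no information, so the flattened format is safe to use as the network input. Here is a small NumPy sketch of the round trip. NumPy's reshape follows the same row-major ordering and the same -1 convention as tf.reshape; the random batch is a made-up stand-in for real images.

```python
import numpy as np

rng = np.random.default_rng(1)
# 4 made-up images in [sample, height, width, channel] layout
batch = rng.random((4, 32, 32, 3))

flat = batch.reshape(-1, 3072)          # the flattened form fed to the network
restored = flat.reshape(-1, 32, 32, 3)  # the form a convolutional layer expects

print(restored.shape)                   # (4, 32, 32, 3)
print(np.array_equal(batch, restored))  # True
```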
[post unfinished, more to come soon]
Written by Praveen Sridhar, who lives in Kochi, India and works on Machine Learning projects in Python using the wonderful scikit-learn, TensorFlow and Keras libraries.