Convolutional Neural Networks in Tensorflow

Prashant Goyal
12 min read · Jun 19, 2019

I have been taking the deeplearning.ai Deep Learning Specialization on Coursera. While learning, I realized something: I am completing the projects and quizzes and passing with great marks, but I have not understood everything well enough to apply it in real-life projects. Sure, I am gaining theoretical knowledge and I am definitely more knowledgeable than I was before taking the specialization, but I wanted to make the most of my efforts. Hence, I decided to start applying everything I learned and posting my experience in the form of Medium articles. This is one of the first articles in an upcoming series where I will work on Deep Learning problems and try to make readers more knowledgeable about the techniques used in the world of Deep Learning.

What is Tensorflow?

TensorFlow is a Python-friendly open source library for numerical computation that makes machine learning faster and easier.¹

I will not be going into the details of the Tensorflow library as this is out of the scope of this article. For unfamiliar readers, I would suggest reading more about it to get the gist of it.

Here is an article for those who want to know about Tensorflow in general: https://www.infoworld.com/article/3278008/what-is-tensorflow-the-machine-learning-library-explained.html

For those who want to get into more detail: https://www.analyticsvidhya.com/blog/2017/03/tensorflow-understanding-tensors-and-graphs/

What is a Convolutional Neural Network?

A Convolutional Neural Network (CNN) is a class of Deep Neural Network mostly applied to Computer Vision tasks, though CNNs are not limited to this domain.

A typical CNN example. Source

On a basic level, we can say that a CNN takes an image as an input, assigns priorities (or weights) to various features in the image and is able to uniquely identify the image.

A CNN is able to capture the spatial structure of an image by using filters. Now, what is a filter? Let me use a classic example to explain. Imagine shining light from a torch on a part of the image. That part gets brighter, we see it more clearly, and we therefore understand what is in it better. A CNN works in a very similar way. It slides small filters over the image to detect low-level features like edges. In subsequent Convolutional layers, filters are applied to the outputs of the earlier layers to detect more advanced features like shapes. The output produced by each filter is called a feature map, and a CNN learns its filter values during training, so it finds the relevant features in the input data by itself. Pretty cool, right?
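To make the idea of a filter concrete, here is a minimal NumPy sketch (it is not part of the project code). It slides a hand-written 3x3 vertical-edge filter over a toy 5x5 image. The function name convolve2d, the image and the filter values are my own assumptions for illustration; a real CNN learns its filter values during training.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the patch under the filter element-wise and sum the result.
            patch = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# A toy 5x5 grayscale image: dark on the left (0), bright on the right (1).
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# A hand-written vertical-edge filter (assumed values, for illustration only).
vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)

feature_map = convolve2d(image, vertical_edge)
print(feature_map)
```

The feature map is large where the image changes from dark to bright and zero where the image is uniform, which is what "detecting an edge" means here.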

Another important part of a CNN is the Pooling layer. Feature maps are sensitive to the exact location of features in the input. A pooling layer reduces this sensitivity by downsampling: it summarizes the presence of features in small patches of each feature map. Mathematically, it takes the maximum or the average of the values in each patch (for example, each 2x2 window) and passes the result on to subsequent layers. These are the two most popular pooling layers, called Max Pooling and Average Pooling respectively.
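Here is a small NumPy sketch of what 2x2 pooling with stride 2 does to a single feature map. The helper name pool2x2 and the numbers in the feature map are assumptions made up for this example.

```python
import numpy as np

def pool2x2(feature_map, mode="max"):
    """Apply 2x2 pooling with stride 2 to a 2D feature map with even height and width."""
    h, w = feature_map.shape
    # Group the map into non-overlapping 2x2 windows: shape (h/2, 2, w/2, 2).
    windows = feature_map.reshape(h // 2, 2, w // 2, 2)
    if mode == "max":
        return windows.max(axis=(1, 3))
    # Any other mode falls back to average pooling.
    return windows.mean(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [5, 7, 4, 6],
                        [9, 8, 1, 1],
                        [2, 4, 3, 5]], dtype=float)

max_pooled = pool2x2(feature_map, "max")
avg_pooled = pool2x2(feature_map, "average")
print(max_pooled)
print(avg_pooled)
```

Both results are 2x2: every output value summarizes one 2x2 window of the input, so the height and width are halved.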

I will limit the explanation of CNN as this article is more focused towards its implementation. For more details, refer to this article: https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/

And this too: https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53

A Convolutional Neural Network in Tensorflow

This is what everyone has been waiting for. Even I was pretty excited when I started working on this project. Let us dive in and get our hands dirty!

For the dataset, I am going to use the CIFAR-10 dataset. The dataset is comprised of 50000 train images and 10000 test images. The images are divided into 10 categories with equal distribution.

Importing the required libraries.

import matplotlib.pyplot as plt
# To display images inline with the Jupyter Notebook.
%matplotlib inline
import numpy as np
# For downloading and extracting the .tar files in the dataset
import os
import tarfile
import urllib
import requests
# For image related tasks.
from PIL import Image
import tensorflow as tf

Download the dataset

The dataset is available in the form of .tar.gz files on the link provided above. The code below checks whether the data has already been downloaded. If the dataset is already available in the location given by the dataset_path variable, it is not downloaded again.

dataset_url = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
dataset_path = 'cifar_data/'
# Make sure the directory exists; os.listdir() raises an error otherwise.
os.makedirs(dataset_path, exist_ok=True)
print('Checking if data exists...')
if os.path.basename(dataset_url) in os.listdir(dataset_path):
    print('Skipping data download. It already exists.')
else:
    print('Downloading data...')
    r = requests.get(dataset_url)
    with open(f'{dataset_path}/{os.path.basename(dataset_url)}', 'wb') as file:
        file.write(r.content)
    print('Data downloaded!')

tarfile_path = f'{dataset_path}/{os.path.basename(dataset_url)}'
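The same check-then-download logic can also be wrapped in a small reusable function. This is only a sketch: the function name download_if_missing is my own, and it uses urllib.request.urlretrieve from the standard library instead of requests, which is a swap I am making for illustration.

```python
import os
import urllib.request

def download_if_missing(url, dest_dir):
    """Download `url` into `dest_dir` unless a file with the same name already exists there."""
    os.makedirs(dest_dir, exist_ok=True)
    target = os.path.join(dest_dir, os.path.basename(url))
    if os.path.exists(target):
        print('Skipping data download. It already exists.')
        return target
    print('Downloading data...')
    # urlretrieve writes the response to disk block by block.
    urllib.request.urlretrieve(url, target)
    print('Data downloaded!')
    return target

# Example call (commented out so that nothing is downloaded by accident):
# tarfile_path = download_if_missing(dataset_url, dataset_path)
```

Because the function returns the path of the archive, the result can be passed directly to tarfile.open().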

Extract the data files from the downloaded tar files and load the data

The tarfile contains the files data_batch_1, data_batch_2, …, data_batch_5, as well as test_batch. Each of these files is a Python “pickled” object produced with cPickle.

The extracted files will be stored in the directory: cifar_data/cifar-10-batches-py/

tfile = tarfile.open(tarfile_path)
print('The tarfile has the following members.')
print(tfile.getnames())
print('Extracting all the members...')
tfile.extractall('cifar_data/')
print('Extraction complete.')

The unpickle() function was provided by the dataset source (Thank you, people 😃). Initially, I am going to load everything into a Python dictionary. But don’t worry: later, I am going to use Numpy arrays for the sake of efficiency and the wide range of operations they provide.

def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict

flag = True # A flag to track whether to create keys or append to existing keys.
dictionary = dict()
for batch in os.listdir('cifar_data/cifar-10-batches-py'):
    if batch not in ['batches.meta', 'readme.html', 'test_batch']:
        dictionary_raw = unpickle(f'cifar_data/cifar-10-batches-py/{batch}')
        if flag: # Create keys.
            dictionary['labels'] = dictionary_raw[b'labels']
            # Reshape images to 32x32x3 format.
            dictionary['data'] = dictionary_raw[b'data'].reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1)
            flag = not flag # Now, the data will be appended to the created keys.
        else:
            dictionary['labels'] += dictionary_raw[b'labels']
            dictionary['data'] = np.concatenate((dictionary['data'], dictionary_raw[b'data'].reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1)))

# Get the test data too
test_dictionary = dict()
dictionary_raw = unpickle('cifar_data/cifar-10-batches-py/test_batch')
test_dictionary['labels'] = dictionary_raw[b'labels']
test_dictionary['data'] = dictionary_raw[b'data'].reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1) # Reshape images to 32x32x3 format.
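The reshape and transpose calls deserve a short illustration. Each CIFAR-10 image is stored as one row of 3072 values: 1024 red values, then 1024 green, then 1024 blue. The sketch below uses a fake row with made-up channel values (10, 20, 30) to show how one row becomes a 32x32x3 image.

```python
import numpy as np

# A fake row in CIFAR-10 layout: 1024 red, then 1024 green, then 1024 blue values.
# The values 10, 20 and 30 are placeholders chosen to make the channels easy to spot.
row = np.concatenate([np.full(1024, 10), np.full(1024, 20), np.full(1024, 30)])

# (3072,) -> (channels, height, width) -> (height, width, channels)
image = row.reshape(3, 32, 32).transpose(1, 2, 0)

print(image.shape)   # (32, 32, 3)
print(image[0, 0])   # [10 20 30]
```

Reshaping directly to (32, 32, 3) without the transpose would mix the colour channels, which is why the two steps are needed.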

Define some constants that will be required later.

train_size = 40000
valid_size = 10000
test_size = 10000
im_height = 32
im_width = 32
num_channels = 3
num_classes = 10
label_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

Check if everything is in order

To see if the data is loaded correctly, I am going to display 10 images chosen randomly and their corresponding labels.

f, axarr = plt.subplots(2, 5)
for i in range(2):
    for j in range(5):
        index = np.random.randint(0, train_size)
        axarr[i, j].imshow(dictionary['data'][index])
        axarr[i, j].set_title(label_names[dictionary['labels'][index]])
"""Visually, we can determine that the loading of data into dictionaries has been done correctly."""

Executing the above code shows that the data is loaded correctly: the displayed images match their labels. This matters because incorrect data or labelling would prevent our CNN from learning the correct information.

I apologize for the low quality of the images, but each image in our dataset is only 32x32 pixels.

Convert to Numpy arrays and check distribution

Numpy arrays provide a lot of inbuilt functionality and efficiency that is not available with the Python dictionaries that I have been using. You will see later that Numpy makes our lives so much easier.

train_images = np.asarray(dictionary['data'])
train_labels = np.asarray(dictionary['labels'])
test_images = np.asarray(test_dictionary['data'])
test_labels = np.asarray(test_dictionary['labels'])

Let’s see how our data is distributed among the output classes. This makes sure we are not suffering from the class imbalance problem.

count_dict = dict()
for label in label_names:
    count_dict[label] = 0

for label in train_labels:
    count_dict[label_names[label]] += 1

print(count_dict)

On running the above code, I got the following output:

It is evident that our data is evenly distributed among the classes. This is good for the model.
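The same count can be computed without an explicit loop by using np.bincount. This is just an alternative sketch; the labels array is a tiny made-up stand-in for train_labels and is not real data from the dataset.

```python
import numpy as np

label_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Hypothetical integer labels standing in for train_labels.
labels = np.array([0, 3, 3, 9, 5, 0, 3])

# bincount counts how often each integer occurs; minlength keeps empty classes at 0.
counts = np.bincount(labels, minlength=len(label_names))
count_dict = dict(zip(label_names, counts.tolist()))
print(count_dict)
```

Note that this only works while the labels are still integers, that is, before they are one-hot encoded.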

Perform some cleanup

Though this is not required, I will be removing the non-useful objects from the memory. Given that our dataset is small, it should not be a problem in our case, but this might be useful in memory constrained environments. At the very least, I will be able to re-use variable names. 😆

import gc
del dictionary_raw, dictionary, test_dictionary, tfile, count_dict
gc.collect()

Some Preprocessing

The data needs to be normalized and the labels need to be one-hot encoded.
For normalization, I am going to use the Min-Max Normalization technique to transform the data into the range of 0 to 1 (inclusive).

def normalize(x):
    x = (x - np.min(x)) / (np.max(x) - np.min(x))
    return x

train_images = normalize(train_images)
test_images = normalize(test_images)
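A quick sanity check of the normalization on a few made-up pixel values: when the data covers the full 8-bit range from 0 to 255, Min-Max Normalization gives the same result as simply dividing by 255. The pixel values here are assumptions chosen for the example.

```python
import numpy as np

def normalize(x):
    """Min-Max Normalization: map the smallest value to 0 and the largest to 1."""
    return (x - np.min(x)) / (np.max(x) - np.min(x))

# Hypothetical pixel values that include both extremes of the 8-bit range.
pixels = np.array([0, 51, 102, 255], dtype=np.uint8)
scaled = normalize(pixels)
print(scaled)
```

One thing to keep in mind is that the minimum and maximum are computed over whichever array is passed in, so the train and test sets are each scaled by their own range.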

One-Hot encoding transforms each label into a vector whose length equals the number of classes. Only the element at the label index is 1; all other elements are 0.
In our case, we have 10 classes and the labels are indexed from 0 to 9. Suppose we have an image with label 6. The one-hot encoded label will be: [0,0,0,0,0,0,1,0,0,0]. The element at index 6 is 1 and all others are 0.

def one_hot_encode(Y, C):
    encoded = np.zeros((len(Y), C))

    for idx, val in enumerate(Y):
        encoded[idx][val] = 1

    return encoded

train_labels = one_hot_encode(train_labels, num_classes)
test_labels = one_hot_encode(test_labels, num_classes)
# Keep only the first train_size examples for training.
train_images = train_images[:train_size, :]
train_labels = train_labels[:train_size, :]
print(f'The shape of training data: {train_images.shape}')
print(f'The shape of training labels: {train_labels.shape}')
print(f'The shape of test images: {test_images.shape}')
print(f'The shape of test labels: {test_labels.shape}')
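As a side note, the loop-based encoder can be replaced by a vectorized one-liner. This is an alternative sketch, not the version used in the rest of the article: row i of the identity matrix is exactly the one-hot vector for class i, so indexing np.eye with the labels does the job.

```python
import numpy as np

def one_hot_encode(labels, num_classes):
    """Row i of the identity matrix is the one-hot vector for class i."""
    return np.eye(num_classes)[np.asarray(labels)]

encoded = one_hot_encode([6, 0, 9], 10)
print(encoded[0])  # [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
```

The first row shows the label 6 case: the 1 sits at index 6, counting from 0.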

The normalized and one-hot encoded data has the following shape:

The shape of our normalized and one-hot encoded data.

Split data into mini batches

Before moving to define our model, the data must be split into random mini batches. The following function shuffles the data and generates mini batches of the specified size.

def get_mini_batches(X, Y, mini_batch_size):
    m = X.shape[0] # number of examples
    mini_batches = []

    # Shuffle X and Y
    shuffles = list(np.random.permutation(m))
    shuffled_X = X[shuffles, :, :, :]
    shuffled_Y = Y[shuffles,:]

    # Partition the shuffled data into complete batches, excluding the end case.
    num_complete_minibatches = m//mini_batch_size
    for k in range(num_complete_minibatches):
        mini_batch_X = shuffled_X[k * mini_batch_size : k * mini_batch_size + mini_batch_size,:,:,:]
        mini_batch_Y = shuffled_Y[k * mini_batch_size : k * mini_batch_size + mini_batch_size,:]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)

    # Handle the end case: a last, smaller batch with the leftover examples.
    if m % mini_batch_size != 0:
        mini_batch_X = shuffled_X[num_complete_minibatches * mini_batch_size : m,:,:,:]
        mini_batch_Y = shuffled_Y[num_complete_minibatches * mini_batch_size : m,:]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)

    return mini_batches
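To see what the batching produces, here is a compact sketch of the same idea run on dummy data. It is a shortened variant that I wrote for this illustration (it relies on the fact that NumPy slicing past the end of an array is safe), and the dummy arrays are placeholders, not real images.

```python
import numpy as np

def get_mini_batches(X, Y, mini_batch_size):
    """Shuffle X and Y together and split them into mini batches."""
    m = X.shape[0]
    order = np.random.permutation(m)
    shuffled_X, shuffled_Y = X[order], Y[order]
    # Slicing past the end is safe in NumPy, so the last (smaller) batch needs no special case.
    return [(shuffled_X[start:start + mini_batch_size],
             shuffled_Y[start:start + mini_batch_size])
            for start in range(0, m, mini_batch_size)]

# Dummy data: 10 "images" of shape 32x32x3 and 10 one-hot labels.
X = np.zeros((10, 32, 32, 3))
Y = np.eye(10)

batches = get_mini_batches(X, Y, mini_batch_size=4)
print([batch_X.shape[0] for batch_X, _ in batches])  # [4, 4, 2]
```

With 10 examples and a batch size of 4, there are two complete batches and one leftover batch of 2, and every example appears exactly once.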

Model Definition

We have reached the good stuff now! But there is still one thing that needs to be done: DEFINE SOME HYPERPARAMETERS! 😆

epochs = 30
keep_probability = 0.7
learning_rate = 0.001
batch_size = 128
CNN architecture. I made it myself ❗️

Our CNN will consist of the following layers:
* Layer 1: A convolution layer with 32 filters of size 3x3
* Layer 2: A max pooling layer with filter size 2x2
* Layer 3: A convolution layer with 64 filters of size 3x3
* Layer 4: A max pooling layer with filter size 2x2
* Layer 5: A convolution layer with 128 filters of size 3x3
* Layer 6: A max pooling layer with filter size 2x2
* Layer 7: Flatten the input.
* Layer 8: Hidden layer with 128 units
* Layer 9: Output layer with 10 units.

Using the above architecture gave me very good results. Though there is scope for improvements. I am going to leave that to you. Do share your ideas in the comments.
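It helps to trace how the image size changes through these layers. The sketch below does the arithmetic in plain Python, assuming padding='SAME' everywhere, stride 1 for the convolutions, stride 2 for the pooling layers, and the filter counts 32, 64 and 128 from the filter shapes in the model code. The helper name same_padding_output is my own.

```python
import math

def same_padding_output(size, stride):
    """Spatial output size of a layer that uses padding='SAME'."""
    return math.ceil(size / stride)

size, channels = 32, 3                      # CIFAR-10 images are 32x32x3
for filters in (32, 64, 128):               # filter counts from the model code
    size = same_padding_output(size, 1)     # 3x3 convolution, stride 1: size unchanged
    channels = filters
    size = same_padding_output(size, 2)     # 2x2 max pooling, stride 2: size halved
    print(f'after conv + pool: {size}x{size}x{channels}')

flattened = size * size * channels
print(flattened)  # 2048
```

So the flatten layer hands a vector of 2048 values per image to the hidden layer with 128 units.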

X = tf.placeholder(tf.float32, shape=(None, im_height, im_width, num_channels), name='input_x')
Y = tf.placeholder(tf.float32, shape=(None, num_classes), name='output_y')
keep_prob = tf.placeholder(tf.float32, name='keep_prob') # Shape can be found automatically.

def model(x, keep_prob):
    # Filter shapes are [height, width, input channels, output channels].
    filter1 = tf.Variable(tf.truncated_normal(shape=[3, 3, 3, 32], mean=0, stddev=0.08))
    filter2 = tf.Variable(tf.truncated_normal(shape=[3, 3, 32, 64], mean=0, stddev=0.08))
    filter3 = tf.Variable(tf.truncated_normal(shape=[3, 3, 64, 128], mean=0, stddev=0.08))

    conv1 = tf.nn.conv2d(x, filter1, strides=[1,1,1,1], padding='SAME')
    pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1,2,2,1], padding='SAME')
    pool1 = tf.nn.relu(pool1)

    conv2 = tf.nn.conv2d(pool1, filter2, strides=[1, 1, 1, 1], padding='SAME')
    pool2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    pool2 = tf.nn.relu(pool2)

    # pool2 has 64 channels, so the third convolution must use filter3 (not filter2).
    conv3 = tf.nn.conv2d(pool2, filter3, strides=[1, 1, 1, 1], padding='SAME')
    pool3 = tf.nn.max_pool(conv3, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    pool3 = tf.nn.relu(pool3)
    flatten = tf.contrib.layers.flatten(pool3)

    # Hidden layers
    hidden = tf.contrib.layers.fully_connected(flatten, num_outputs=128, activation_fn=tf.nn.relu)
    hidden = tf.nn.dropout(hidden, keep_prob)
    hidden = tf.layers.batch_normalization(hidden)
    # Output
    output = tf.contrib.layers.fully_connected(hidden, num_outputs=num_classes, activation_fn=None)

    return output

First, we have to initialize placeholders for our data. Placeholders, as the name suggests, are just data objects which will hold any value that we will input. These placeholders are used to feed our data into Tensorflow computational graphs.
Next, we define filters. In Tensorflow, we have to define filters for the Convolutional layers using Tensorflow variables. Tensorflow variables are values that are initialized within a Tensorflow session and whose values can be changed during execution, which is how the filters get updated during training.
After that, we just define our model and everything is set.

WELL, NOT EVERYTHING!!!

Define our output, cost, optimizer

We also have to define our cost function and our optimizers for training the network.

logits = model(X, keep_prob)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=Y))

The cost function will be softmax cross-entropy because we have a multi-class classification problem. tf.reduce_mean computes the mean of elements across dimensions of a tensor; here it averages the per-example losses over the batch.
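To show what this cost computes, here is a NumPy sketch of softmax cross-entropy on two made-up examples. It follows the standard textbook definition and is not TensorFlow's actual implementation; the function name and the numbers are assumptions for illustration.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between one-hot `labels` and softmax(`logits`)."""
    # Subtract the row-wise maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    # Log of the softmax probabilities.
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # For one-hot labels this picks minus the log-probability of the correct class.
    per_example = -(labels * log_probs).sum(axis=1)
    return per_example.mean()

logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

cost = softmax_cross_entropy(logits, labels)
print(cost)
```

The second example has equal logits for all three classes, so its loss is log(3): the model is guessing uniformly.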

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate,
                                   beta1=0.9,
                                   beta2=0.999,
                                   epsilon=1e-08).minimize(cost)

AdamOptimizer is used for the optimization process. In the course, we are taught that Adam combines the benefits of Momentum and RMSProp with normal Gradient Descent. I might not be using Adam to its full capacity and there may be scope for some hyper-parameter tuning, but I will not go into that. Once again, your ideas are welcome. Please share in the comments.
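For readers who want to see the Momentum and RMSProp parts side by side, here is a NumPy sketch of a single Adam update in its textbook form, using the same beta1, beta2 and epsilon values as above. TensorFlow's own implementation may differ in details, and the function name adam_step and the toy problem are my assumptions.

```python
import numpy as np

def adam_step(param, grad, m, v, t, learning_rate=0.001,
              beta1=0.9, beta2=0.999, epsilon=1e-08):
    """One Adam update for a single parameter array; `t` is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # Momentum-style average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # RMSProp-style average of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
    return param, m, v

# Toy problem: minimize f(w) = w^2 (gradient 2w) from a hypothetical starting point.
w = np.array([1.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 101):
    w, m, v = adam_step(w, 2 * w, m, v, t)
print(w)
```

With a learning rate of 0.001, each step moves w by roughly 0.001 towards the minimum, which is why the learning rate is one of the first hyper-parameters worth tuning.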

correct_pred = tf.equal(tf.argmax(logits, axis = 1), tf.argmax(Y, axis = 1)) # See if the prediction index is same as the label index. 
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32), name='accuracy')

correct_pred takes the index of the maximum value in each row of the logits and compares it with the index of the maximum value in the corresponding row of the labels. Then, the mean of correct_pred (cast to floats) is calculated using tf.reduce_mean to get the accuracy.
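The same accuracy computation can be written in NumPy, which makes it easy to check by hand. The logits and labels here are made-up values for three examples and three classes.

```python
import numpy as np

# Hypothetical model outputs and one-hot labels.
logits = np.array([[0.1, 2.5, 0.3],
                   [1.2, 0.2, 0.1],
                   [0.0, 0.4, 0.9]])
labels = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [0, 0, 1]])

# Compare the predicted class index with the true class index, row by row.
correct_pred = np.argmax(logits, axis=1) == np.argmax(labels, axis=1)
accuracy = correct_pred.astype(np.float32).mean()
print(correct_pred, accuracy)
```

The first and third predictions match their labels and the second does not, so the accuracy is 2/3.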

Train the model

# Some lists to keep track of the losses. Will be used to plot some graphs later.
valid_losses = []
valid_accuracy = []
train_losses = []
saver = tf.train.Saver()
sess = tf.Session()
# Initialize the variables
sess.run(tf.global_variables_initializer())
# Training
for epoch in range(epochs):
    mini_batch_cost = 0
    train_mini_batch_size = train_size // batch_size
    test_mini_batch_size = test_size // batch_size

    train_minibatches = get_mini_batches(train_images, train_labels, batch_size)
    test_mini_batches = get_mini_batches(test_images, test_labels, batch_size)

    for minibatch in train_minibatches:
        (minibatch_X, minibatch_Y) = minibatch
        _, temp_cost = sess.run([optimizer, cost], feed_dict = {X: minibatch_X, Y: minibatch_Y, keep_prob: keep_probability})
        mini_batch_cost += temp_cost / train_mini_batch_size

    temp_cost, acc = sess.run([cost, accuracy], feed_dict={X:test_images, Y:test_labels, keep_prob: 1})

    valid_losses.append(temp_cost)
    valid_accuracy.append(acc)
    train_losses.append(mini_batch_cost)

    if (epoch+1) % 3 == 0:
        print(f'Epoch: {epoch + 1}\t training_loss: {mini_batch_cost}\t validation_accuracy: {acc}\t validation_cost: {temp_cost}')

saver.save(sess, os.path.join(os.getcwd(),'models',"CNN.ckpt"))

During the training process, the training set is divided into random mini batches and each mini batch is fed into the network. I used the test set as the validation set.
An important thing to remember is to save your session before closing it. I was not able to test my model at first because I could not find a way to resume the session and hence, my model kept giving the wrong outputs. Create the saver with saver = tf.train.Saver() and save the session with saver.save(sess, path), as done at the end of the code above.

Plot some graphs to analyze the model

I am going to plot some standard plots to see the model’s progress during the epochs.

# Plot the training losses.
plt.plot(train_losses, range(epochs), color='r')
plt.xlabel('Loss')
plt.ylabel('Epochs')
plt.title('Training Losses')
plt.show()

# Plot the validation losses.
plt.plot(valid_losses, range(epochs), color='r')
plt.xlabel('Loss')
plt.ylabel('Epochs')
plt.title('Validation Losses')
plt.show()

# Plot the validation accuracy.
plt.plot(valid_accuracy, range(epochs), color='r')
plt.xlabel('Accuracy')
plt.ylabel('Epochs')
plt.title('Validation Accuracy')
plt.show()

It is evident that the model started overfitting around epoch 25. This can be tuned too.
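One simple way to act on this is to look for the epoch with the lowest validation loss and stop training (or keep the checkpoint) around that point. The helper below is a minimal sketch of that idea; the name best_epoch and the loss values are hypothetical and are not results from my run.

```python
def best_epoch(valid_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    best_index = min(range(len(valid_losses)), key=lambda i: valid_losses[i])
    return best_index + 1

# Hypothetical loss values, used only to illustrate the helper.
example_losses = [1.60, 1.21, 1.02, 0.95, 0.97, 1.04]
print(best_epoch(example_losses))  # 4
```

After the best epoch, the validation loss starts rising again even though training continues, which is the usual sign of overfitting.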

See the model performance on the test set

Even though the test set was used for validation and the accuracy is known, I am still going to see how it is performing on some random images from the test set.

The upper label is the model output and the lower label is the correct label.

The model gives acceptable performance. There is definitely some scope for improvement and maybe I will work on that later.

Thank You

So, after a very long article (I will work on the article length in future versions), I hope this was helpful. The experience of learning from the course, applying the concepts on a project by myself and writing this article was really great for me.

I might also work on the PyTorch implementation for the projects that I do in this series of articles. Stay tuned!!! 😃
