Build your own deep learning classification model in Keras
An intuitive guide to building your own deep learning classification model from scratch
Introduction
Image classification is a field of artificial intelligence that has been gaining popularity in recent years. It has various applications, such as self-driving cars, face recognition, and augmented reality.
In this article, we will use a step-by-step approach to build a deep learning image classification model.
I have made the full code available in a shared Google Colab notebook so you can easily run it yourself!
After reading this guide, you will know the following things:
- How to make use of the free GPU power of Google Colab
- How to load in a popular image classification dataset (Pascal VOC)
- How to create a deep learning convolutional neural network using a combination of Keras & TensorFlow
- How to implement a data generator
- How to train a deep learning model & evaluate the results
Step #1: Set up the environment
Please check the Google Colab notebook for the required packages.
We will be building a deep learning convolutional network from scratch. This requires huge amounts of computing power.
Luckily, Google comes to our rescue! They have developed an online Python notebook environment, Google Colab, which gives users free computing power.
We enable the free computing power by selecting the GPU option in the notebook settings.
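Once the GPU runtime is selected, it can be worth confirming that TensorFlow actually sees the accelerator. The following is a minimal sketch, assuming TensorFlow 2.1 or later is installed in the notebook; the helper name gpu_available is made up for this example.

```python
def gpu_available():
    """Return True if TensorFlow reports at least one GPU device."""
    try:
        import tensorflow as tf  # assumed to be preinstalled in the Colab runtime
    except ImportError:
        # Without TensorFlow there is nothing for Keras to run on
        return False
    return len(tf.config.list_physical_devices("GPU")) > 0


print("GPU available:", gpu_available())
```

If this prints False in Colab, the runtime type is most likely still set to CPU.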
Step #2: Import the data
We will use the Pascal VOC image dataset for our deep learning model.
The Pascal VOC dataset is a standardised image dataset for object class recognition and is widely used by computer vision professionals to showcase their skills.
We use the wget command to download the dataset. It fetches the archive and saves it to your current working directory.
As a last step, we open the tar file and extract it.
Good job! You have now successfully loaded in and extracted the dataset.
import tarfile

# -nc skips the download if the archive is already present
!wget -nc http://host.robots.ox.ac.uk/pascal/VOC/voc2009/VOCtrainval_11-May-2009.tar
tf = tarfile.open("/content/VOCtrainval_11-May-2009.tar")
tf.extractall()
Step #3: Load the data
The current data structure is not optimal for building deep learning convolutional models.
We will have to transform the data into a more suitable format.
The extracted Pascal VOC dataset should contain the following two folders:
- Annotations: This folder contains all the information about the image labels.
- JPEGImages: This folder contains all the raw images.
We will first create a dataset with all the filenames and their respective labels. For example, filename “2208–001068” has the labels “bicycle” and “sofa”.
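import os
import xml.etree.ElementTree as ET

directory_annotations = '/content/VOCdevkit/VOC2009/Annotations'
filenames = []
classification = []
for xml_file in os.listdir(directory_annotations):
    # os.listdir returns bare file names, so build the full path first
    xml_path = os.path.join(directory_annotations, xml_file)
    # Save image for classification and their class label
    if os.path.isfile(xml_path):
        xml_tree = ET.parse(xml_path)
        root = xml_tree.getroot()
        # splitext removes the extension; strip('.jpg') would strip characters, not the suffix
        imgname = os.path.splitext(root.find('filename').text)[0]
        labels = []
        for obj in root.findall('object'):
            label = obj.find('name').text
            labels.append(label)
        filenames.append(imgname)
        classification.append(labels)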
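To see what the loop extracts from a single annotation, here is a small self-contained sketch that parses one annotation from a string instead of from disk. The XML is a trimmed, hand-written sample in the Pascal VOC layout (real files carry more fields, such as bounding boxes), and parse_annotation is a hypothetical helper name.

```python
import os
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only, not copied from the dataset
SAMPLE_XML = """
<annotation>
  <filename>example_000001.jpg</filename>
  <object><name>bicycle</name></object>
  <object><name>sofa</name></object>
</annotation>
"""


def parse_annotation(xml_text):
    """Return (image name without extension, list of object labels)."""
    root = ET.fromstring(xml_text.strip())
    image_name = os.path.splitext(root.find("filename").text)[0]
    labels = [obj.find("name").text for obj in root.findall("object")]
    return image_name, labels


name, labels = parse_annotation(SAMPLE_XML)
print(name, labels)
```

Each object element contributes one label, so an image with several objects ends up with several labels.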
Step #4: Preprocess
In this step, we will perform the following tasks:
- We split the filenames and their respective classifications into a training, validation, and test set (70%, 20%, and 10% of the files).
label_filenames_temp = os.listdir(directory_annotations)
filenames = []
for lbl in label_filenames_temp:
    filenames.append(lbl.split('.')[0])
filecount = len(filenames)
indexes = []
for index in range(filecount):
    indexes.append(index)
# 70% training, 20% validation, 10% testing
training_indexes = indexes[:int(filecount*0.7)]
validation_indexes = indexes[int(filecount*0.7):int(filecount*0.9)]
testing_indexes = indexes[int(filecount*0.9):]
- We convert these labels to numeric values since deep learning networks require the input and output variables to be numbers.
from sklearn import preprocessing

directory_images = '/content/VOCdevkit/VOC2009/JPEGImages'
directory_annotations = '/content/VOCdevkit/VOC2009/Annotations'
labelnames = preprocessing.LabelEncoder()
labelnames.fit(["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"])
- We resize the images to the (224,224,3) format when they are loaded. This is the input size used for the VGG16 model (Simonyan & Zisserman, 2014).
import cv2
import numpy as np

def generate_from_xml(filename):
    # One slot per class; a slot is set to 1 when that class appears in the image
    label = np.zeros((20), dtype = 'float32')
    tree = ET.parse(os.path.join(directory_annotations, filename + ".xml"))
    raw_image = cv2.imread(os.path.join(directory_images, filename + ".jpg"))
    res_img = cv2.resize(raw_image, (224,224))
    for elems in tree.iter():
        if elems.tag == "object":
            name = elems.find("name").text
            labelnr = labelnames.transform([name])[0]
            label[labelnr] = 1
    return label, res_img
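The label part of generate_from_xml boils down to building a multi-hot vector: one slot per class, set to 1 for every class present in the image. Below is a dependency-free sketch of the same idea. It assumes class indices follow alphabetical order, which is how scikit-learn's LabelEncoder orders string classes; multi_hot, CLASSES and CLASS_INDEX are illustrative names.

```python
# The 20 Pascal VOC classes, sorted so that the index matches alphabetical order
CLASSES = sorted([
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
])
CLASS_INDEX = {name: i for i, name in enumerate(CLASSES)}


def multi_hot(labels):
    """Turn a list of class names into a 20-element vector of 0.0 / 1.0."""
    vector = [0.0] * len(CLASSES)
    for label in labels:
        # Repeated objects of the same class still set the slot to 1 only once
        vector[CLASS_INDEX[label]] = 1.0
    return vector


vec = multi_hot(["bicycle", "sofa", "sofa"])
print([i for i, v in enumerate(vec) if v == 1.0])
```

An image with two sofas and one bicycle therefore gets exactly two active slots, which is the target format a multi-label classifier is trained against.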
Step #5: Datagenerator
If we ran our model on the dataset without a data generator, the whole dataset would have to be loaded into RAM at once and the notebook would run out of memory. It is best practice to use a data generator when working with big datasets (as opposed to buying more RAM). We create two instances of our data generator class: one for the training set and one for the validation set.
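The generator class itself is not shown in this article, so the following is only a sketch of the general pattern, not the original implementation. It follows the batch-by-index protocol (a length and an item getter) that a Keras Sequence uses. In Keras the class would typically subclass keras.utils.Sequence and return NumPy arrays; both are left out here so the sketch runs on its own. BatchGenerator, load_fn and fake_load are illustrative names, and load_fn is assumed to return (label, image) like generate_from_xml does.

```python
import math


class BatchGenerator:
    """Serves (images, labels) batches by index, loading files only when asked."""

    def __init__(self, filenames, load_fn, batch_size=32):
        self.filenames = list(filenames)
        self.load_fn = load_fn  # maps a filename to (label, image)
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller
        return math.ceil(len(self.filenames) / self.batch_size)

    def __getitem__(self, batch_index):
        if not 0 <= batch_index < len(self):
            raise IndexError("batch index out of range")
        start = batch_index * self.batch_size
        batch_files = self.filenames[start:start + self.batch_size]
        # Only the files of this batch are loaded into memory
        samples = [self.load_fn(name) for name in batch_files]
        labels = [label for label, _ in samples]
        images = [image for _, image in samples]
        return images, labels


def fake_load(name):
    """Stand-in loader so the sketch runs without any image files."""
    return [0.0] * 20, "pixels-of-" + name


gen = BatchGenerator(["img%d" % i for i in range(10)], fake_load, batch_size=4)
print(len(gen), [len(gen[i][0]) for i in range(len(gen))])
```

Because each batch is built on demand, memory use is bounded by the batch size rather than by the size of the dataset.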
Step #6: Create our model
In this step we will build a convolutional neural network for classification from scratch and train it to recognize the 20 target classes in the Pascal VOC dataset.
Our model architecture will be based on the popular VGG-16 architecture. This is a CNN with a total of 13 convolutional layers (cf. figure 1).
We opt for the sequential approach of building the model.
model = Sequential()
We add 2 convolutional layers.
In the convolutional layers, multiple filters are applied to the image to extract different features.
Arguments given:
- Input-shape: The image given should be of the shape (224,224,3).
- Filters: The number of filters that the convolutional layer will learn.
- Kernel_size: Specifies the width and height of the 2D convolution window.
- Padding: Specifying “same” ensures that the spatial dimensions are the same after the convolution.
- Activation: This is more of a convenience argument. Here, we specify which activation function will be applied after the convolutional layers. We will apply the ReLU activation function. More on this later.
model.add(Conv2D(input_shape=(224,224,3),filters=64,kernel_size=(3,3),padding="same", activation="relu"))
model.add(Conv2D(filters=64,kernel_size=(3,3),padding="same", activation="relu"))
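To get a feel for what the filters and kernel_size arguments mean in practice, we can count the trainable parameters of these two layers by hand. Each filter has one weight per kernel position and input channel, plus one bias. The helper name conv2d_params is made up for this sketch.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Trainable parameters of a 2D convolution: weights per filter plus one bias each."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


first = conv2d_params(3, 3, 3, 64)    # RGB input (3 channels) -> 64 filters
second = conv2d_params(3, 3, 64, 64)  # 64 feature maps -> 64 filters
print(first, second)
```

These numbers should match the Param # column that model.summary() reports for the first two layers.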
Next, we add 1 maxpool layer.
Pooling is used to reduce the dimensionality of images by reducing the number of pixels in the output of the previous convolutional layer.
- Pool_size = (2,2): The window (“matrix”) that slides over the output; only the maximum value inside the window is kept.
- Strides = (2,2): How far the pooling window moves along the x- and y-axis at each step.
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
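As a worked example, here is max pooling with a 2x2 window and a stride of 2 applied to a small 4x4 feature map in plain Python. The function max_pool_2d and the numbers in feature_map are made up for illustration; Keras performs the same operation per channel on much larger tensors.

```python
def max_pool_2d(matrix, pool=2, stride=2):
    """Slide a pool x pool window over the matrix and keep the maximum of each window."""
    rows = (len(matrix) - pool) // stride + 1
    cols = (len(matrix[0]) - pool) // stride + 1
    pooled = []
    for r in range(rows):
        pooled_row = []
        for c in range(cols):
            window = [
                matrix[r * stride + i][c * stride + j]
                for i in range(pool)
                for j in range(pool)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [7, 1, 3, 8],
]
pooled = max_pool_2d(feature_map)
print(pooled)  # [[6, 2], [7, 9]]
```

The 4x4 input shrinks to 2x2: each output value summarizes one non-overlapping 2x2 region.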
We continue to add layers to our deep learning network. The same logic as described above is applied.
model.add(Conv2D(filters=128, kernel_size=(3,3),padding="same", activation="relu"))
model.add(Conv2D(filters=128, kernel_size=(3,3),padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))

model.add(Conv2D(filters=256, kernel_size=(3,3),padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3,3),padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3,3),padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))

model.add(Conv2D(filters=512, kernel_size=(3,3),padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3),padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3),padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))

model.add(Conv2D(filters=512, kernel_size=(3,3),padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3),padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3),padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
Now the convolutional base is created. To be able to generate a prediction, we will now have to flatten the output of the convolutional base.
model.add(Flatten())
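It helps to know how large the flattened vector is. With “same” padding the convolutions keep the spatial size, so only the five pooling layers shrink it. The sketch below traces the shape through the five blocks; trace_vgg_shapes is an illustrative name, and the filter counts are taken from the layers added above.

```python
def trace_vgg_shapes(size=224):
    """Return the (height, width, channels) shape after each of the five conv blocks."""
    filters_per_block = [64, 128, 256, 512, 512]
    shapes = []
    for filters in filters_per_block:
        # Convolutions with "same" padding leave the size unchanged;
        # the 2x2 pooling with stride 2 then halves it
        size = (size - 2) // 2 + 1
        shapes.append((size, size, filters))
    return shapes


shapes = trace_vgg_shapes()
height, width, channels = shapes[-1]
flattened = height * width * channels
print(shapes)
print(flattened)  # 25088
```

The image goes from 224 to 112, 56, 28, 14 and finally 7 pixels per side, so Flatten produces a vector of 7 x 7 x 512 = 25088 values.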
Next, we add the dense layers. The dense layers feed the flattened output of the convolutional base to their neurons.
Arguments:
- Units: Number of neurons
- Activation function: ReLU
The ReLU activation function speeds up training since the gradient computation is very simple (0 or 1). It also means that negative values are not passed on, or “activated”, to the next layer. As a result, only a subset of the neurons is active at any time, which makes the network computationally efficient.
model.add(Dense(units=4096,activation="relu"))
model.add(Dense(units=4096,activation="relu"))
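The behaviour of ReLU and its gradient is easy to verify in a few lines of plain Python. The function names and input values are made up for this sketch.

```python
def relu(x):
    """Pass positive values through unchanged and clamp everything else to 0."""
    return max(0.0, x)


def relu_grad(x):
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise (the value at 0 is a convention)."""
    return 1.0 if x > 0 else 0.0


inputs = [-2.0, -0.5, 0.0, 1.5, 3.0]
outputs = [relu(x) for x in inputs]
grads = [relu_grad(x) for x in inputs]
print(outputs)  # [0.0, 0.0, 0.0, 1.5, 3.0]
print(grads)    # [0.0, 0.0, 0.0, 1.0, 1.0]
```

Negative inputs produce neither an output nor a gradient, which is exactly the sparsity described above.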
We add a sigmoid layer to turn the output of the previous layer into a probability for each class. The sigmoid is well suited for multi-label classification, which is why we use it instead of, for example, a softmax activation.
The probabilities produced by a sigmoid are independent and are not constrained to sum to one. This is crucial in a classification with multiple output labels.
We set the units argument to 20 since we have 20 possible classes.
model.add(Dense(units=20, activation="sigmoid"))
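The difference between sigmoid and softmax becomes clear on a tiny example. In the sketch below, two classes receive the same high raw score, as could happen for an image that contains both a bicycle and a sofa. The scores in logits are made up for illustration.

```python
import math


def sigmoid(x):
    """Squash a single score into (0, 1), independently of all other scores."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Turn scores into probabilities that compete with each other and sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 2.0, -1.0]
sig = [sigmoid(x) for x in logits]
soft = softmax(logits)
print([round(p, 3) for p in sig], round(sum(sig), 3))
print([round(p, 3) for p in soft], round(sum(soft), 3))
```

With the sigmoid, both strong classes get a probability well above 0.5. With softmax they have to share the probability mass and each ends up below 0.5, which would hide one of the two labels.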
Step #7: Loss function & optimizer
As a final step, we have to compile the model. We use the RMSprop optimizer to minimize the loss and set the learning rate to 0.001.
RMSprop (root mean square propagation) is an unpublished optimization algorithm that is nevertheless very popular among machine learning practitioners. It dampens the fluctuations in the vertical direction while speeding up learning in the horizontal direction, which helps our model converge faster toward a minimum. The main difference with regular gradient descent is how the gradients are turned into an update: each gradient is divided by a running average of its recent magnitude. The formula for this calculation is shown in the figure below.
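The update rule as it is commonly stated can be sketched in a few lines: keep a running average of the squared gradient, then divide the gradient by the square root of that average before taking a step. This is a simplified single-parameter version, not the Keras implementation; rho = 0.9 and eps = 1e-7 are assumed values chosen for illustration, and the toy loss w² is made up.

```python
import math


def rmsprop_step(w, grad, avg_sq, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSprop update for a single parameter.

    avg_sq is the running average of squared gradients (assumed to start at 0).
    """
    avg_sq = rho * avg_sq + (1.0 - rho) * grad * grad
    w = w - lr * grad / (math.sqrt(avg_sq) + eps)
    return w, avg_sq


# Toy problem: minimize loss(w) = w^2, whose gradient is 2w
w, avg_sq = 1.0, 0.0
for _ in range(200):
    grad = 2.0 * w
    w, avg_sq = rmsprop_step(w, grad, avg_sq)
print(w)
```

Because the gradient is normalized by its recent magnitude, the step size depends mostly on the learning rate and much less on how steep the loss surface is in a given direction.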
We opted for the binary cross-entropy loss. It is recommended to use this loss function for a multi-label classification since each element belonging to a certain class should not be influenced by the decision for another class.
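# learning_rate replaces the older lr argument, which is deprecated in recent Keras versions
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()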
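To make the loss concrete, here is binary cross-entropy computed by hand for a single image with three labels. Each label contributes its own term, so a wrong prediction for one class does not change the term of another class. This is a plain-Python sketch, not the Keras implementation; the clipping value eps = 1e-7 and the example predictions are assumptions.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of the per-label cross-entropy terms for one sample."""
    total = 0.0
    for truth, prob in zip(y_true, y_pred):
        prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
        total += -(truth * math.log(prob) + (1.0 - truth) * math.log(1.0 - prob))
    return total / len(y_true)


y_true = [1.0, 0.0, 1.0]
good = binary_crossentropy(y_true, [0.9, 0.1, 0.8])  # confident and correct
bad = binary_crossentropy(y_true, [0.2, 0.9, 0.3])   # confident and wrong
print(round(good, 4), round(bad, 4))
```

The confident, correct prediction yields a small loss, while the confident, wrong one is penalized heavily.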
Step #8: Model training
We use early stopping to halt the training once the model's performance on a hold-out dataset stops improving. This way, the number of epochs is chosen automatically while we keep an eye on overfitting.
We instruct the EarlyStopping callback to monitor the validation loss and look for its minimum.
The earlystopping method only stops training when no further improvement is detected.
However, the last epoch is not necessarily the one with the best performance.
Therefore, we also use the ModelCheckpoint callback. This will save the best model observed during training, based on the validation loss.
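filepath = "/content/drive/My Drive/MYCNN/CNN1505_v1.h5"
# Stop when val_loss has not improved for 4 consecutive epochs
earlyStopping = EarlyStopping(monitor='val_loss', verbose=0, mode='min', patience = 4)
# Keep only the weights of the epoch with the lowest val_loss
mcp_save = ModelCheckpoint(filepath, save_best_only=True, monitor='val_loss', mode='min')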
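The interplay between patience and the best checkpoint can be illustrated without training anything. The sketch below walks through a list of validation losses and reports at which epoch training would stop and which epoch held the best model. It is a simplified imitation of the callback logic, not the Keras code; simulate_early_stopping and the loss values are made up.

```python
def simulate_early_stopping(val_losses, patience=4):
    """Return (epoch at which training stops, epoch with the lowest loss so far)."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember it (this is where a checkpoint would be saved)
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


losses = [0.90, 0.60, 0.50, 0.52, 0.51, 0.55, 0.53, 0.50, 0.40]
stopped_at, best_at = simulate_early_stopping(losses, patience=4)
print(stopped_at, best_at)  # 6 2
```

Training stops at epoch 6 (counting from 0), four epochs after the best loss at epoch 2. The later values are never reached, and the model at the stopping epoch is worse than the checkpointed one, which is why saving the best model matters.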
Now we will start training our deep learning neural network. We use the fit_generator method from Keras to load the data in batches. This is necessary since our entire training set doesn’t fit in RAM.
We set the following arguments:
- Use multiprocessing: Whether to use process-based threading
- Workers: Number of threads generating batches in parallel.
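# Note: fit_generator is deprecated in recent TensorFlow releases, where model.fit accepts generators directly
history = model.fit_generator(generator=training_generator,
                              validation_data=val_generator,
                              use_multiprocessing=True,
                              workers=6,
                              epochs = 20,
                              callbacks=[earlyStopping, mcp_save])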
When our training has finished, we visualize our training and validation results. Two metrics are plotted:
- Model accuracy
- Model loss
Step #9: Validate our model
We see that the model quickly converged from a huge training loss in the first epoch to much lower values. This fast convergence is due to the nature of the chosen optimizer (RMSprop). Training stops once the validation loss has not improved for four epochs, and the checkpoint keeps the model with the lowest validation loss.
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(history.history)
print(history.history.keys())

# summarize history for accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
# the second curve comes from the validation set, not the test set
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
Step #10: Test our model performance
We now test our model on the test set to see how it performs on unseen data:
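The original evaluation code is not included here, so the following is only a sketch of one common way to score multi-label predictions: threshold each sigmoid output and count how many individual labels are correct. The 0.5 cutoff, the helper names, and the toy numbers (three classes instead of 20) are assumptions; in practice the probabilities would come from model.predict on the test generator.

```python
def threshold(probabilities, cutoff=0.5):
    """Turn per-class probabilities into 0/1 predictions."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def label_accuracy(y_true, y_prob, cutoff=0.5):
    """Fraction of individual labels predicted correctly across all samples."""
    correct = 0
    total = 0
    for truth, probs in zip(y_true, y_prob):
        preds = threshold(probs, cutoff)
        correct += sum(1 for t, p in zip(truth, preds) if t == p)
        total += len(truth)
    return correct / total


# Made-up ground truth and predicted probabilities for two images
y_true = [[1, 0, 1], [0, 1, 0]]
y_prob = [[0.8, 0.3, 0.4], [0.1, 0.7, 0.2]]
acc = label_accuracy(y_true, y_prob)
print(round(acc, 3))
```

Here five of the six labels are correct. Note that with 20 classes and only a few objects per image, label-wise accuracy can look high even for a weak model, so per-class precision and recall are worth checking as well.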