Intuitively Create CNN for Fashion Image Multi-class Classification
In my previous article, I walked through how to build a Convolutional Neural Network (CNN) for a binary image classification problem. In this article, I will create another CNN, this time for the retail marketing industry. What sets this article apart is a different input data format, which requires different data processing methods, and a different CNN architecture for multi-class classification. It is split into 6 parts.
- Problem statement
- Data processing
- Model building
- Model compiling
- Model fitting
- Model evaluation
1. Problem statement
We are given a set of images from the retail industry. The task is to create a CNN model that predicts the label of a fashion image: 0 as T-shirt; 1 as Trouser; 2 as Pullover; 3 as Dress; 4 as Coat; 5 as Sandal; 6 as Shirt; 7 as Sneaker; 8 as Bag; 9 as Ankle boot.
The data is the Fashion-MNIST dataset of 70,000 images, of which 60,000 form the training set and 10,000 the test set. All images are grayscale, 28 pixels high and 28 pixels wide. Each pixel value is an integer from 0 (black) to 255 (white).
Figure 1 is a snippet of the training data. Note that each row represents one image and holds its label followed by 784 pixel values.
First, read in the training and test data and convert the DataFrames to NumPy arrays.
import numpy as np
import pandas as pd
fashion_train_df = pd.read_csv('fashion-mnist_train.csv', sep=',')
fashion_test_df = pd.read_csv('fashion-mnist_test.csv', sep=',')
training = np.array(fashion_train_df, dtype='float32')
testing = np.array(fashion_test_df, dtype='float32')
If you want to view an image with the default colormap or in grayscale, try the following:
import random
import matplotlib.pyplot as plt
i = random.randint(0, len(training) - 1)  # random row index (randint includes both endpoints)
plt.imshow(training[i, 1:].reshape((28, 28)))  # reshape and plot with the default colormap
plt.show()
plt.imshow(training[i, 1:].reshape((28, 28)), cmap='gray')  # reshape and plot in grayscale
plt.show()
Next, scale the independent variables, namely the pixels, between 0 and 1.
X_train = training[:, 1:] / 255
y_train = training[:, 0]
X_test = testing[:, 1:] / 255
y_test = testing[:, 0]
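To make the slicing and scaling concrete, here is a minimal sketch on a made-up array of two "images" with only four pixels each. The values and the names `toy`, `X_toy` and `y_toy` are invented for illustration; the real rows hold 784 pixels.

```python
import numpy as np

# Hypothetical toy data: column 0 is the label, the remaining columns are pixels.
toy = np.array([[9, 0, 51, 102, 255],
                [2, 255, 204, 153, 0]], dtype='float32')

X_toy = toy[:, 1:] / 255   # every column except the first, scaled to [0, 1]
y_toy = toy[:, 0]          # the first column holds the class labels

print(X_toy)
print(y_toy)
```

After scaling, the darkest pixel maps to 0 and the brightest to 1, while the labels are left untouched.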
Then, split the training data into training and validation sets, with validation taking 20%. The validation set is used to evaluate how well the model generalizes to data it was not trained on.
from sklearn.model_selection import train_test_split
X_train, X_validate, y_train, y_validate = train_test_split(X_train, y_train, test_size=0.2, random_state=12345)
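Conceptually, a hold-out split shuffles the rows and cuts off a slice for validation. The sketch below illustrates that idea in plain Python. `holdout_split` is a hypothetical helper, not scikit-learn's implementation, so the specific rows it selects will differ from those chosen by train_test_split.

```python
import random

def holdout_split(n_samples, validation_fraction, seed):
    """Shuffle the row indices and cut off a validation slice (illustrative only)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_validate = round(n_samples * validation_fraction)
    return indices[n_validate:], indices[:n_validate]

train_idx, validate_idx = holdout_split(60000, 0.2, seed=12345)
print(len(train_idx), len(validate_idx))  # 48000 12000
```

With 60,000 rows and a 20% validation share, 48,000 images remain for training and 12,000 are held out.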
Finally, we need to reshape X_train, X_validate and X_test. This is a critical point. With its default channels-last data format, Keras expects the input to a 2D convolution layer to have the shape (batch size, image height, image width, number of colour channels). Therefore,
X_train = X_train.reshape((-1, 28, 28, 1))
X_test = X_test.reshape(X_test.shape[0], *(28, 28, 1))
X_validate = X_validate.reshape(X_validate.shape[0], *(28, 28, 1))
Note that two equivalent methods are used to reshape the data above. The first passes -1 as the first dimension so that NumPy infers it from the array size. The second sets the first dimension explicitly to the number of rows (for example X_test.shape[0]) and unpacks the tuple (28, 28, 1) with * to supply the remaining dimensions.
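A small check that the two reshape styles give the same result, using a made-up array of 5 flattened images as a stand-in for the real data (the names `flat`, `a` and `b` are only for illustration):

```python
import numpy as np

# Hypothetical stand-in for the flattened pixel data: 5 images, 784 pixels each.
flat = np.arange(5 * 784, dtype='float32').reshape(5, 784)

a = flat.reshape((-1, 28, 28, 1))              # -1: NumPy infers the batch size
b = flat.reshape(flat.shape[0], *(28, 28, 1))  # batch size given explicitly

print(a.shape, b.shape)
print(np.array_equal(a, b))
```

Both arrays end up with the shape (5, 28, 28, 1) and identical contents.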
Great, now the data is ready to train the model.
In general, building a CNN requires 4 steps: convolution, max pooling, flattening and full connection. Here we will build a CNN model with 2 convolution layers.
Fundamentally, a CNN is based on convolution. In simple terms, a convolution slides a kernel matrix across an image and applies it as a filter to obtain a certain effect, such as blurring or sharpening. In a CNN, kernels are used for feature extraction: they pick out the most informative patterns in an image while preserving the spatial relationship between pixels.
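To see the sliding-kernel idea in numbers, here is a minimal plain-Python sketch, assuming stride 1 and no padding. The function `convolve2d_valid`, the toy image and the edge kernel are all made up for illustration; in a real CNN the kernel values are learned. Like deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def convolve2d_valid(image, kernel):
    """Slide a square kernel over the image (no padding, stride 1) and sum the products."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Made-up 5x5 image: dark on the left, bright on the right.
image = [[0, 0, 9, 9, 9] for _ in range(5)]

# A simple vertical-edge kernel, chosen by hand for this example.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = convolve2d_valid(image, kernel)
print(feature_map)  # [[27, 27, 0], [27, 27, 0], [27, 27, 0]]
```

The feature map responds strongly where the kernel straddles the dark-to-bright edge and is zero over the uniform bright region, while the 5×5 input shrinks to 3×3.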
If you want a detailed explanation of the concept, please check the previous article here. Feel free to explore this fantastic website to visualize how convolution works. Another great website, by Ryerson University, shows visually and interactively how a CNN works.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
classifier = Sequential()
classifier.add(Conv2D(64, (3, 3), input_shape=(28, 28, 1), activation='relu'))
Note that the number of feature detectors is set to 64, and each feature detector is a 3×3 array. input_shape is the shape of the input images to which the feature detectors are applied through convolution. We set it to (28, 28, 1): 1 is the number of channels for a grayscale image, and 28×28 is the image dimension in each channel. This needs to be the same as the per-image shape of X_train, X_test and X_validate.
The final argument is the activation function. We use ReLU to set negative values in the feature maps to zero, since depending on the kernel weights the convolution may produce negative values. Zeroing them out adds the non-linearity needed for a non-linear classification problem.
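As a tiny illustration of what ReLU does to a feature map, the sketch below applies it to made-up values. The `relu` helper is hypothetical and written in plain Python, not the Keras implementation.

```python
def relu(feature_map):
    """Replace every negative value with 0 and keep the rest unchanged."""
    return [[max(0, value) for value in row] for row in feature_map]

# Made-up feature map containing negative responses.
feature_map = [[-27, 0, 27],
               [-3, 5, -1]]

activated = relu(feature_map)
print(activated)  # [[0, 0, 27], [0, 5, 0]]
```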
Max pooling reduces the size of a feature map produced by convolution by sliding a window over it and keeping the maximum value in each window. Ultimately, it aims to reduce the number of nodes in the fully connected layers while retaining the key features and spatial structure of the images.
Specifically, we use the MaxPooling2D layer to add pooling. In general, a 2×2 window is used.
classifier.add(MaxPooling2D(pool_size = (2, 2)))
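The effect of 2×2 max pooling can be sketched by hand on a made-up 4×4 feature map. `max_pool_2x2` is a hypothetical helper that assumes non-overlapping windows (stride 2) and drops any incomplete window at the border.

```python
def max_pool_2x2(feature_map):
    """Take the maximum of each non-overlapping 2x2 block (stride 2)."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            block = [feature_map[r][c], feature_map[r][c + 1],
                     feature_map[r + 1][c], feature_map[r + 1][c + 1]]
            row.append(max(block))
        pooled.append(row)
    return pooled

# Made-up 4x4 feature map.
feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]]

pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```

The 4×4 map shrinks to 2×2, keeping only the strongest response in each block.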
Dropout is a common remedy for over-fitting. How does dropout work? During each training iteration, some neurons are randomly disabled to prevent them from depending on each other too much. Because a different set of neurons is dropped each time, the network effectively trains with a different architecture at every iteration, which helps it learn independent correlations in the data and prevents the neurons from over-learning. Specifically,
classifier.add(Dropout(0.25))
Note that each neuron is dropped with probability 0.25, so about 25% of the neurons are disabled at each training iteration.
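A rough sketch of the mechanism, assuming the "inverted dropout" scheme in which the surviving values are scaled up by 1 / (1 - rate) during training so the expected total stays the same. The `dropout` function and the activations are made up for illustration and are not the Keras implementation.

```python
import random

def dropout(activations, rate, rng):
    """Zero each value with probability `rate` and rescale the survivors."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else value * keep_scale
            for value in activations]

rng = random.Random(0)      # seeded so the sketch is repeatable
activations = [1.0] * 8     # made-up activations
dropped = dropout(activations, rate=0.25, rng=rng)
print(dropped)
```

Each output is either 0 (the neuron was dropped) or the original value scaled by 1 / 0.75. At prediction time dropout is switched off and all neurons are used.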
3.4 Convolution & Max Pooling
Based on previous experiments, we add a second convolution and max pooling layer to improve model performance.
classifier.add(Conv2D(32, (3, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))
Flattening takes all the reduced feature maps after pooling and arranges them into a single vector, which serves as the input to the fully connected layers. Specifically,
classifier.add(Flatten())
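It can help to trace how the image size changes up to this point. The arithmetic below is a sketch that assumes 3×3 kernels with stride 1 and no padding, 2×2 non-overlapping pooling, and 32 filters in the second convolution layer. Dropout does not change the shape. The helper names are made up for illustration.

```python
def conv_output_size(size, kernel_size):
    """Output width/height of a stride-1 convolution without padding."""
    return size - kernel_size + 1

def pool_output_size(size, pool_size):
    """Output width/height of non-overlapping pooling (partial windows are dropped)."""
    return size // pool_size

size = 28
size = conv_output_size(size, 3)   # first convolution: 26
size = pool_output_size(size, 2)   # first pooling: 13
size = conv_output_size(size, 3)   # second convolution: 11
size = pool_output_size(size, 2)   # second pooling: 5

n_filters = 32                     # filters in the second convolution layer
flattened_length = size * size * n_filters
print(size, flattened_length)      # 5 800
```

Under these assumptions, each image reaches the fully connected layers as a vector of 800 values.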
With the above, each input image has been converted into a one-dimensional vector. Now let's build a classifier using this vector as the input. Specifically,
classifier.add(Dense(units=32, activation='relu'))
classifier.add(Dense(units=10, activation='softmax'))
Note that for the first hidden layer, units, the number of nodes in the layer, is set to 32 with ReLU as the activation function. Please feel free to try more nodes. The output layer has 10 nodes, one per class, and uses softmax so that the outputs form a probability distribution over the mutually exclusive classes.
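Each of the 10 output nodes produces a score for one class. A softmax, a common choice of output activation when classes are mutually exclusive, turns such scores into probabilities that sum to 1. The sketch below uses made-up scores and a hand-written `softmax` helper for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for the 10 fashion classes (index = class label).
scores = [0.1, 2.0, -1.0, 0.5, 0.0, -0.5, 1.2, 0.3, -2.0, 0.8]
probabilities = softmax(scores)
predicted_label = probabilities.index(max(probabilities))
print(predicted_label)  # 1, i.e. Trouser
```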
With that done, congratulations on finishing the model building. Figure 2 shows what we built.
With all layers added, let's configure the CNN for training. An important decision is the loss function. As a rule of thumb, when each sample belongs to exactly one class, use categorical_crossentropy if the labels are one-hot encoded and sparse_categorical_crossentropy if they are integer class indices. (If one sample can carry several labels at once, binary_crossentropy with sigmoid outputs is the usual choice.) Our labels are integers from 0 to 9, so we use sparse_categorical_crossentropy.
from keras.optimizers import Adam
classifier.compile(loss='sparse_categorical_crossentropy',
                   optimizer=Adam(learning_rate=0.001),
                   metrics=['accuracy'])
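As a rough illustration of what the sparse loss computes for one sample, the sketch below uses made-up probabilities and two hypothetical helper functions (plain Python, not Keras APIs) named after the losses. It also shows that, for a sample with a single true class, the integer-label form and the one-hot form give the same value.

```python
import math

def sparse_categorical_crossentropy(label, probabilities):
    """Loss for one sample when the label is an integer class index."""
    return -math.log(probabilities[label])

def categorical_crossentropy(one_hot, probabilities):
    """Loss for one sample when the label is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probabilities))

# Made-up predicted probabilities over three classes; the true class is 0.
probabilities = [0.7, 0.2, 0.1]
loss_sparse = sparse_categorical_crossentropy(0, probabilities)
loss_one_hot = categorical_crossentropy([1, 0, 0], probabilities)
print(loss_sparse, loss_one_hot)
```

The loss is small when the model assigns a high probability to the true class and grows as that probability approaches 0.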
Now the model is ready to be trained. We train the model for 50 epochs, that is, 50 full passes over the training data. The model updates its weights after every batch of 512 samples. (X_validate, y_validate) is used to evaluate the model loss and accuracy at the end of each epoch.
epochs = 50
history = classifier.fit(X_train, y_train,
                         batch_size=512,
                         epochs=epochs,
                         verbose=1,
                         validation_data=(X_validate, y_validate))
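A quick back-of-the-envelope check of how many weight updates this configuration implies, assuming the 80/20 split described earlier leaves 48,000 training images:

```python
import math

n_train = 48000      # 80% of the 60,000 training images
batch_size = 512
epochs = 50

updates_per_epoch = math.ceil(n_train / batch_size)  # the last batch is smaller
total_updates = updates_per_epoch * epochs
print(updates_per_epoch, total_updates)  # 94 4700
```

A larger batch size means fewer but smoother updates per epoch, while a smaller one means more frequent and noisier updates.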
In the end, we obtained a training accuracy of 92% and a validation accuracy of 90%. Quite good results!
Now, let's evaluate the model on the test set. Specifically,
evaluation = classifier.evaluate(X_test, y_test)
We obtained a test accuracy of 90%! Figure 3 below shows a sample of images with their predicted and true classes.
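For reference, accuracy here is the fraction of images whose most probable predicted class matches the true label. A minimal sketch with made-up predictions over three classes (the `accuracy` helper is hypothetical, not the Keras metric itself):

```python
def accuracy(probabilities, labels):
    """Fraction of samples whose highest-probability class matches the true label."""
    correct = 0
    for row, label in zip(probabilities, labels):
        predicted = row.index(max(row))
        if predicted == label:
            correct += 1
    return correct / len(labels)

# Made-up predicted probabilities for four images over three classes.
probabilities = [[0.8, 0.1, 0.1],
                 [0.2, 0.5, 0.3],
                 [0.3, 0.3, 0.4],
                 [0.6, 0.3, 0.1]]
labels = [0, 1, 2, 1]

score = accuracy(probabilities, labels)
print(score)  # 0.75
```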
Finally, if you want to tune the model with much more data, feel free to explore this link. If you want to check out more advanced data science innovation in the retail industry, check this page.
Great! Huge congratulations on reaching the end. Hopefully, this gives you a sense of how to create a CNN for fashion image classification. If you need the source code, feel free to visit my GitHub page. Many thanks for your time!