Severstal is among the top 50 producers of steel in the world and Russia’s biggest player in efficient steel mining and production. One of the key products of Severstal is steel sheets. The production process of flat sheet steel is delicate. From heating and rolling, to drying and cutting, several machines touch flat steel by the time it’s ready to ship. To ensure quality in the production of steel sheets, today, Severstal uses images from high-frequency cameras to power a defect detection algorithm.
Through this competition, Severstal expects the AI community to improve the algorithm by localizing and classifying surface defects on a steel sheet.
Business objectives and constraints
- A defective sheet must be predicted as defective, since misclassifying a defective sheet as non-defective would raise serious quality concerns. In other words, a high recall value is needed for each of the classes.
- We do not need to return results for a given image in the blink of an eye (no strict latency constraints).
2. Machine Learning Problem
2.1. Mapping the business problem to an ML problem
Our task is to
- Detect/localize the defects in a steel sheet using image segmentation and
- Classify the detected defects into one or more classes from [1, 2, 3, 4]
To put it together, it is a semantic image segmentation problem.
2.2. Performance metric
The evaluation metric used is the mean Dice coefficient. The Dice coefficient can be used to compare the pixel-wise agreement between a predicted segmentation and its corresponding ground truth. The formula is given by:

Dice(X, Y) = 2 * |X ∩ Y| / (|X| + |Y|)

where X is the predicted set of pixels and Y is the ground truth.
Read more about the Dice coefficient here.
2.3. Data Overview
We have been given a zip folder of size 2GB which contains the following:
- train_images — a folder containing 12,568 training images (.jpg files)
- test_images — a folder containing 5,506 test images (.jpg files). We need to detect and localize defects in these images.
- train.csv — training annotations which provide segments for defects belonging to ClassId = [1, 2, 3, 4]
- sample_submission.csv — a sample submission file in the correct format, with each ImageId repeated 4 times, one for each of the 4 defect classes.
More details about data have been discussed in the next section.
3. Exploratory Data Analysis
The first step in solving any machine learning problem should be a thorough study of the raw data. This gives a fair idea about what our approaches to solving the problem should be. Very often, it also helps us find some latent aspects of the data which might be useful to our models.
Let’s analyse the data and try to draw some meaningful conclusions.
3.1. Loading train.csv file
train.csv tells which type of defect is present at what pixel location in an image. It contains the following columns:
- ImageId: image file name with the .jpg extension
- ClassId: type/class of the defect, one of [1, 2, 3, 4]
- EncodedPixels: the range of defective pixels in an image in the form of run-length encoded pixels (pixel number where the defect starts <space> pixel length of the defect).
e.g. ‘29102 12’ implies the defect starts at pixel 29102 and runs for a total of 12 pixels, i.e. pixels 29102, 29103, …, 29113 are defective. The pixels are numbered from top to bottom, then left to right: 1 corresponds to pixel (1,1), 2 corresponds to (2,1), and so on.
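As a minimal sketch of the loading step (assuming train.csv sits in the current working directory):

```python
import pandas as pd

# Each row pairs an ImageId and a ClassId with the RLE-encoded defective pixels
train_df = pd.read_csv('train.csv')
print(train_df.columns.tolist())  # expected: ['ImageId', 'ClassId', 'EncodedPixels']
```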
train_df.ImageId.describe()

count 7095
unique 6666
top ef24da2ba.jpg
freq 3
Name: ImageId, dtype: object
- There are 7095 data points corresponding to 6666 steel sheet images containing defects.
3.2. Analysing train_images & test_images folders
Number of train and test images
Let’s get some idea about the proportion of train and test images and check how many train images contain defects.

Number of train images : 12568
Number of test images : 5506
Number of non-defective images in the train_images folder: 5902
- There are more images in the train_images folder than unique image Ids in train.csv. This means that not all the images in the train_images folder have at least one of the defects 1, 2, 3, 4.
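These counts can be reproduced with a short sketch (assuming train_df has been loaded as in the previous snippet and the image folders sit in the working directory):

```python
import glob
import os

train_paths = glob.glob('train_images/*.jpg')
test_paths = glob.glob('test_images/*.jpg')
defective_ids = set(train_df['ImageId'])  # images listed in train.csv

print('Number of train images :', len(train_paths))
print('Number of test images :', len(test_paths))
print('Number of non-defective images in the train_images folder:',
      sum(os.path.basename(p) not in defective_ids for p in train_paths))
```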
Sizes of train and test images
Let’s check if all images in train and test are of the same size. If not, we must resize them to a common size.

{(256, 1600, 3)}
{(256, 1600, 3)}
- All images in train and test folders have the same size (256 x 1600 x 3)
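A possible way to collect the set of distinct image shapes (a sketch, reusing train_paths and test_paths from the previous snippet):

```python
from PIL import Image

def image_shapes(paths):
    # Collect the set of distinct (height, width, channels) shapes
    shapes = set()
    for path in paths:
        with Image.open(path) as img:
            width, height = img.size
            shapes.add((height, width, len(img.getbands())))
    return shapes

print(image_shapes(train_paths))
print(image_shapes(test_paths))
```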
3.3. Analysis of labels: ClassId
Let’s see how train data is distributed among various classes.
Number of images in class 1 : 897 (13.456 %)
Number of images in class 2 : 247 (3.705 %)
Number of images in class 3 : 5150 (77.258 %)
Number of images in class 4 : 801 (12.016 %)
- The dataset looks imbalanced.
- The number of images with class 3 defect is very high compared to that of other classes. 77% of the defective images have class 3 defects.
- Class 2 is the least occurring class, only 3.7 % of images in train.csv belong to class 2.
Note that the sum of the percentage values in the above analysis is more than 100, which means some images have defects belonging to more than one class.
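The per-class counts above can be reproduced with a sketch like this (assuming train_df from section 3.1):

```python
# Fraction of the 6666 defective images that carry each defect class
n_defective = train_df['ImageId'].nunique()
class_counts = train_df['ClassId'].value_counts().sort_index()
for class_id, count in class_counts.items():
    print(f'Number of images in class {class_id} : {count} ({100 * count / n_defective:.3f} %)')
```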
Number of labels tagged per image
Number of images having 1 class label(s): 6239 (93.594%)
Number of images having 2 class label(s): 425 (6.376%)
Number of images having 3 class label(s): 2 (0.03%)
- The majority of the images (93.6%) have only one class of defects.
- Only 2 images (0.03%) have a combination of 3 classes of defects.
- The rest of the images (6.37%) have a combination of 2 classes of defects.
- No image has all 4 classes of defects.
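A short sketch of how the label counts per image can be computed (again assuming train_df from section 3.1):

```python
# Count how many distinct defect classes are tagged on each defective image
labels_per_image = train_df.groupby('ImageId')['ClassId'].nunique()
distribution = labels_per_image.value_counts().sort_index()
for n_labels, count in distribution.items():
    print(f'Number of images having {n_labels} class label(s): {count} '
          f'({100 * count / len(labels_per_image):.3f}%)')
```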
4. Data Preparation
Before we move ahead to training deep learning models, we need to convert the raw data into a form that can be fed to the models. Also, we need to build a data pipeline, which would perform the required pre-processing and generate batches of input and output images for training.
As the first step, we create a pandas dataframe containing the filenames of train images under the column ImageId, and EncodedPixels under one or more of the columns Defect_1, Defect_2, Defect_3, Defect_4, depending on the ClassId of the image in train.csv. The images that do not have any defects have all four of these columns blank. A sample of this dataframe, and a sketch of how it can be built, are shown below.
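A minimal sketch of building such a dataframe (assuming train.csv and the train_images folder are in the working directory; the Defect_1 to Defect_4 column names follow the description above):

```python
import glob
import os

import pandas as pd

# Long format: one row per (ImageId, ClassId) pair that has a defect
train_df = pd.read_csv('train.csv')

# Wide format: one row per image, EncodedPixels spread across Defect_1..Defect_4
wide_df = train_df.pivot(index='ImageId', columns='ClassId', values='EncodedPixels')
wide_df.columns = [f'Defect_{c}' for c in wide_df.columns]

# Bring in the defect-free images too, leaving their defect columns blank
all_ids = sorted(os.path.basename(p) for p in glob.glob('train_images/*.jpg'))
data_df = wide_df.reindex(all_ids).fillna('').rename_axis('ImageId').reset_index()

print(data_df.shape)   # expected (12568, 5): one row per image, ImageId + 4 defect columns
print(data_df.head())  # a sample of the dataframe
```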
4.1. Train, CV split 85:15
I train my models on 85% of the train images and validate on the remaining 15%.
(10682, 5) (1886, 5)
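The shapes above can come from a split like the following (a sketch, assuming the image-level dataframe from the previous step is called data_df; the random seed is an assumption):

```python
from sklearn.model_selection import train_test_split

# 85:15 split of the image-level dataframe
train_split, val_split = train_test_split(data_df, test_size=0.15, random_state=42)
print(train_split.shape, val_split.shape)   # (10682, 5) (1886, 5)
```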
4.2. Utility Functions for converting RLE encoded pixels to masks and vice-versa
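A minimal sketch of such utility functions, assuming the competition's column-major (top-to-bottom, then left-to-right) RLE convention and 256 x 1600 masks:

```python
import numpy as np

def rle_to_mask(rle, height=256, width=1600):
    """Decode a run-length-encoded string into a binary mask."""
    mask = np.zeros(height * width, dtype=np.uint8)
    if isinstance(rle, str) and rle.strip():
        values = list(map(int, rle.split()))
        starts, lengths = values[0::2], values[1::2]
        for start, length in zip(starts, lengths):
            mask[start - 1:start - 1 + length] = 1       # RLE starts are 1-indexed
    return mask.reshape((height, width), order='F')      # column-major ordering

def mask_to_rle(mask):
    """Encode a binary mask back into the competition's RLE string format."""
    pixels = mask.flatten(order='F')
    pixels = np.concatenate([[0], pixels, [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[0::2]
    return ' '.join(str(x) for x in runs)
```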
Let’s visualize some images from each class along with their masks. The pixels belonging to the defective area in the steel sheet image are indicated by yellow color in the mask image.
Our deep learning model would take a steel sheet image as input (X) and return four masks (Y), one per class, as output. This implies that, for training our model, we would need to feed batches of train images and their corresponding masks to the model.
We generate masks for all the images in the train_images folder and store them into a folder called train_masks.
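A sketch of this step (assuming data_df and rle_to_mask from the earlier sketches, and that OpenCV is available):

```python
import os

import cv2
import numpy as np

os.makedirs('train_masks', exist_ok=True)

for _, row in data_df.iterrows():
    # Stack one binary mask per defect class into a (256, 1600, 4) array
    mask = np.stack([rle_to_mask(row[f'Defect_{c}']) for c in range(1, 5)], axis=-1)
    out_name = row['ImageId'].replace('.jpg', '.png')
    cv2.imwrite(os.path.join('train_masks', out_name), mask)  # saved as a 4-channel PNG
```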
4.3. Data generator using tensorflow.data
The data pipeline applies pre-processing and augmentation to the input images and generates batches of (image, mask) pairs for training.
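A minimal sketch of such a pipeline built with tensorflow.data (the folder layout, batch size, and horizontal-flip augmentation are assumptions; the original pipeline's exact pre-processing may differ):

```python
import tensorflow as tf

BATCH_SIZE = 8

def load_pair(image_path, mask_path):
    # Steel sheet image: JPEG, 3 channels, scaled to [0, 1]
    image = tf.io.decode_jpeg(tf.io.read_file(image_path), channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    # Pre-computed 4-channel mask stored as a PNG (one channel per defect class)
    mask = tf.cast(tf.io.decode_png(tf.io.read_file(mask_path), channels=4), tf.float32)
    return image, mask

def augment(image, mask):
    # Apply the same random horizontal flip to both image and mask
    flip = tf.random.uniform([]) > 0.5
    image = tf.cond(flip, lambda: tf.image.flip_left_right(image), lambda: image)
    mask = tf.cond(flip, lambda: tf.image.flip_left_right(mask), lambda: mask)
    return image, mask

def make_dataset(image_paths, mask_paths, training=True):
    ds = tf.data.Dataset.from_tensor_slices((list(image_paths), list(mask_paths)))
    ds = ds.map(load_pair, num_parallel_calls=tf.data.AUTOTUNE)
    if training:
        ds = ds.shuffle(256).map(augment, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
```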
4.4. Defining metric and loss function
I have used a hybrid loss function which is a combination of binary cross-entropy (BCE) and Dice loss. BCE treats the prediction at each pixel as a binary classification against the ground-truth mask (1 where the pixel is defective, 0 where it is not). Dice loss is given by (1 - Dice coefficient).
BCE dice loss = BCE + dice loss
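A sketch of how this metric and loss can be implemented in Keras (the smoothing constant is an assumption to avoid division by zero):

```python
import tensorflow as tf
from tensorflow.keras import backend as K

def dice_coef(y_true, y_pred, smooth=1.0):
    # Pixel-wise overlap between the ground-truth and predicted masks
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_loss(y_true, y_pred):
    return 1.0 - dice_coef(y_true, y_pred)

def bce_dice_loss(y_true, y_pred):
    # Hybrid loss: mean per-pixel binary cross-entropy plus Dice loss
    bce = tf.keras.losses.binary_crossentropy(y_true, y_pred)
    return K.mean(bce) + dice_loss(y_true, y_pred)
```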
5. Models
There are several models/architectures used for semantic image segmentation. I have tried two of them in this case study: i) U-Net and ii) Google’s DeepLabV3+.
5.1. First cut Solution: U-Net for Semantic Image Segmentation
This model is based on the research paper U-Net: Convolutional Networks for Biomedical Image Segmentation, published in 2015 by Olaf Ronneberger, Philipp Fischer, and Thomas Brox of the University of Freiburg, Germany. In this paper, the authors build upon an elegant architecture called the “Fully Convolutional Network”. They used it for segmentation of neuronal structures in electron microscopic stacks and a few other biomedical image segmentation datasets.
5.1.1. Architecture
The architecture of the network is shown in the image below. It consists of a contracting path (left side) and an expansive path (right side). The expansive path is symmetric to the contracting path, giving the network a shape resembling the English letter ‘U’, which is why it is called U-Net.
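As an illustration only, here is a heavily simplified Keras sketch of a U-Net-style encoder-decoder for this input size (the depth, filter counts, and other details differ from both the original paper and the model actually trained in this case study):

```python
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3x3 convolutions per stage, as in the original U-Net
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return x

def build_unet(input_shape=(256, 1600, 3), num_classes=4):
    inputs = layers.Input(shape=input_shape)

    # Contracting path: convolution blocks followed by max pooling
    c1 = conv_block(inputs, 16)
    p1 = layers.MaxPooling2D()(c1)
    c2 = conv_block(p1, 32)
    p2 = layers.MaxPooling2D()(c2)
    c3 = conv_block(p2, 64)
    p3 = layers.MaxPooling2D()(c3)

    # Bottleneck at the bottom of the 'U'
    b = conv_block(p3, 128)

    # Expansive path: upsampling plus skip connections from the contracting path
    u3 = layers.Conv2DTranspose(64, 2, strides=2, padding='same')(b)
    c4 = conv_block(layers.Concatenate()([u3, c3]), 64)
    u2 = layers.Conv2DTranspose(32, 2, strides=2, padding='same')(c4)
    c5 = conv_block(layers.Concatenate()([u2, c2]), 32)
    u1 = layers.Conv2DTranspose(16, 2, strides=2, padding='same')(c5)
    c6 = conv_block(layers.Concatenate()([u1, c1]), 16)

    # One sigmoid output channel per defect class
    outputs = layers.Conv2D(num_classes, 1, activation='sigmoid')(c6)
    return Model(inputs, outputs)

model = build_unet()
model.summary()
```

Such a network can then be compiled with the BCE-Dice loss and Dice coefficient sketched in section 4.4.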