Variants of the Convolutional Neural Network (CNN) continue to be hailed as powerful Machine Learning (ML) algorithms for image-related problems. CNNs have achieved unprecedented accuracy in a variety of fields, and object-based satellite image classification is one application that has proliferated in recent times. Since the high-resolution satellite imagery required for object-based classification is not available for free, researchers often rely on freely available mid-resolution data (e.g. Landsat, where each pixel represents a 30m*30m land parcel). The mechanism of a CNN is to consider the neighbouring pixels and rely on their pattern and texture, not just one pixel at a time.
In the land cover classification of mid-resolution satellite data (e.g. Landsat), the objective is to classify each pixel based on its digital number (DN) values across different bands. From the perspective of a CNN model, the obvious question is: "will the DN values of immediate neighbours, or of pixels located a few cells away, play any role in determining the class of a pixel?" The majority may answer 'NO'. Does that mean a CNN, despite being a powerful tool for image conundrums, would fail on such an image? It is too quick to conclude that based solely on assumption. In this post, we will investigate the usability of the CNN model on mid-resolution data, where object identification is neither possible nor the goal.
This post is highly recommended and a good fit for people in the geospatial field who want to kickstart their "CNN for remote sensing" journey.
Knowledge is not a destination, it is the journey. So don’t scroll down just to search for a script. :eyes:
Data used
Bands 2 to 6 of Landsat 5 multispectral data and the corresponding binary built-up layer for the year 2011 across Bangalore are used here. Landsat 8 or Sentinel-2A would be the obvious choices for many of us because they are more recent and will be continued in the future. But the reason for not selecting them is their higher radiometric and spectral resolution, which would produce better results than Landsat 5. If we manage to get good results on Landsat 5 data (8-bit pixel depth), we can scale up to Landsat 8 or Sentinel-2A (both 16-bit data) with minor modifications, but the reverse may not turn out very well.

For those who want a quick ML capsule (supervised) before getting started: supervised learning establishes the relationship between a few characteristics (features or Xs) of an entity and another of its properties (value, label or Y). We provide plenty of examples (labelled data) to the model so that it learns from them and then predicts labels for new (unlabelled) data. Here, the multispectral data will, by convention, be referred to as features, and the classified built-up data as labels.
Pre-requisites
We will use Python 3.7.4 and the following libraries for modelling:
- pyrsgis 0.3.1 — to read and write GeoTIFF
- Scikit-learn 0.22.1 — for data pre-processing and accuracy checks
- Numpy 1.17.2 — for basic array operations
- Tensorflow 2.0.0 — to build and deploy the CNN model
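If you want to make sure your environment matches, a quick sanity check like the one below can help; the version comments simply reflect the versions listed above.

```python
# a quick sanity check that the required libraries are installed
import numpy as np          # 1.17.2
import sklearn              # 0.22.1
import tensorflow as tf     # 2.0.0
import pyrsgis              # 0.3.1

print(np.__version__, sklearn.__version__, tf.__version__)
```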
Understanding the data
The distribution of data plays an important role in selecting a model for a specific purpose. The graph below shows the frequency of DN values across all bands. The histogram shows an uneven distribution, and models like neural networks are sensitive to this type of distribution because they naturally tend to give more importance to features with higher values. For example, Band 6 seems to have a relatively large number of pixels with high DN values (a high mean). Dealing with this is only worthwhile if the model is performing poorly or if we want to give a last push to the model's accuracy; for now, we bypass any alteration to the data to focus on the CNN part.
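If you want to reproduce such a plot yourself, a minimal sketch is below. It assumes matplotlib is installed (it is not in the pre-requisites list) and uses an illustrative file name for the Bangalore image; the bands are labelled by their array index.

```python
import matplotlib.pyplot as plt
from pyrsgis import raster

# read the multispectral image (the file name here is illustrative)
ds, mxBands = raster.read('l5_Bangalore2011_raw.tif', bands='all')

# plot the frequency of DN values, one histogram per band
for n in range(mxBands.shape[0]):
    plt.hist(mxBands[n].flatten(), bins=50, histtype='step',
             label='Band index %d' % (n + 1))
plt.xlabel('DN value')
plt.ylabel('Frequency')
plt.legend()
plt.show()
```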
Part A: Reading and storing image chips as Numpy arrays
Generating training samples for training the model
The Produce Training Data for Deep Learning QGIS plugin will help us produce training samples and visualise them before heading for model training. Usually, this is done in the back-end, but visualising the data and its structure is always useful, especially for beginners. Below are the parameters that I used in this iteration (we can always come back and tweak these if required).

This step can take a couple of minutes depending on the parameters you have passed and your computational power. I used a 7x7 window with a 7x7 stride (window slide) to generate the training samples, which resulted in 84,972 image chips, good enough for training an ML model, but we will reduce this number in the next steps. Feel free to produce more or fewer images by decreasing or increasing the stride. Read more about the plugin usage here.
TIME TO CODE NOW!
The code snippet below performs the following in sequence:
- change the working directory to the location of the generated image chips
- read image chips in a loop
- stack chips in a four-dimensional NumPy array
- display some basic info about the data.
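A minimal sketch of that loop is below. It assumes the QGIS plugin writes each chip as a GeoTIFF with its class label (0 or 1) encoded at the end of the file name; the directory path is illustrative.

```python
import os
import numpy as np
from pyrsgis import raster

# change the working directory to the location of the generated image chips
# (the path below is illustrative)
chipsDirectory = r'E:\CNN_Builtup\imageChips'
os.chdir(chipsDirectory)

features, labels = [], []
for file in os.listdir(chipsDirectory):
    if file.endswith('.tif'):
        # pyrsgis returns the data source and a (bands, rows, cols) array
        ds, chip = raster.read(file, bands='all')
        features.append(chip)
        # assuming the class label sits at the end of the file name
        labels.append(int(file.split('_')[-1].split('.')[0]))

# stack the chips in a four-dimensional array: (records, bands, rows, cols)
features = np.array(features)
labels = np.array(labels)

print('Input features shape:', features.shape)
print('Input labels shape:', labels.shape)
print('Values in input features, min: %d & max: %d' % (features.min(), features.max()))
```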
Input features shape: (84972, 6, 7, 7)
Input labels shape: (84972,)
Values in input features, min: 1 & max: 255
The shape of the features shows that the data is stacked as a four-dimensional array, where the first index is the position of a record, the second is the number of bands of the image, and the third and fourth are the rows and columns of the image chips respectively.
Ideally, the number of bands should be at the last position (channels-last), as shown in the image below; this is the layout TensorFlow expects for image chips. We will get down to fixing this slightly later.
Saving NumPy arrays as files for quick access
Looping through the files can be time-consuming depending upon your machine's power. To avoid doing this every time you run the script, a better way is to store the NumPy arrays on disk (.npy format). The following lines will do the job:
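The lines below are a minimal sketch; the .npy file names are illustrative.

```python
import numpy as np

# save the stacked arrays to disk for quick access in later runs
np.save('CNN_7by7_features.npy', features)
np.save('CNN_7by7_labels.npy', labels)
```

Part B. Data preprocessing and training the model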
To run the script from the second time onwards, you can start by reading the .npy files. Now, a machine learning model ideally expects a comparable number of examples of each class (here, two classes). If the number of training samples in each class differs dramatically, training the model is difficult (but not impossible). Let's look at our data.
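A sketch of reading the arrays back and counting the records per class; it assumes built pixels are labelled 1 and unbuilt pixels 0.

```python
import numpy as np

# read the saved arrays back from disk
features = np.load('CNN_7by7_features.npy')
labels = np.load('CNN_7by7_labels.npy')

# count the records per class, assuming built = 1 and unbuilt = 0
print('Number of records in each class: Built: %d, Unbuilt: %d' %
      ((labels == 1).sum(), (labels == 0).sum()))
```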
Number of records in each class: Built: 10082, Unbuilt: 74890
Ok! That doesn't look appealing. Let us say we calculate the accuracy of the model at a later stage simply as the proportion of correct built or unbuilt predictions. If the model gets poorly trained and predicts everything as unbuilt, it can still boast a misleading accuracy of 88% (100 * correctly classified unbuilt / total records), because the training samples of the unbuilt class are in the extreme majority. Fixing this is mandatory!
The code snippet below will reduce the number of training samples in the unbuilt class to match that of the built class by randomly picking samples.
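A minimal sketch of the balancing step; the random seed is illustrative.

```python
import numpy as np

# separate the two classes, assuming built = 1 and unbuilt = 0
builtFeatures = features[labels == 1]
unbuiltFeatures = features[labels == 0]

# randomly pick as many unbuilt samples as there are built samples
np.random.seed(2)  # illustrative seed, for reproducibility
randomIndex = np.random.choice(unbuiltFeatures.shape[0], builtFeatures.shape[0],
                               replace=False)
unbuiltFeatures = unbuiltFeatures[randomIndex]

print('Number of records in balanced classes: Built: %d, Unbuilt: %d' %
      (builtFeatures.shape[0], unbuiltFeatures.shape[0]))
```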
Number of records in balanced classes: Built: 10082, Unbuilt: 10082
The total number of training samples has reduced significantly, but this is still better than having extremely unbalanced classes.
Data normalisation
Scaling the data is important to make sure that all the features are treated equally, since neural networks are sensitive to the distribution of data, as seen in the first plot. The data can either be brought into the 0 to 1 range (normalised) or the -1 to 1 range (standardised). We will normalise the data by assuming that the minimum and maximum values across all six bands are 0 and 255. The code snippet below will merge the separated features from the previous step and normalise them.
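A minimal sketch, assuming 8-bit data (minimum 0, maximum 255):

```python
import numpy as np

# merge the balanced classes back into single feature and label arrays
features = np.concatenate((builtFeatures, unbuiltFeatures), axis=0)
labels = np.concatenate((np.ones(builtFeatures.shape[0], dtype=int),
                         np.zeros(unbuiltFeatures.shape[0], dtype=int)))

# normalise to the 0-1 range, assuming 8-bit data
features = features / 255.0

print('New values in input features, min: %d & max: %d' %
      (features.min(), features.max()))
```

New values in input features, min: 0 & max: 1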
It is always good practice to calculate the minimum and maximum values from the data itself. But for satellite data classification, what if we want the model to predict the built-up area of some other region where the minimum and maximum values of the features differ significantly from the ones we are training on? That is an entirely different debate which we won't get into; we proceed with the conventional minimum and maximum values for 8-bit data.
Test/train split
To be able to evaluate the performance of the model at a later stage, the data is, by convention, split into two parts: training and testing. We will define a function for this; the train-test proportion here is 60-40.
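A thin wrapper around scikit-learn's splitter would do; the random seed is illustrative.

```python
from sklearn.model_selection import train_test_split

def trainTestSplit(features, labels, trainProp=0.6):
    # split the data into training and testing parts, 60-40 by default
    return train_test_split(features, labels, train_size=trainProp, random_state=42)

xTrain, xTest, yTrain, yTest = trainTestSplit(features, labels)
```

Creating tensors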
As we saw in the beginning, the number of bands in our features data is at the second index. To build a model using TensorFlow, we need to convert the data to the channels-last format. TensorFlow has a transpose function for this, as shown below:
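A minimal sketch using the variables from the split above:

```python
import tensorflow as tf

# move the bands from the second index to the last (channels-last),
# the layout TensorFlow's convolution layers expect
xTrain = tf.transpose(xTrain, [0, 2, 3, 1])
xTest = tf.transpose(xTest, [0, 2, 3, 1])

print('Reshaped split features:', xTrain.shape, xTest.shape)
print('Split labels:', yTrain.shape, yTest.shape)
```

Reshaped split features: (12098, 7, 7, 6) (8066, 7, 7, 6)
Split labels: (12098,) (8066,)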
The keys to a winning model are feature engineering and the right model architecture. Feature engineering is done to extract the most meaningful information from, or to enhance the content of, the data, but that is a fairly large topic for a separate post in the future. In a previous post, a simple neural network model performed satisfactorily for built-up extraction without any feature engineering (it might have managed to establish a relationship between the bands and the built-up class). Therefore, engineering the input features is not always mandatory; it is subject to the complexity of the problem and of the model. Here we get away with 'feature engineering' by simply scaling the data (as done in a previous step).
The architecture of a model can take weeks in the making; hence, "your first model will never make it to predictions". I started with only the input and output layers to check the base accuracy, and kept adding layers and changing the number of convolution filters in each layer to reach somewhere near the desired results. Here is what I landed on:
The code snippet below will build the model and train it.
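What follows is a sketch consistent with the description here: convolution layers with no pooling, flattened into a two-class softmax output. The filter counts, kernel size, epochs and batch size are illustrative, not necessarily the exact final architecture.

```python
from tensorflow import keras

# illustrative architecture: convolution layers without pooling,
# ending in a two-class softmax (filter counts are illustrative)
model = keras.Sequential([
    keras.layers.Conv2D(32, kernel_size=1, padding='valid',
                        activation='relu', input_shape=(7, 7, 6)),
    keras.layers.Conv2D(48, kernel_size=1, padding='valid', activation='relu'),
    keras.layers.Flatten(),
    keras.layers.Dense(2, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(xTrain, yTrain, epochs=2, batch_size=32)
```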
Unlike the majority of deep learning models, we are not using a pooling layer, which focuses on finding objects in the image. Feel free to read the details of each of the model's parameters on the official website; they are quite well documented.
It is very common for neural networks to memorise the training data and fail to establish a relationship, which results in poor prediction on new data; this is popularly known as over-fitting. Therefore, after training, we cross-check the accuracy of the model using the test data.
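A sketch of that check using scikit-learn's metrics, thresholding the built-up probability at 0.5:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# predict the built-up class probability for the test set and threshold at 0.5
yTestPredicted = model.predict(xTest)[:, 1]
yTestPredicted = (yTestPredicted > 0.5).astype(int)

cMatrix = confusion_matrix(yTest, yTestPredicted)
pScore = precision_score(yTest, yTestPredicted)
rScore = recall_score(yTest, yTestPredicted)
fScore = f1_score(yTest, yTestPredicted)

print('Confusion matrix:', cMatrix)
print('P-Score: %.3f, R-Score: %.3f, F-Score: %.3f' % (pScore, rScore, fScore))
```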
Confusion matrix: [[3781, 269], [252, 3764]]
P-Score: 0.933, R-Score: 0.937, F-Score: 0.935
The accuracy that we achieved seems impressive; for a different image, you would have to spend some time reaching the right model architecture. To do that conveniently, you may want to add a few lines to save the model to your disk (HDF5 format) using the code snippet below.
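A one-liner does the job; the file name is illustrative.

```python
# save the trained model in HDF5 format (the file name is illustrative)
model.save('trained_CNN_Builtup_7by7.h5')
```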
Part C. Loading the saved model to predict new results
Now, let us see how the model behaves on new data. The code snippet below will:
- load the saved model,
- generate image chips in the memory, and
- repeat all the pre-processing steps for prediction.
The CNNdataGenerator function defined below can be used to bypass the QGIS plugin step we adopted at the beginning for image chip generation.
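Below is a simplified sketch of such a generator and the prediction steps. It pads the image edges and slices one chip per pixel; the padding mode and file names are illustrative and may differ from the original script.

```python
import numpy as np
from tensorflow import keras
from pyrsgis import raster

def CNNdataGenerator(mxBands, kSize):
    # normalise the bands (assuming 8-bit data), pad the edges, and slice
    # one kSize x kSize chip centred on every pixel of the image
    mxBands = mxBands / 255.0
    nBands, rows, cols = mxBands.shape
    margin = kSize // 2
    padded = np.pad(mxBands, ((0, 0), (margin, margin), (margin, margin)),
                    mode='reflect')  # illustrative choice of padding

    features = np.empty((rows * cols, kSize, kSize, nBands))
    n = 0
    for row in range(rows):
        for col in range(cols):
            chip = padded[:, row:row + kSize, col:col + kSize]
            features[n] = np.transpose(chip, (1, 2, 0))  # channels-last
            n += 1
    return features

# load the saved model and the new multispectral image (paths are illustrative)
model = keras.models.load_model('trained_CNN_Builtup_7by7.h5')
ds, mxBands = raster.read('l5_Hyderabad2011_raw.tif', bands='all')

# generate chips in memory, predict, and export the built-up probability map
newFeatures = CNNdataGenerator(mxBands, kSize=7)
predicted = model.predict(newFeatures)[:, 1]
predicted = predicted.reshape((ds.RasterYSize, ds.RasterXSize))
raster.export(predicted, ds, 'Hyderabad_builtup_CNN_predicted.tif', dtype='float')
```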
The image below shows the predicted built-up (red in colour) for a new area — Hyderabad, India.
Overall, the classification looks fine, except for the mixed built-up pixels, which come with low confidence values. Checking the results in QGIS revealed that the model was less confident on the mixed built-up pixels, probably because it did not see enough such examples during training. You can attempt to solve this by reducing the stride, which may blow up the number of training features proportionally. The other option is to use a slightly lower threshold to extract all the built-up pixels, and it should work just fine.
The extremely high accuracy of the model we achieved was also a consequence of the small number of mixed built-up pixels in our testing set. The model performed well in predicting the test classes because they were easily differentiable in the multidimensional space. Still, the model learnt to distinguish between the classes for pure pixels and seems to be working well on new data.
The argument that we started with was the size of the kernel for mid-resolution data; to investigate that, I used two more kernel sizes (3 and 11).
An important point worth mentioning is that deep learning architectures are sensitive; that is, we cannot expect the exact same model architecture to produce similar accuracy for different kernel sizes. Therefore, a few minor modifications to the architecture are to be expected. Due to the simplicity of the current problem, the same model architecture produced convincing results for different kernel sizes.
Now that we have the best results (for a basic CNN at least) from the different kernel sizes and the output of the ANN model from the previous post, let us predict new data using all the models and visualise how the outputs look.
To me it seems (and hopefully you can also notice at this scale) that the larger the kernel, the smoother the predicted image. The output from the larger kernels looks like the classified image of 100m or 200m resolution data, which defies the resolution of the original image.
Among the CNN models, the 3 by 3 kernel would be the best pick. In general, it looks like the ANN model retains the maximum resolution, but the precision and recall for the ANN were lower (0.815 & 0.838) than for the present CNN models, probably due to imbalanced training classes and the larger data size. Retaining the original resolution (only visually) does not simply imply that ANN outperforms CNN for mid-resolution data.
The CNN model that we constructed is a very basic version, and we only tweaked the number of layers and the number of convolution filters in each layer. However, there are many other, much more crucial parameters (stride, activation functions, pooling, dropouts, epochs, batch size, learning rate, etc.) that can be played with to address the aforementioned issues; I have ignored them here to avoid prolonging the post.
Stay tuned for similar posts and learn about some of the interesting geospatial stuff that I surround myself with. Twitter, LinkedIn
The full script, data, and the trained models can be found on this GitHub repository. Hail open knowledge! :mortar_board::japanese_goblin: