Metastatic Cancer Detection in Histopathological Image Scans
Using machine learning to predict whether cancer is present in microscopic images.
Feb 29
The Terrible Tale of Cancer
Cancer continues to be one of the world’s deadliest diseases, killing over 10 million people a year. One reason it is so deadly is that, if left unattended even for a short time, it can spread through a biological system. Early diagnosis is key to fighting cancer, and machine learning is becoming a powerful tool for it.
Machine learning can find patterns that humans may not notice, including very abstract patterns in images. Using machine learning, a model can be trained to distinguish cancerous cells from non-cancerous cells in image scans. Convolutional neural networks are especially well suited to this problem, because they are designed to find relationships in spatially structured data such as image scans.
Metastatic Cancer
Cancer usually starts in one primary site in the body and can then spread aggressively. Cancer that has spread from that primary site to other parts of the body is called metastatic cancer; it is essentially a secondary cancer caused by the primary one. So why look for the secondary cancer and not the root cause? Because metastatic cancer is often easier to detect when scanning the whole body: when someone has cancer, small amounts of metastatic cancer may be present throughout their body. Finding metastatic cancer therefore indicates that someone has a serious cancer somewhere else in their body.
Metastatic cancer can be found in many different organs. To look for it, we can examine histopathological image scans, which are microscopic images of tissues and cells from the body. These scans show the cells of the examined tissue in great detail and can be used to diagnose a range of diseases. However, without expert training in cancer research it is hard to tell whether cancer is present in a scan. Machine learning can help here by predicting whether metastatic cancer is present in a histopathological image scan.
Machine Learning + Convolutional Neural Networks
Machine learning is about getting machines to learn how to perform a specific task. Instead of being given explicit instructions, a machine learns its own instructions, which in practice are just numerical values. A machine learning model is a differentiable function that takes a fixed-size input, combines it with variables (weights), and produces an output. Think of y = mx + b: x is the input, m and b are the variables (weights), and y is the output. The learning algorithm has to adjust m and b so that, for any input, the model produces an output as close as possible to the desired one. How do we optimize these weights? There are essentially two steps: computing a loss and backpropagation.
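To make the y = mx + b analogy concrete, here is a minimal sketch of such a one-input "model" in Python. The function name `predict` and the weight values are illustrative assumptions, not part of the article's model.

```python
def predict(x, m, b):
    """A one-input model: x is the input, m and b are the weights."""
    return m * x + b


# With weights m = 2 and b = 1, an input of 3 produces 2 * 3 + 1 = 7.
print(predict(3.0, m=2.0, b=1.0))  # prints 7.0
```

Training never changes the shape of this function, only the values of `m` and `b`.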
The Loss Function
The loss function measures how badly the model performed: the higher the loss, the worse the model. We compute the loss using labelled data. We give the model an input x and let it produce an output based on its current weights. Because the data is labelled, we know what the output should be, so we can compare the model’s output to the real output. For example, if the model outputs 3 and the real output is 5, the loss could be 2, the distance between the two outputs. Ideally the loss would always be 0, so training aims to minimize the loss function.
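The example above (output 3, real output 5, loss 2) corresponds to an absolute-error loss. The sketch below assumes absolute error for a single example and shows mean squared error as a common alternative over a batch; neither is the loss the final cancer model uses, and the function names are illustrative.

```python
def absolute_error(predicted, actual):
    """How far apart the model's output and the labelled output are."""
    return abs(predicted - actual)


def mean_squared_error(predictions, targets):
    """Average of squared differences over a batch of labelled examples."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("need equal-length, non-empty lists")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


print(absolute_error(3.0, 5.0))                     # prints 2.0
print(mean_squared_error([3.0, 5.0], [5.0, 5.0]))   # prints 2.0
```

Squaring the difference makes large mistakes cost much more than small ones, which is one reason squared error is a popular default for numeric outputs.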
Backpropagation
Backpropagation is the process of going back through the model and working out how each weight should change to reduce the loss. It does this by calculating the gradient of the loss function with respect to the weights. The gradient gives the instantaneous rate of change of the loss as each weight changes, so by moving the weights a small step in the direction opposite to the gradient, we reach weights that produce a lower loss. This is called gradient descent, and it is central to many machine learning algorithms. Repeating this process makes the loss smaller and smaller and the model better and better. Models usually have thousands of weights or more, which lets them learn very complex relationships between inputs and outputs.
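Gradient descent can be sketched on the y = mx + b model from earlier. This minimal example assumes a mean squared error loss and toy data generated by y = 2x + 1; the learning rate and step count are illustrative choices, not values from the article. The partial derivatives are written out by hand, which is what backpropagation computes automatically in a real network.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = m * x + b by repeatedly stepping against the gradient of the MSE loss."""
    m, b = 0.0, 0.0  # start from arbitrary weights
    n = len(xs)
    for _ in range(steps):
        errors = [m * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error with respect to m and b.
        grad_m = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Move each weight a small step in the direction that lowers the loss.
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]  # labelled data generated by y = 2x + 1
m, b = gradient_descent(xs, ys)
print(round(m, 3), round(b, 3))  # prints 2.0 1.0
```

If the learning rate is too large, the steps overshoot and the loss grows instead of shrinking, which is why it is usually kept small.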
Convolutional Neural Networks
Convolutional neural networks use all the machine learning principles described above, but add a way to analyze spatial data such as images and audio. They use filters: small grids of weights that scan across the input data. At each position, the filter multiplies each of its weights by the input value underneath it and sums the results. Since images are also just grids of values, convolutional filters can easily scan over them. The results of these scans (feature maps) are then pooled, which represents the input data in a smaller, more concise form that keeps the important features. In essence, convolutional neural networks extract the spatial features of an image and combine them to make a prediction.
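The scanning step can be sketched in plain Python for a single-channel image, assuming stride 1 and no padding. The tiny image and the hand-picked edge-detecting filter are hypothetical; in a real CNN the filter weights are learned, and pooling is a separate step not shown here.

```python
def convolve2d(image, kernel):
    """Slide a kernel over a single-channel image (stride 1, no padding).

    At each position, multiply each kernel weight by the pixel under it and
    sum the products. Like most CNN libraries, this does not flip the kernel
    (strictly speaking it is a cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += kernel[di][dj] * image[i + di][j + dj]
            row.append(total)
        output.append(row)
    return output


# A 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A hand-picked 2x2 vertical-edge filter; in a CNN these weights are learned.
kernel = [
    [-1, 1],
    [-1, 1],
]
print(convolve2d(image, kernel))  # prints [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The filter responds strongly only where the edge is, which is the kind of spatial feature the network builds on.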
Metastatic Cancer Detection Model
The Data
The model was trained on the Histopathologic Cancer Detection labelled dataset on Kaggle. It contains over 150,000 histopathologic image scans, each labelled with either 0 or 1: 0 indicates that no metastatic cancer is present in the image, and 1 indicates that metastatic cancer is present. Each image is 96 by 96 pixels with 3 colour channels (red, green, blue), so each image can be represented as a 96x96x3 grid of values.
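A minimal sketch of what one labelled example looks like in memory, using nested Python lists instead of a real image file. The helper names, the blank image, and the label are hypothetical; real code would decode the dataset's image files into arrays.

```python
HEIGHT, WIDTH, CHANNELS = 96, 96, 3


def make_blank_image(fill=0):
    """Build a 96x96x3 grid where image[row][col] is an [R, G, B] triple."""
    return [[[fill] * CHANNELS for _ in range(WIDTH)] for _ in range(HEIGHT)]


def shape(image):
    """Return (height, width, channels) of a nested-list image."""
    return (len(image), len(image[0]), len(image[0][0]))


image = make_blank_image()
label = 1  # hypothetical label: 1 = metastatic cancer present, 0 = absent
image[0][0] = [255, 0, 0]  # set the top-left pixel to pure red

print(shape(image))               # prints (96, 96, 3)
print(HEIGHT * WIDTH * CHANNELS)  # prints 27648
```

Each image is therefore 27,648 numbers, paired with a single 0 or 1 label.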
The Model
The model’s design follows the machine learning principles above, tailored to the task at hand.
Convolutional layers extract features from the images, learning what distinguishes a cancerous image from a non-cancerous one. They are 2D convolutional layers: their filters slide across the height and width of the input while spanning all of its channels, so they take 3D arrays of values (the image representations) as input.
The max pooling layers take the feature maps produced by the convolutional layers and downsample them by keeping the maximum value in each 3x3 square. This makes the feature maps smaller while keeping the strongest responses, so the later parts of the network have an easier time deciding whether an image contains cancer.
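Max pooling over 3x3 squares can be sketched as follows. This assumes non-overlapping squares (a stride equal to the pool size), which is a common default; the article does not state the stride, and the 6x6 feature map is a made-up example.

```python
def max_pool(feature_map, size=3):
    """Non-overlapping max pooling: keep the largest value in each size x size square.

    Rows or columns that do not fill a complete square are dropped.
    """
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            block = [
                feature_map[i * size + di][j * size + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(block))
        pooled.append(row)
    return pooled


# A 6x6 feature map becomes 2x2: each output is the max of one 3x3 square.
feature_map = [[r * 6 + c for c in range(6)] for r in range(6)]
print(max_pool(feature_map))  # prints [[14, 17], [32, 35]]
```

Pooling has no weights of its own; it only shrinks the feature map, here by a factor of 9 in the number of values.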
The dropout layers scattered throughout the network help the model learn the task rather than just memorize the training data. During training, dropout randomly turns off a fraction of the neurons at each step, so the network cannot rely too heavily on any single one. This can slightly lower training accuracy, but it helps the model generalize to a wider range of images.
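Dropout can be sketched on a plain list of activations. This assumes "inverted" dropout, the variant used by common frameworks, where surviving values are scaled up during training so that nothing needs to change at inference time. The dropout rate and the seed are illustrative.

```python
import random


def dropout(activations, rate, training, rng=random):
    """Inverted dropout.

    During training, zero each activation with probability `rate` and scale
    the survivors by 1 / (1 - rate) so the expected total is unchanged.
    At inference time, activations pass through untouched.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep_scale = 1.0 / (1.0 - rate)
    return [a * keep_scale if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)
out = dropout([1.0] * 10, rate=0.5, training=True, rng=rng)
print(out)  # a random mix of 0.0 and 2.0 values
print(dropout([1.0, 2.0], rate=0.5, training=False))  # prints [1.0, 2.0]
```

Because a different random subset is dropped at every training step, the network is pushed to spread useful information across many neurons.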
Finally, the dense layers interpret the feature maps and decide whether cancer is present. The last layer contains a single output node, whose value is interpreted as 0 (no cancer) or 1 (cancer). The loss function uses this output to calculate the loss, and backpropagation then optimizes the weights throughout the network.
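A single output node usually produces a number between 0 and 1 rather than an exact 0 or 1. This sketch assumes a sigmoid activation on that node and a 0.5 decision threshold, both common choices that the article does not spell out; `classify` is a hypothetical helper name.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1), avoiding overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(score, threshold=0.5):
    """Turn the output node's raw score into a 0 (no cancer) / 1 (cancer) label."""
    probability = sigmoid(score)
    return (1 if probability >= threshold else 0), probability


print(classify(2.0))   # label 1, probability about 0.88
print(classify(-1.5))  # label 0, probability about 0.18
```

Keeping the probability alongside the label is useful, because the threshold can later be moved to trade missed cancers against false alarms.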
The model contains over 51 million parameters, which gives it a large capacity for learning the features that separate cancerous from non-cancerous images.
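Parameter counts like this come from simple per-layer arithmetic. The formulas below are the standard ones for 2D convolutional and dense layers with biases; the layer sizes in the example are hypothetical and are not the article's architecture.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters, bias=True):
    """Each filter has kernel_h * kernel_w * in_channels weights, plus one bias."""
    return (kernel_h * kernel_w * in_channels + (1 if bias else 0)) * filters


def dense_params(inputs, units, bias=True):
    """Every input connects to every unit, plus one bias per unit."""
    return (inputs + (1 if bias else 0)) * units


# Hypothetical layer sizes, chosen only to illustrate the arithmetic.
print(conv2d_params(3, 3, 3, 32))  # prints 896
print(dense_params(96 * 96, 256))  # prints 2359552
```

The comparison shows why dense layers applied to large flattened feature maps usually dominate a CNN's parameter count, while convolutional layers stay small because their filters are shared across every position.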
Loss Function + Optimization
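The loss function used for this model was categorical cross-entropy. It multiplies the logarithm of each predicted class probability by the true label (1 for the correct class, 0 otherwise), sums the results and negates the total, so the loss reduces to the negative logarithm of the probability the model assigned to the correct answer. Because of the logarithm, the loss is 0 when the model is fully confident in the right answer and grows very quickly as the model becomes confidently wrong, which strongly penalizes confident mistakes.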
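With only two classes (cancer and no cancer), categorical cross-entropy over one-hot labels gives the same value as the binary cross-entropy sketched below, which is the natural form for a single output node. This assumes the output is a probability for the "cancer" class; the clipping constant `eps` is an illustrative safeguard against log(0).

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Cross-entropy for one example with label y_true in {0, 1} and predicted
    probability p_pred that the image contains cancer."""
    p = min(max(p_pred, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


# Loss for a cancerous image (label 1) as the predicted probability drops.
for p in (0.9, 0.5, 0.1, 0.01):
    print(p, round(binary_cross_entropy(1, p), 3))
# losses of about 0.105, 0.693, 2.303 and 4.605
```

Going from a 10% to a 1% predicted probability for the true class roughly doubles the loss, which is how the logarithm punishes confident mistakes.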
The optimizer used for this model was Adam. Adam uses the gradients computed by backpropagation, but keeps running averages of each weight’s gradients and squared gradients and uses them to adapt the step size for each weight individually, which helps it make steady progress without overshooting. It is a widely used, standard choice of optimizer.
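The Adam update can be sketched for a single weight. This follows the standard Adam update rule with its usual beta and epsilon defaults, applied to a toy loss (w - 3)^2; the learning rate of 0.1 is chosen only so the toy example converges quickly (Keras uses 0.001 by default), and `adam_minimize` is a hypothetical helper name.

```python
import math


def adam_minimize(grad, w0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a one-weight function with Adam, given its gradient function."""
    w = w0
    m = 0.0  # running average of gradients (momentum)
    v = 0.0  # running average of squared gradients
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction: m and v start at 0, so early averages are too small.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Step size adapts to the gradient history: dividing by sqrt(v_hat)
        # normalizes the step, so it is close to lr when gradients are consistent.
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


# The loss (w - 3)^2 has its minimum at w = 3; its gradient is 2 * (w - 3).
w_final = adam_minimize(lambda w: 2 * (w - 3), w0=0.0)
print(round(w_final, 3))  # close to 3.0
```

In a real network the same update runs independently for every one of the millions of weights, each with its own `m` and `v`.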
Training
The model was trained for 10 epochs: it passed over the whole training set 10 times, updating its weights through backpropagation after each batch of 100 images. 20% of the images were held out for validation, to check that the model was actually learning rather than memorizing the training data. Training took a couple of hours.
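The bookkeeping behind epochs, batches and a validation split can be sketched without any model at all. This assumes a random 80/20 split and reshuffling of the training set each epoch (a common default); the dataset size of 1,050 is hypothetical, and the comment marks where a real framework would run the forward pass, loss and weight update.

```python
import random


def train_val_split(num_examples, val_fraction=0.2, seed=42):
    """Shuffle example indices and hold out a fraction for validation."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    num_val = int(num_examples * val_fraction)
    return indices[num_val:], indices[:num_val]


def batches(indices, batch_size):
    """Yield consecutive batches; the last one may be smaller."""
    for start in range(0, len(indices), batch_size):
        yield indices[start:start + batch_size]


EPOCHS, BATCH_SIZE = 10, 100
train_idx, val_idx = train_val_split(1_050)  # hypothetical dataset size
rng = random.Random(0)
updates = 0
for epoch in range(EPOCHS):
    rng.shuffle(train_idx)  # reshuffle the training set each epoch
    for batch in batches(train_idx, BATCH_SIZE):
        # A real training loop would run the model on `batch`, compute the loss,
        # backpropagate and update the weights here.
        updates += 1
print(len(train_idx), len(val_idx), updates)  # prints 840 210 90
```

The validation indices are never used for weight updates, so accuracy measured on them reflects performance on images the model has not trained on.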
The Results are in…
The model reached an accuracy of 80%, meaning that about 80% of its predictions were correct. Here are a few examples:
Overall, the model performs reasonably well, and this approach looks like a viable option for detecting metastatic cancer.
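Accuracy, the figure reported above, is simply the share of predictions that match the labels. A minimal sketch with hypothetical labels and predictions (1 = cancer, 0 = no cancer), chosen so that 8 of 10 are correct:

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty lists")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical labels and predictions (1 = cancer, 0 = no cancer).
labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predictions = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
print(accuracy(predictions, labels))  # prints 0.8
```

Accuracy treats a missed cancer and a false alarm as equally bad, so in a medical setting it is often reported alongside per-class error rates.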