Use fastai and image_tabular to integrate image and tabular data for deep learning and train a joint model using the integrated data
I recently participated in the SIIM-ISIC Melanoma Classification competition on Kaggle. In this competition, participants are asked to identify melanoma in images of skin lesions. Interestingly, they also provide metadata about the patient and the anatomic site in addition to the image. In essence, we have both image and structured (tabular) data for each example. For the images, we can use a CNN-based model, and for the tabular data, we can use embeddings and fully connected layers, as explored in my previous posts on UFC and League of Legends predictions. It is easy to build two separate models, one for each data modality. But what if we want to build a joint model that trains on both data modalities simultaneously? There are inspiring discussions in the competition forum, including this thread. In this post, I will demonstrate how to integrate the two data modalities and train a joint deep learning model using fastai and the image_tabular library, which I created specifically for these tasks.
The SIIM-ISIC Dataset
The SIIM-ISIC Melanoma Classification dataset can be downloaded here. The training set consists of 32,542 benign images and 584 malignant melanoma images, so the dataset is extremely imbalanced. The picture below shows one example from each class. It seems that malignant lesions are larger and more diffuse than benign ones.
As mentioned above, there are metadata available in addition to the images as shown below:
We can perform some basic analysis to investigate whether some of these features are associated with the target. Interestingly, males are more likely to have malignant melanoma than females, and age also appears to be a risk factor for malignant melanoma, as shown below. In addition, the frequency of malignant melanoma differs across anatomic sites, with the head/neck showing the highest malignancy rate. Therefore, these features contain useful information, and combining them with the images could help our model make better predictions. This makes sense, as doctors will probably not only examine images of skin lesions but also consider additional factors in order to make a diagnosis.
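As an illustration, these rates can be computed directly from the metadata with pandas. The sketch below assumes the competition CSV columns (sex, age_approx, anatom_site_general_challenge, target) and a local copy of train.csv; it is not the exact analysis code from the notebook.

```python
import pandas as pd

# Hypothetical local path to the competition metadata
train_df = pd.read_csv("data/train.csv")

# Malignancy rate by sex (target is 1 for malignant, 0 for benign)
print(train_df.groupby("sex")["target"].mean())

# Malignancy rate by anatomic site, highest first
print(train_df.groupby("anatom_site_general_challenge")["target"]
      .mean().sort_values(ascending=False))

# Age distribution of benign vs. malignant cases
print(train_df.groupby("target")["age_approx"].describe())
```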
The Approach
Our approach to integrating both image and tabular data is very similar to the one taken by the winners of the ISIC 2019 Skin Lesion Classification Challenge as described in their paper and shown in the picture below. Basically, we first load the image and tabular data for each sample, which are fed into a CNN model and a fully connected neural network, respectively. Subsequently, the outputs from the two networks will be concatenated and fed into an additional fully connected neural network to generate final predictions.
Implementation with the image_tabular library
To implement the idea, we will be using PyTorch and fastai. More specifically, we will use fastai to load the image and tabular data and package them into fastai LabelLists.
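Below is a minimal sketch of this step using the fastai v1 data block API. The DataFrames, data path, validation indices, and column lists (train_df, test_df, data_path, val_idx, cat_names, cont_names) are illustrative assumptions rather than the notebook's exact code.

```python
from fastai.vision import ImageList
from fastai.tabular import TabularList, FillMissing, Categorify, Normalize

procs = [FillMissing, Categorify, Normalize]

# Image LabelLists: file names and labels come from the metadata DataFrames (assumed columns)
image_data = (ImageList.from_df(train_df, path=data_path, cols="image_name",
                                folder="train", suffix=".jpg")
              .split_by_idx(val_idx)
              .label_from_df(cols="target")
              .add_test(ImageList.from_df(test_df, path=data_path, cols="image_name",
                                          folder="test", suffix=".jpg")))

# Tabular LabelLists built on the same rows with the same validation split
tab_data = (TabularList.from_df(train_df, path=data_path,
                                cat_names=cat_names, cont_names=cont_names, procs=procs)
            .split_by_idx(val_idx)
            .label_from_df(cols="target")
            .add_test(TabularList.from_df(test_df, path=data_path,
                                          cat_names=cat_names, cont_names=cont_names)))
```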
Next, we will integrate the two data modalities using the image_tabular library, which can be installed by running:
pip install image_tabular
We will use the get_imagetabdatasets function from image_tabular to integrate the image and tabular LabelLists.
The databunch contains both image and tabular data and is ready to be used for training and prediction.
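Continuing the sketch above, the integration and databunch creation could look roughly like this; get_imagetabdatasets and the DataBunch wrapper follow the image_tabular README, and the batch size is an assumed hyperparameter.

```python
from fastai.basic_data import DataBunch
from image_tabular.core import get_imagetabdatasets

# Combine the image and tabular LabelLists into integrated train/valid/test datasets
integrate_train, integrate_valid, integrate_test = get_imagetabdatasets(image_data, tab_data)

# Package the integrated datasets into a single fastai DataBunch
db = DataBunch.create(integrate_train, integrate_valid, integrate_test,
                      path=data_path, bs=64)
```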
Once the data is ready, we can then move on to build the model. First, we need to create a CNN model, resnet50 in this case, and a tabular model using fastai. We will treat sex and anatomic site as categorical features and represent them using embeddings in the tabular model.
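Here is a sketch of the two components, loosely following the image_tabular README; the output sizes of the image and tabular branches (cnn_out_sz, tab_out_sz) and the tabular layer sizes are illustrative assumptions.

```python
from fastai.vision import models, cnn_learner
from fastai.tabular import TabularModel

# CNN component: a resnet50 whose fastai head outputs cnn_out_sz features
# instead of class scores, so it can later be concatenated with the tabular output
cnn_out_sz = 256
img_db = image_data.databunch()
img_db.c = cnn_out_sz  # override the number of outputs used by cnn_learner
cnn_model = cnn_learner(img_db, models.resnet50, ps=0.2).model

# Tabular component: embeddings for the categorical features (sex, anatomic site)
# plus fully connected layers, ending in tab_out_sz features
emb_szs = tab_data.train.get_emb_szs()
tab_out_sz = 8
tabular_model = TabularModel(emb_szs, len(cont_names), out_sz=tab_out_sz,
                             layers=[8], ps=0.2)
```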
We are now ready to build a joint model, again using the image_tabular library. We can customize the fully connected layers by specifying the layers parameter.
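The joint model can then be assembled with image_tabular's CNNTabularModel, which concatenates the two outputs and passes them through the additional fully connected layers specified by the layers parameter. The layer sizes shown are illustrative.

```python
import torch
from image_tabular.model import CNNTabularModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Concatenate cnn_out_sz image features and tab_out_sz tabular features,
# then map them through fully connected layers to the two target classes
integrate_model = CNNTabularModel(cnn_model, tabular_model,
                                  layers=[cnn_out_sz + tab_out_sz, 32],
                                  ps=0.2, out_sz=2).to(device)
```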
Finally, we can pack everything into a fastai learner and train the joint model.
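Continuing the sketch, the learner and training step might look like this; the loss function, metrics, learning rate, and number of epochs are illustrative choices rather than the notebook's exact settings.

```python
import torch.nn as nn
from fastai.basic_train import Learner
from fastai.train import fit_one_cycle  # importing fastai.train attaches fit_one_cycle to Learner
from fastai.metrics import accuracy, AUROC

# Plain cross entropy here; weighting the classes is one option for the heavy class imbalance
loss_func = nn.CrossEntropyLoss()

learn = Learner(db, integrate_model, loss_func=loss_func, metrics=[accuracy, AUROC()])

# Train with the one-cycle policy
learn.fit_one_cycle(15, 1e-4)
```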
The entire workflow is detailed in this Jupyter notebook.
Results
The model achieved a ROC AUC score of about 0.87 on the validation set after training for 15 epochs. I subsequently submitted the predictions made by the trained model on the test set to Kaggle and got a public score of 0.864. There is definitely much room for improvement.
Summary
In this post, we used fastai and image_tabular to integrate image and tabular data and built a joint model trained on both data modalities simultaneously. As noted above, there are many opportunities for further improvement. For example, we could try more advanced CNN architectures such as ResNeXt. Another question is how many neurons to allocate to the image and tabular branches before concatenation; in other words, how should we decide the relative importance or weights of the two data modalities? I hope this can serve as a framework for further experimentation and improvement.
Source Code
The source code of image_tabular and the Jupyter notebooks for the SIIM-ISIC Melanoma Classification competition can be found here.
Acknowledgments
The image_tabular library relies on the fantastic fastai library and was inspired by the code of John F. Wu.