Stain estimation on microscopy whole slide images

栏目: IT技术 · 发布时间: 4年前

内容简介：This post is part of a series. If you don’t know what a microscopy stain is, how much and why it varies so strongly, or how to estimate it on small image patches, you can find all of that in the first part:This time, we want to get a little bit more hands-

Getting a stable stain estimate on gigapixel images.

Stain estimation on microscopy whole slide images

Dr. Marc Aubreville

Jun 18 ·6min read

This post is part of a series. If you don’t know what a microscopy stain is, how much and why it varies so strongly, or how to estimate it on small image patches, you can find all of that in the first part: “Microscopy stain variations and how to estimate them” .

This time, we want to get a little bit more hands-on and solve a problem that you might have if you deal with whole slide microscopy images: Those images are really big. And by that, I mean that they can easily exceed 100,000 pixels in width and height. Let us now think of an application, where we want to normalize the complete image consistently, for instance as a preprocessing to a deep-learning-based recognition pipeline.

For this, it is of great use to have a robust estimate of the overall stain matrix. Then we can, later on, apply the same kind of color deconvolution for every image patch and get the same results. This is also really cool if we go for the extraction of small patches later on since these can sometimes fail to meet the precondition to have a sufficient amount of both dyes (hematoxylin and eosin in our example) present, which results in bad statistical properties.

I promised you it would be more hands-on, this time. So I prepared a complete code example as a Jupyter notebook, that you can find here . Let’s go through it together now.

The code consists of four parts:

Downloading the whole slide image
Finding a segmentation mask for valid regions to sample from
Random sampling of image patches from those regions
Stain estimation on the concatenated image patches

For the example, I picked a whole slide image which is easily available as open data from our public mast cell tumor data set [1]. It can be downloaded with a single HTTP get using urllib:

urllib.request.urlretrieve('https://ndownloader.figshare.com/files/16261493?private_link=a82ddb634864c24f4aee', 
                               'f3741e764d39ccc4d114.svs')

Let’s go ahead now:

To load the slide, we use the openslide library.

slide = openslide.open_slide('f3741e764d39ccc4d114.svs')
overview = slide.read_region(location=[0,0], size=slide.level_dimensions[-1], level=3)
overview.show()

As you can see, the image is showing a tumor cross-section in the center. It contains also a lot of almost white background, where no tissue was present. The “white” color values will differ from slide scanner to slide scanner, so it is somewhat sensible to use adaptive thresholding to generate a segmentation mask of tissue presence.

For this, we will be using OpenCV, since it conveniently provides all operations we need:

# Convert to grayscale
gray = cv2.cvtColor(np.array(overview)[:,:,0:3],cv2.COLOR_BGR2GRAY)

# OTSU thresholding
ret, thresh = cv2.threshold(gray,0,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)

# dilate
dil = cv2.dilate(thresh, kernel = np.ones((7,7),np.uint8))

# erode
activeMap = cv2.erode(dil, kernel = np.ones((7,7),np.uint8))

# and let's visualize it as well:
from PIL import Image
Image.fromarray(activeMap)

Now we already can tick off task 1 and 2 of the list. Now it’s time to randomly sample from all regions that are at least to 95% covered by tissue. Experience shows that 30 images will give us a good estimate.

Let’s briefly discuss the trade-off between using very big patches and using lots of them: Both will make the estimation process slow if we overdo it. In order to have good statistical stability for our estimate, we should, however, sample quite some images. If we use big image patches instead, we will lose some of the generalization that we want to achieve — the statistics should cover the image in its entirety, after all.

k=0
patches = []
ds=32 # downsample - we are using level 3 of the WSI which has a downsample of 32
while (k<30):
    x_ds = np.random.randint(0, slide.level_dimensions[0][0]/ds)
    y_ds = np.random.randint(0, slide.level_dimensions[0][1]/ds)
    step_ds = int(1024/ds)

    canbeused = np.sum((activeMap[y_ds:y_ds+step_ds,x_ds:x_ds+step_ds])>1)>0.95*step_ds*step_ds
    if (canbeused):
        k=k+1
        x=int(x_ds*ds)
        y=int(y_ds*ds)
        img = slide.read_region(location=(x,y), size=(256,256), level=1)
        patch = np.uint8(np.array(img)[:,:,0:3])

        patches.append(patch[None,:,:,:])

You see that in the variable canbeused we calculated for each randomly selected patch of the downsampled map if it is representing an image region covered with tissue by at least 95%. If this is true, we sample the image patch from that region.

Now that we have a good choice of representative patches, we can stack them all to have a big chunk of image pixels, all representing the stain of the complete whole slide image.

We can thus now transform the data into optical density (OD) representation, and cut off all very bright pixels (OD below or equal 0.15):

def RGB2OD(image:np.ndarray) -> np.ndarray:
    mask = (image == 0)
    image[mask] = 1
    return np.maximum(-1 * np.log(image / 255), 1e-5)

OD = RGB2OD(np.stack(patches).reshape(-1,3))
OD = (OD[(OD > 0.15).any(axis=1), :])

Next, we calculate the eigen vectors and values of the matrix and project the OD pixel values onto the plane spanned by the first two eigenvectors:

_, eigenVectors = np.linalg.eigh(np.cov(OD, rowvar=False))
eigenVectors = eigenVectors[:, [2, 1]] # strip off residual stain component

if eigenVectors[0, 0] < 0: eigenVectors[:, 0] *= -1
if eigenVectors[0, 1] < 0: eigenVectors[:, 1] *= -1T_hat = np.dot(OD, eigenVector)

We can’t know about the order of the vectors, but we can assume that they need to be positive, so we multiply by -1 in the opposite case.

The two stain vectors are represented by the maximum and minimum angles of this point cloud. We thus calculate the angle, and then perform a robust estimate using percentiles:

phi = np.arctan2(T_hat[:, 1], T_hat[:, 0])
min_Phi = np.percentile(phi, 1)
max_Phi = np.percentile(phi, 99)

Finally, we have to transform the angles back to the 3D coordinates in OD space. To get more predictable results, we want to have the larger component first in the vector (which is the hematoxylin stain).

v1 = np.dot(eigenVector, np.array([np.cos(min_Phi), np.sin(min_Phi)]))
v2 = np.dot(eigenVector, np.array([np.cos(max_Phi), np.sin(max_Phi)]))
if v1[0] > v2[0]:
    stainVectors = np.array([v1, v2])
else:
    stainVectors = np.array([v2, v1])

Et voila. There we have our robust estimate of stain. Let’s visualize this finally:

import random
# take random sample for plotting
randsamp = np.array(random.sample(OD.tolist(),1000))

# plot points
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 unused import
fig = plt.figure(figsize=(10,10))
ax = fig.gca(projection='3d')

# color needs to be given as RGBA values in range 0..1 for matplotlib
color=np.ones((1000,4),np.float32)
color[:,0:3] = OD2RGB(randsamp)/255.

ax.scatter(randsamp[:,0],randsamp[:,1],randsamp[:,2], c=color)

# and plot stain vectors (with correct color)
ax.plot([0, stainVectors[0,0]],[0, stainVectors[0,1]],[0, stainVectors[0,2]], linewidth=4, color=(OD2RGB(stainVectors[0,:])/255.).tolist()+[1.])
ax.plot([0, stainVectors[1,0]],[0, stainVectors[1,1]],[0, stainVectors[1,2]], linewidth=4, color=(OD2RGB(stainVectors[1,:])/255.).tolist()+[1.])

ax.set_xlabel('red (OD)')
ax.set_ylabel('green (OD)')
ax.set_zlabel('blue (OD)')
plt.title('stain vector estimation')

That’s it. Just a straight-forward extension of the algorithm from Macenko et al. [2] that we discussed in the last post to cater for the big size of the images.

Now that we know how to estimate stains on whole slide images, we could do an analysis of how much variation is there in the public data sets. But that is part of the next post, so stay tuned.

[1] Bertram, C. A., et al. (2019). A large-scale dataset for mitotic figure assessment on whole slide images of canine cutaneous mast cell tumor. Scientific Data, 6 (274), 1–9.

[2]: Macenko, Marc, et al. A method for normalizing histology slides for quantitative analysis ” (2009) IEEE International Symposium on Biomedical Imaging: From Nano to Macro.

以上就是本文的全部内容，希望对大家的学习有所帮助，也希望大家多多支持码农网

查看所有标签

猜你喜欢:

Stain estimation on microscopy whole slide images

本站部分资源来源于网络，本站转载出于传递更多信息之目的，版权归原作者或者来源机构所有，如转载稿涉及版权问题，请联系我们。

码农书籍

灵活Web设计

Zoe Mickley Gillenwater / 李静 / 2009-9 / 45.00元

《灵活Web设计》讲述如何应用可变或不固定布局及弹性布局来实现灵活设计，以满足用户的根据自己需求而调整浏览站点的窗口大小的要求。全书共分为9章，内容包括：理解灵活布局、可变布局和弹性布局存在的挑战、设计灵活布局的方法、准备网页设计、创建可变布局结构、创建弹性布局结构、规范灵活性、设置文字间距、添加背景图像和颜色、创建灵活的图像。《灵活Web设计》适用于网页设计人员、网页设计爱好者。一起来看看《灵活Web设计》这本书的介绍吧!

码农工具

Stain estimation on microscopy whole slide images

Getting a stable stain estimate on gigapixel images.

灵活Web设计

MD5 加密

XML、JSON 在线转换

HSV CMYK 转换工具