内容简介:This post is part of a series. If you don’t know what a microscopy stain is, how much and why it varies so strongly, or how to estimate it on small image patches, you can find all of that in the first part:This time, we want to get a little bit more hands-
Getting a stable stain estimate on gigapixel images.
Jun 18 ·6min read
This post is part of a series. If you don’t know what a microscopy stain is, how much and why it varies so strongly, or how to estimate it on small image patches, you can find all of that in the first part: “Microscopy stain variations and how to estimate them” .
This time, we want to get a little bit more hands-on and solve a problem that you might have if you deal with whole slide microscopy images: Those images are really big. And by that, I mean that they can easily exceed 100,000 pixels in width and height. Let us now think of an application, where we want to normalize the complete image consistently, for instance as a preprocessing to a deep-learning-based recognition pipeline.
For this, it is of great use to have a robust estimate of the overall stain matrix. Then we can, later on, apply the same kind of color deconvolution for every image patch and get the same results. This is also really cool if we go for the extraction of small patches later on since these can sometimes fail to meet the precondition to have a sufficient amount of both dyes (hematoxylin and eosin in our example) present, which results in bad statistical properties.
I promised you it would be more hands-on, this time. So I prepared a complete code example as a Jupyter notebook, that you can find here . Let’s go through it together now.
The code consists of four parts:
- Downloading the whole slide image
- Finding a segmentation mask for valid regions to sample from
- Random sampling of image patches from those regions
- Stain estimation on the concatenated image patches
For the example, I picked a whole slide image which is easily available as open data from our public mast cell tumor data set [1]. It can be downloaded with a single HTTP get using urllib:
urllib.request.urlretrieve('https://ndownloader.figshare.com/files/16261493?private_link=a82ddb634864c24f4aee',
'f3741e764d39ccc4d114.svs')
Let’s go ahead now:
To load the slide, we use the openslide library.
slide = openslide.open_slide('f3741e764d39ccc4d114.svs') overview = slide.read_region(location=[0,0], size=slide.level_dimensions[-1], level=3) overview.show()
As you can see, the image is showing a tumor cross-section in the center. It contains also a lot of almost white background, where no tissue was present. The “white” color values will differ from slide scanner to slide scanner, so it is somewhat sensible to use adaptive thresholding to generate a segmentation mask of tissue presence.
For this, we will be using OpenCV, since it conveniently provides all operations we need:
# Convert to grayscale
gray = cv2.cvtColor(np.array(overview)[:,:,0:3],cv2.COLOR_BGR2GRAY)
# OTSU thresholding
ret, thresh = cv2.threshold(gray,0,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
# dilate
dil = cv2.dilate(thresh, kernel = np.ones((7,7),np.uint8))
# erode
activeMap = cv2.erode(dil, kernel = np.ones((7,7),np.uint8))
# and let's visualize it as well:
from PIL import Image
Image.fromarray(activeMap)
Now we already can tick off task 1 and 2 of the list. Now it’s time to randomly sample from all regions that are at least to 95% covered by tissue. Experience shows that 30 images will give us a good estimate.
Let’s briefly discuss the trade-off between using very big patches and using lots of them: Both will make the estimation process slow if we overdo it. In order to have good statistical stability for our estimate, we should, however, sample quite some images. If we use big image patches instead, we will lose some of the generalization that we want to achieve — the statistics should cover the image in its entirety, after all.
k=0
patches = []
ds=32 # downsample - we are using level 3 of the WSI which has a downsample of 32
while (k<30):
x_ds = np.random.randint(0, slide.level_dimensions[0][0]/ds)
y_ds = np.random.randint(0, slide.level_dimensions[0][1]/ds)
step_ds = int(1024/ds)
canbeused = np.sum((activeMap[y_ds:y_ds+step_ds,x_ds:x_ds+step_ds])>1)>0.95*step_ds*step_ds
if (canbeused):
k=k+1
x=int(x_ds*ds)
y=int(y_ds*ds)
img = slide.read_region(location=(x,y), size=(256,256), level=1)
patch = np.uint8(np.array(img)[:,:,0:3])
patches.append(patch[None,:,:,:])
You see that in the variable canbeused we calculated for each randomly selected patch of the downsampled map if it is representing an image region covered with tissue by at least 95%. If this is true, we sample the image patch from that region.
Now that we have a good choice of representative patches, we can stack them all to have a big chunk of image pixels, all representing the stain of the complete whole slide image.
We can thus now transform the data into optical density (OD) representation, and cut off all very bright pixels (OD below or equal 0.15):
def RGB2OD(image:np.ndarray) -> np.ndarray:
mask = (image == 0)
image[mask] = 1
return np.maximum(-1 * np.log(image / 255), 1e-5)
OD = RGB2OD(np.stack(patches).reshape(-1,3))
OD = (OD[(OD > 0.15).any(axis=1), :])
Next, we calculate the eigen vectors and values of the matrix and project the OD pixel values onto the plane spanned by the first two eigenvectors:
_, eigenVectors = np.linalg.eigh(np.cov(OD, rowvar=False))
eigenVectors = eigenVectors[:, [2, 1]] # strip off residual stain component
if eigenVectors[0, 0] < 0: eigenVectors[:, 0] *= -1
if eigenVectors[0, 1] < 0: eigenVectors[:, 1] *= -1T_hat = np.dot(OD, eigenVector)
We can’t know about the order of the vectors, but we can assume that they need to be positive, so we multiply by -1 in the opposite case.
The two stain vectors are represented by the maximum and minimum angles of this point cloud. We thus calculate the angle, and then perform a robust estimate using percentiles:
phi = np.arctan2(T_hat[:, 1], T_hat[:, 0]) min_Phi = np.percentile(phi, 1) max_Phi = np.percentile(phi, 99)
Finally, we have to transform the angles back to the 3D coordinates in OD space. To get more predictable results, we want to have the larger component first in the vector (which is the hematoxylin stain).
v1 = np.dot(eigenVector, np.array([np.cos(min_Phi), np.sin(min_Phi)]))
v2 = np.dot(eigenVector, np.array([np.cos(max_Phi), np.sin(max_Phi)]))
if v1[0] > v2[0]:
stainVectors = np.array([v1, v2])
else:
stainVectors = np.array([v2, v1])
Et voila. There we have our robust estimate of stain. Let’s visualize this finally:
import random
# take random sample for plotting
randsamp = np.array(random.sample(OD.tolist(),1000))
# plot points
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import
fig = plt.figure(figsize=(10,10))
ax = fig.gca(projection='3d')
# color needs to be given as RGBA values in range 0..1 for matplotlib
color=np.ones((1000,4),np.float32)
color[:,0:3] = OD2RGB(randsamp)/255.
ax.scatter(randsamp[:,0],randsamp[:,1],randsamp[:,2], c=color)
# and plot stain vectors (with correct color)
ax.plot([0, stainVectors[0,0]],[0, stainVectors[0,1]],[0, stainVectors[0,2]], linewidth=4, color=(OD2RGB(stainVectors[0,:])/255.).tolist()+[1.])
ax.plot([0, stainVectors[1,0]],[0, stainVectors[1,1]],[0, stainVectors[1,2]], linewidth=4, color=(OD2RGB(stainVectors[1,:])/255.).tolist()+[1.])
ax.set_xlabel('red (OD)')
ax.set_ylabel('green (OD)')
ax.set_zlabel('blue (OD)')
plt.title('stain vector estimation')
That’s it. Just a straight-forward extension of the algorithm from Macenko et al. [2] that we discussed in the last post to cater for the big size of the images.
Now that we know how to estimate stains on whole slide images, we could do an analysis of how much variation is there in the public data sets. But that is part of the next post, so stay tuned.
[1] Bertram, C. A., et al. (2019). A large-scale dataset for mitotic figure assessment on whole slide images of canine cutaneous mast cell tumor. Scientific Data, 6 (274), 1–9.
[2]: Macenko, Marc, et al. A method for normalizing histology slides for quantitative analysis ” (2009) IEEE International Symposium on Biomedical Imaging: From Nano to Macro.
以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网
猜你喜欢:本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。