Inpainting with AI — get back your images! [PyTorch]
Solving the problem of Image Inpainting with PyTorch and Python
Apr 22 · 6 min read
Did you know the old childhood photo you have in that dusty album can be restored? Yeah, that one in which everyone is holding hands and having the time of their lives! Don’t believe me? Check this out here —
Inpainting is a conservation process where damaged, deteriorating, or missing parts of an artwork are filled in to present a complete image. [1] This process can be applied to both physical and digital art mediums such as oil or acrylic paintings, chemical photographic prints, 3-dimensional sculptures, or digital images and video. — https://en.wikipedia.org/wiki/Inpainting
Image inpainting is an active area of AI research where neural networks have produced inpainting results that rival the work of most artists. In this article, we are going to discuss image inpainting using neural networks — specifically context encoders. This article explains and implements the context encoder research presented at CVPR 2016 ("Context Encoders: Feature Learning by Inpainting", Pathak et al.).
Context Encoders
To get started with context encoders, we first have to understand what autoencoders are. An autoencoder structurally consists of an encoder, a decoder, and a bottleneck. A general autoencoder compresses an image into a compact latent representation, discarding noise and redundancy along the way. Autoencoders are not specific to images, however, and can be applied to other kinds of data as well; there are also specialized variants of autoencoders for specific tasks.
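To make that structure concrete, here is a minimal convolutional autoencoder sketch in PyTorch. The 64x64 input size and channel widths are assumptions for illustration, not values from this article:

```python
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Minimal convolutional autoencoder: encoder -> bottleneck -> decoder."""
    def __init__(self):
        super().__init__()
        # Encoder: compress the image down to a small latent "bottleneck"
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1),   # 64x64 -> 32x32
            nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1),  # 32x32 -> 16x16
            nn.ReLU(),
        )
        # Decoder: reconstruct the image from the bottleneck
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),  # 16x16 -> 32x32
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),   # 32x32 -> 64x64
            nn.Tanh(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```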
Now that we know about autoencoders, we can describe context encoders by analogy. A context encoder is a convolutional neural network trained to generate the contents of an arbitrary image region on the basis of its surroundings — i.e. a context encoder takes in the data surrounding an image region and tries to generate something that would fit into that region. Just like fitting jigsaw puzzle pieces when we were small — only we didn't have to generate the pieces ;)
Our context encoder consists of an encoder that captures the context of an image in a compact latent feature representation, and a decoder that uses that representation to produce the missing image content. Missing image content? Since we need an enormous dataset to train a neural network, we cannot restrict ourselves to genuinely damaged images. Instead, we block out portions of images from ordinary image datasets to create an inpainting problem and feed those images to the neural network, thus creating missing image content at the blocked regions.
[It is important to note here that the images fed to the neural network have too many missing portions for classical inpainting methods to work at all.]
Use of GANs
GANs, or Generative Adversarial Networks, have been shown to be extremely useful for image generation. They run on a basic principle: a generator tries to 'fool' a discriminator, while a determined discriminator tries to catch the generator out. In other words, the two networks respectively try to minimize and maximize a loss function.
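For reference, this is the standard minimax objective from the original GAN paper (Goodfellow et al., 2014): the generator G minimizes exactly the value the discriminator D maximizes:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]$$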
More about GANs here — https://medium.com/@hmrishavbandyopadhyay/generative-adversarial-networks-hard-not-eea78c1d3c95
Region Masks
Region masks are the portions of images we block out so that we can feed the resulting inpainting problems to the model. Blocking out simply means setting the pixel values of that image region to zero. There are three ways we can do this —
- Central Region: The simplest way of blocking out image data is to zero out a central square patch. Although the network learns to inpaint this region, it fails to generalize well and only learns low-level features.
- Random Block: To keep the network from 'latching' onto the boundary of the masked region, as happens with the central region mask, the masking process is randomized. Instead of a single square patch, a number of overlapping square masks are placed, covering up to 1/4 of the image.
- Random Region: Random block masking, however, still leaves sharp boundaries for the network to latch onto. To deal with this, arbitrary shapes have to be removed from the images. These shapes can be obtained from the PASCAL VOC 2012 dataset, deformed, and placed as masks at random image locations.
Here, I have implemented only the central region masking method, as this is just a guide to get you started on inpainting with AI (a sketch of the masking step follows below). Feel free to try the other masking methods and let me know about the results in the comments!
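A minimal sketch of central region masking, assuming batched tensors of shape (N, C, H, W); the mask_size parameter and 32-pixel default are illustrative assumptions:

```python
import torch

def mask_center(imgs, mask_size=32):
    """Zero out a central square patch; return (masked images, target patches).

    imgs: batch of shape (N, C, H, W). mask_size is an assumed patch size.
    """
    _, _, h, w = imgs.shape
    top, left = (h - mask_size) // 2, (w - mask_size) // 2
    # Keep the ground-truth center patch as the reconstruction target
    target = imgs[:, :, top:top + mask_size, left:left + mask_size].clone()
    # "Block out" the region by setting its pixel values to zero
    masked = imgs.clone()
    masked[:, :, top:top + mask_size, left:left + mask_size] = 0
    return masked, target
```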
Structure
By now, you should have some idea about the model. Let’s see if you’re correct ;)
The model consists of an encoder and a decoder section, making up the context-encoder part of the model. This part also acts as the generator, which produces data and tries to fool the discriminator. The discriminator consists of convolutional layers followed by a Sigmoid function that finally outputs a single scalar.
Loss
The loss function of the model is divided into two parts:
1. Reconstruction Loss — The reconstruction loss is an L2 loss function. It helps capture the overall structure of the missing region and its coherence with the surrounding context. Mathematically, it is expressed as —
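In the notation of the CVPR 2016 paper, where $\hat{M}$ is a binary mask that is 1 on the dropped region and $F$ is the context encoder:

$$\mathcal{L}_{rec}(x) = \left\lVert \hat{M} \odot \big(x - F\big((1 - \hat{M}) \odot x\big)\big) \right\rVert_2^2$$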
It is important to note here that using only the L2 loss would give us a blurry image, because predicting a blurry average of all plausible fills reduces the mean pixel-wise error and thus minimizes the L2 loss — but not in the way we want.
2. Adversarial Loss — This tries to make the prediction 'look' real (remember, the generator has to fool the discriminator!), which helps us get past the blurry images that the L2 loss alone would lead to. Mathematically, we can express it as —
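Using the same notation as above, the paper conditions the discriminator $D$ on the generated inpainting rather than on random noise:

$$\mathcal{L}_{adv} = \max_D \; \mathbb{E}_{x \in \mathcal{X}} \Big[\log D(x) + \log\big(1 - D\big(F\big((1 - \hat{M}) \odot x\big)\big)\big)\Big]$$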
Here an interesting observation is that the adversarial loss encourages the entire output to look real and not just the missing part. The adversarial network, in other words, gives the whole image a realistic look.
The total loss function:
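It is a weighted sum of the two terms above; the paper uses $\lambda_{rec} = 0.999$ and $\lambda_{adv} = 0.001$:

$$\mathcal{L} = \lambda_{rec}\,\mathcal{L}_{rec} + \lambda_{adv}\,\mathcal{L}_{adv}$$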
Let’s build it!
Now that we have covered the main points of the network, let's get down to building the model. I will first build the model structure and then move on to the training and loss function parts. The model will be built with the PyTorch library in Python.
Let’s start with the generator network:
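A minimal sketch of the generator, assuming a 64x64 masked input and a 32x32 center patch as output; the channel widths follow common DCGAN practice and the 4000-dimensional bottleneck follows the paper, so treat the exact sizes as assumptions:

```python
import torch.nn as nn

class Generator(nn.Module):
    """Context-encoder generator: encode the masked 64x64 image into a
    bottleneck, then decode the missing 32x32 center patch."""
    def __init__(self, latent_dim=4000):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),                          # 64 -> 32
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),   # 32 -> 16
            nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),  # 16 -> 8
            nn.Conv2d(256, 512, 4, 2, 1), nn.BatchNorm2d(512), nn.LeakyReLU(0.2),  # 8 -> 4
            nn.Conv2d(512, latent_dim, 4),                                          # 4 -> 1 (bottleneck)
            nn.BatchNorm2d(latent_dim), nn.LeakyReLU(0.2),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 512, 4), nn.BatchNorm2d(512), nn.ReLU(),  # 1 -> 4
            nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(),   # 4 -> 8
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),   # 8 -> 16
            nn.ConvTranspose2d(128, 3, 4, 2, 1), nn.Tanh(),                          # 16 -> 32
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```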
Now, the discriminator network:
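And a matching discriminator sketch that judges 32x32 patches as real or inpainted, ending in a Sigmoid that outputs a single scalar (again, sizes are assumptions consistent with the generator above):

```python
class Discriminator(nn.Module):
    """Convolutional discriminator over 32x32 patches -> single scalar."""
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),                          # 32 -> 16
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),   # 16 -> 8
            nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),  # 8 -> 4
            nn.Conv2d(256, 1, 4), nn.Sigmoid(),                                     # 4 -> 1 scalar
        )

    def forward(self, x):
        return self.model(x).view(-1, 1)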
Let's start training the network now. We will set the batch size to 64 and the number of epochs to 100. The learning rate is set to 0.0002.
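Putting it together, here is a training-loop sketch using those hyperparameters. A hypothetical dataloader yielding batches of 64x64 images is assumed, along with the mask_center, Generator, and Discriminator sketches from above; the 0.001 adversarial weight follows the paper:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
G, D = Generator().to(device), Discriminator().to(device)
opt_G = torch.optim.Adam(G.parameters(), lr=0.0002, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(D.parameters(), lr=0.0002, betas=(0.5, 0.999))
bce, mse = torch.nn.BCELoss(), torch.nn.MSELoss()
lambda_adv = 0.001  # adversarial weight from the paper

for epoch in range(100):
    for imgs in dataloader:  # assumed: yields (N, 3, 64, 64) batches
        imgs = imgs.to(device)
        masked, real_patch = mask_center(imgs)
        valid = torch.ones(imgs.size(0), 1, device=device)
        fake = torch.zeros(imgs.size(0), 1, device=device)

        # Generator step: weighted reconstruction (L2) + adversarial loss
        opt_G.zero_grad()
        gen_patch = G(masked)
        g_loss = (1 - lambda_adv) * mse(gen_patch, real_patch) \
                 + lambda_adv * bce(D(gen_patch), valid)
        g_loss.backward()
        opt_G.step()

        # Discriminator step: real center patches vs. generated ones
        opt_D.zero_grad()
        d_loss = 0.5 * (bce(D(real_patch), valid)
                        + bce(D(gen_patch.detach()), fake))
        d_loss.backward()
        opt_D.step()
```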
Results
Let’s take a glance at what our model has been able to build!
Images at the zeroth epoch (noise) —
Images at the 100th epoch —
Let’s see what went into the model —
That, from this? Yeah! Pretty cool, huh?
Implement your own version of the model. Watch it recreate your childhood photos — and if you are good enough, you might just shape the future of inpainting with AI. So, what are you waiting for?
Let me know in the comments if anything goes wrong with your implementation. Here to help :)