A walkthrough on how GAN models work with examples in Python.
Jul 18 · 12 min read
The hypothetical example of Machine Learning is often imagined as a machine that can think and mimic a human well enough to pass a test with some degree of intelligence. Although this is the ultimate goal, we are not there yet, and we still have a long way to go. In the past few years, many models have been developed to learn in an unsupervised mode, attempting to engage in a competitive setting against another computer or a human to perform a certain task. This article sheds some light on Generative Adversarial Networks (GANs) and how they can be used in today's world.
I. GANs and Machine Learning
Machine Learning has shown the power to recognize patterns in data distributions, images, and sequences of events in order to solve classification and regression problems. In 2014, Ian Goodfellow et al. [1] published an article using two separate neural networks to generate synthetic data with properties similar to real data. This work made the research community much more interested in generating realistic images, videos, and other synthetic structured data.
GANs are an unsupervised deep learning technique. A GAN is usually implemented as two neural networks: a Generator and a Discriminator. These two models compete with each other in a game-like setting. The GAN model is trained on real data and on data produced by the generator, and the discriminator's job is to distinguish fake data from real data. The generator is a learning model, so initially it is likely to produce low-quality or even completely noisy data that does not reflect the distribution or the properties of the real data.
The generator model's primary goal is to generate artificial data that can pass the discriminator successfully. The model starts from some noise, usually Gaussian noise, and produces an image formatted as a vector of pixels. The generator must learn how to trick the discriminator and win a positive classification (the produced image is classified as real). A loss is assigned to the generator whenever one of its generated images is successfully detected as fake. The discriminator, in turn, has to learn progressively how to identify those fake images, and it is penalized whenever it fails to recognize one. The key concept is that the generator and the discriminator are trained simultaneously.
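In the original formulation [1], this competition is written as a minimax game over a value function V(D, G), which the discriminator D tries to maximize and the generator G tries to minimize:

min_G max_D V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]

Here x is sampled from the real data distribution and z is the noise vector fed to the generator.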
Example of Generating Handwritten Digits:
The research community has many interesting datasets for measuring the accuracy of a GAN model. In this article, we will use a few of those datasets, starting with MNIST. MNIST is one of the most common examples for explaining the theory behind generative models and is used widely in image processing. A sample from the MNIST dataset is shown in Figure 2.
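As a quick illustration (a minimal sketch, assuming TensorFlow and Keras are installed), the MNIST images can be loaded and scaled to the [0, 1] range so that they match the sigmoid output of the generator built later in this article:

import numpy as np
from tensorflow.keras.datasets import mnist

# Load the 60,000 training images (28 x 28 grayscale digits); labels are not needed
(x_train, _), (_, _) = mnist.load_data()

# Scale pixel values to [0, 1] and add a channel dimension: (60000, 28, 28, 1)
x_train = x_train.astype('float32') / 255.0
x_train = np.expand_dims(x_train, axis=-1)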
To generate artificial handwritten images, we need to implement two models: one to generate fake images and another to classify fake images from real ones. The overall pipeline for training a GAN model is shown in Figure 3.
There are many architectures to consider for building the discriminator and the generator; we could build a deep feed-forward neural network, a Convolutional Neural Network (CNN), or other variants. We will go over the types of GAN models shortly, but for now, let's pick a CNN.
The source code of this example is available on my Github.
The discriminator model architecture starts by receiving an image (28 x 28 x 1) and passing it through two convolutional layers with 64 filters each. Alternatively, we could use 128 filters, which sets the number of hidden nodes in each layer. We could also make the network denser by using three layers with 64, 128, and 256 hidden nodes. To keep the explanation of how GAN networks work simple, we will use a small architecture in this tutorial, which still gives high accuracy. Figure 4 shows the overall architecture of the discriminator.
The generator model learns how to generate realistic images, but it needs to start from some random point in the latent space. If you compare the generator architecture in Figure 5 with the discriminator architecture in Figure 4, you will notice they look almost identical. It is important to know that it is not necessary to mirror the discriminator when building the generator network; the generator's architecture can have a different number of layers and filters and a higher overall complexity.
Another main difference between the discriminator and the generator is the use of activation functions and loss. The discriminator uses a sigmoid in its output layer: since this is a boolean classification problem, the output is a probability between 0 and 1. The generator, on the other hand, has no loss function or optimization algorithm of its own. It uses transpose convolution layers to upsample the low-resolution dense layer from the latent space into a higher-resolution image. The trick when building the generator model is that we do not compile it. Instead, the full GAN framework combines the generator and the discriminator and compiles the combined model. We will discuss those aspects in detail in the following section.
import tensorflow as tf
from tensorflow.keras.models import Sequential

def building_gan(generator, discriminator):
    GAN = Sequential()
    # Freeze the discriminator's weights while the combined model trains the generator
    discriminator.trainable = False
    # Adding the generator and the discriminator
    GAN.add(generator)
    GAN.add(discriminator)
    # Optimization function
    opt = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5)
    # Compile the model
    GAN.compile(loss='binary_crossentropy', optimizer=opt)
    return GAN
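With the three builder functions in place (the discriminator and generator builders are shown in the next section), a single adversarial training step looks roughly like the sketch below. This is a minimal sketch, assuming x_train holds the scaled MNIST images loaded earlier, a 100-dimensional noise vector, and a batch size of 128; a full run would repeat this step over many batches and epochs:

import numpy as np

noise_dim = 100
batch_size = 128

discriminator = building_discriminator()
generator = building_generator(noise_dim)
GAN = building_gan(generator, discriminator)

# Sample a batch of real images and generate a batch of fake ones
idx = np.random.randint(0, x_train.shape[0], batch_size)
real_images = x_train[idx]
noise = np.random.normal(0, 1, (batch_size, noise_dim))
fake_images = generator.predict(noise)

# 1) Train the discriminator: real images labeled 1, fake images labeled 0
d_loss_real = discriminator.train_on_batch(real_images, np.ones((batch_size, 1)))
d_loss_fake = discriminator.train_on_batch(fake_images, np.zeros((batch_size, 1)))

# 2) Train the generator through the combined model: the frozen discriminator
#    is asked to label freshly generated images as real (1)
g_loss = GAN.train_on_batch(noise, np.ones((batch_size, 1)))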
The next animation shows how the generator improves over successive training epochs:
GAN Capabilities and Challenges
(1) Evaluation
One of the critical issues is assessing the quality of the generated data, whether it is an image, a text, or a song, as well as the diversity of the produced samples. The discriminator helps us check whether the generated data is real or fake. However, samples that look realistic from the discriminator's point of view may still be obviously fake to the human eye. Hence, we need evaluation metrics that correlate with subjective evaluation. One way to approach this problem is by analyzing the distribution properties of the real and the generated data.
Two evaluation metrics can statistically help measure the quality of the generated data: the Inception Score [3] and the Fréchet Inception Distance (FID) [4]. Both of these objective metrics are widely adopted by the research community, especially for measuring the quality of produced images. Since this tutorial is an introduction, we will not detail how these metrics work.
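For the curious, here is a rough sketch of the Fréchet distance formula only; note that the real FID [4] is computed on features extracted by a pre-trained Inception network, not on raw pixels, and real_features and fake_features below are assumed to be 2-D arrays of such feature vectors:

import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(real_features, fake_features):
    # Mean and covariance of each set of feature vectors
    mu_r, sigma_r = real_features.mean(axis=0), np.cov(real_features, rowvar=False)
    mu_f, sigma_f = fake_features.mean(axis=0), np.cov(fake_features, rowvar=False)
    # Squared distance between the means plus a covariance term
    mean_term = np.sum((mu_r - mu_f) ** 2)
    covmean = sqrtm(sigma_r.dot(sigma_f))
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from numerical error
    return mean_term + np.trace(sigma_r + sigma_f - 2.0 * covmean)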
(2) Loss Function
As we discussed earlier, the GAN model has the unique property of training the generator and the discriminator simultaneously. This requires loss functions that balance the training on one side (the discriminator) while also improving the training on the other side (the generator). When building the discriminator model, we explicitly define the loss function, just like in any other neural network architecture.
# Defining the discriminator model
from tensorflow.keras.layers import Conv2D, Dense, Dropout, Flatten, LeakyReLU

def building_discriminator():
    # The image dimensions provided as inputs
    image_shape = (28, 28, 1)
    disModel = Sequential()
    # First layer: 64 filters, 3 x 3 kernel, stride 2
    disModel.add(Conv2D(64, 3, strides=2, input_shape=image_shape))
    disModel.add(LeakyReLU())
    disModel.add(Dropout(0.4))
    # Second layer
    disModel.add(Conv2D(64, 3, strides=2))
    disModel.add(LeakyReLU())
    disModel.add(Dropout(0.4))
    # Flatten the output and classify real (1) vs. fake (0)
    disModel.add(Flatten())
    disModel.add(Dense(1, activation='sigmoid'))
    # Optimization function
    opt = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5)
    # Compile the model
    disModel.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    return disModel
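As a quick sanity check (a minimal usage sketch), building the model and printing its summary confirms that it maps a 28 x 28 x 1 image to a single probability:

disModel = building_discriminator()
disModel.summary()  # ends with Dense(1, sigmoid): the probability that the input is real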
The generator model, on the other hand, does not have its loss function explicitly defined. Its weights are updated through the combined GAN model, using the loss that comes from the discriminator's judgment of the generated images.
# Defining the generator model
from tensorflow.keras.layers import Conv2DTranspose, Reshape

def building_generator(noise_dim):
    genModel = Sequential()
    # Project the noise vector into a 6 x 6 x 128 feature map
    genModel.add(Dense(128 * 6 * 6, input_dim=noise_dim))
    genModel.add(LeakyReLU())
    genModel.add(Reshape((6, 6, 128)))
    # Second layer: upsample to 14 x 14
    genModel.add(Conv2DTranspose(128, (4, 4), strides=(2, 2)))
    genModel.add(LeakyReLU())
    # Third layer: upsample to 30 x 30
    genModel.add(Conv2DTranspose(128, (4, 4), strides=(2, 2)))
    genModel.add(LeakyReLU())
    # Output a single-channel 28 x 28 image with pixel values in [0, 1]
    genModel.add(Conv2D(1, (3, 3), activation='sigmoid'))
    return genModel
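To verify the generator's output dimensions before wiring everything together (a minimal usage sketch, assuming a 100-dimensional latent space):

import numpy as np

genModel = building_generator(100)
noise = np.random.normal(0, 1, (16, 100))
fake_images = genModel.predict(noise)
print(fake_images.shape)  # (16, 28, 28, 1): sixteen 28 x 28 single-channel images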
There are a few options for choosing the loss function, such as the ones below (a minimal sketch of the Wasserstein loss follows the list):
- Least squares.
- Wasserstein loss function.
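As an illustration of the second option, the Wasserstein loss is commonly implemented in Keras as the mean of the label-prediction product (a minimal sketch; a full WGAN also replaces the 0/1 labels with -1/+1, removes the sigmoid from the critic's output, and constrains the critic, for example with the weight clipping used in [6]):

import tensorflow as tf

def wasserstein_loss(y_true, y_pred):
    # Critic scores are unbounded; real samples use label +1 and fake samples -1
    return tf.reduce_mean(y_true * y_pred)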
(3) Determination of Convergence
One of the key issues with GAN models is determining when the model has converged. The competition between the discriminator and the generator makes it hard for the game to reach a final winner. Both models want to maximize their own gain and minimize their own loss. In our case, we want to reach the point where the discriminator can do no better than a random guess about whether an image is fake or real, and a generated image has an even chance of passing the discriminator. This 50-50 outcome is the ideal case inherited from game theory, where both players are so good that neither can win over the other.
GAN models are known to converge slowly. As with other unsupervised models, the absence of true labels increases the challenge of deciding when training can stop. We need to balance training time against the quality of the produced output. Several factors can slow down or speed up the training process, such as normalizing the inputs, batch normalization, gradient penalties, and training the discriminator well before training the combined GAN model.
(4) Produced Image Sizes
GAN models are known to have limited capabilities when it comes to the size of the generated images. The images we have seen in the MNIST examples are only 28 x 28 pixels, which is quite small for a real application. If we want to generate bigger images, say 1024 x 1024, we need a more scalable model. The research community has been working on improving GAN capabilities; for instance, in 2017, T. Karras et al. proposed a model called Progressive Growing GANs to solve this problem [2].
Types of GAN Models
Some of the challenges introduced in the previous sections have led the research community to extend the original GAN idea to tackle one or more of the issues mentioned above. This section covers some popular extensions and optimized GAN architectures that scale up the capabilities of the original GAN.
Deep Convolutional GAN (DCGAN): This is an extension that replaces the feed-forward neural network with a CNN architecture, proposed by A. Radford et al. [5]. The use of a CNN architecture and learning through filters has improved the accuracy of GAN models.
Wasserstein GAN (WGAN): WGAN was designed by M. Arjovsky et al. [6]. WGAN focuses on defining the distance between the generated distribution and the real distribution, which determines the model's convergence. The authors propose using the Earth Mover (EM) distance to approximate the differences between those distributions effectively.
Progressive GAN: ProgressiveGAN was designed by T. Karras et al. [7] and presented at the ICLR conference. Its key contribution is growing the generator and the discriminator progressively, from lower-resolution to higher-resolution layers. The technique reduces the size of the mini-batches while computing a mini-batch standard deviation, and it also uses an equalized learning rate and pixel-wise feature normalization.