Generative Adversarial Networks (GANs): A Beginner’s Guide



A walkthrough on how GAN models work with examples in Python.


Photo by drmakete lab on Unsplash

The hypothetical example of Machine Learning is often imagined as a machine that can think, mimic humans, and pass a test with some degree of intelligence. Although this is the ultimate goal, we are not there yet, and we still have a long way to go. In the past few years, many models have been developed to learn in an unsupervised mode, attempting to engage in a competitive setting against another computer or a human to perform a certain task. This article sheds some light on Generative Adversarial Networks (GANs) and how they can be used in today’s world.

I. GANs and Machine Learning

Machine Learning has shown some power to recognize patterns such as data distributions, images, and sequences of events to solve classification and regression problems. In 2014, Ian Goodfellow et al. [1] published an article using two separate neural networks to generate synthetic data with properties similar to the real data. This work has made the research community much more interested in generating realistic images, videos, and general synthetic structured data.


Figure 1: Examples of a progressively learning GAN model generating artificial human faces.

GANs are an unsupervised deep learning technique, usually implemented with two neural networks: a Generator and a Discriminator. The two models compete with each other in a game-like setting. The GAN model is trained on both real data and data produced by the generator, and the discriminator’s job is to tell fake data from real data. The generator is a learning model, so initially it is likely to produce low-quality or even completely noisy data that does not reflect the distribution or the properties of the real data.

The generator model’s primary goal is to produce artificial data that passes the discriminator successfully. The model starts from some noise, usually Gaussian noise, and produces an image formatted as a vector of pixels. The generator must learn how to trick the discriminator and win a positive classification (the produced image classified as real). A loss is assigned to the generator whenever one of its generated images is successfully detected as fake, while the discriminator incurs a loss whenever it fails to recognize a fake image. The key concept is that the generator and the discriminator are trained simultaneously.
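To make this game concrete, here is a minimal sketch of the label convention behind it, with illustrative values of our own choosing (not from the original article): the discriminator is trained to output 1 for real images and 0 for fakes, while the generator is rewarded whenever its fakes are labeled 1.

import numpy as np

batch_size, noise_dim = 128, 100                          # illustrative values
noise = np.random.normal(0, 1, (batch_size, noise_dim))   # Gaussian noise fed to the generator
real_labels = np.ones((batch_size, 1))                    # discriminator target for real images
fake_labels = np.zeros((batch_size, 1))                   # discriminator target for generated images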

Example of Generating Handwritten Digits:

The research community has many interesting datasets for measuring the accuracy of a GAN model. In this article, we will use a few of them in detail, starting with MNIST. MNIST is one of the most widely used datasets for explaining the theory of generative models in image processing. A sample from the MNIST dataset is shown in Figure 2.


Figure 2: Sample of handwritten digit images from MNIST dataset.
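Before training, the MNIST images need to be loaded and scaled. A minimal sketch, assuming TensorFlow 2.x; we scale the pixels to [0, 1] to match the sigmoid output of the generator defined later in this article:

import tensorflow as tf

# Load the 60,000 training images (labels are not needed for GAN training)
(x_train, _), (_, _) = tf.keras.datasets.mnist.load_data()
# Add a channel dimension and scale pixel values to [0, 1]
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0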

To generate artificial handwritten images, we need to implement two models: one to generate fake images and another to classify fake from real ones. The overall pipeline for training a GAN model is shown in Figure 3.


Figure 3: The GAN learning framework, in which the generator and the discriminator are trained simultaneously.

There are many architectures to consider when building the discriminator and the generator, such as a plain deep neural network or a Convolutional Neural Network (CNN). We will go over the types of GAN models shortly, but for now, let’s pick a CNN.

The source code for this example is available on my GitHub.

The discriminator model starts by receiving an image (28 x 28 x 1) and passing it through two convolutional layers with 64 filters each. Alternatively, we could use 128 filters, which represents the number of hidden nodes in each layer. We could also make the network denser by using three layers with 64, 128, and 256 hidden nodes. To keep the explanation of how GAN networks work simple, we will use a simple architecture in this tutorial, which still gives high accuracy. Figure 4 shows the overall architecture of the discriminator.


Figure 4: The architecture of the discriminator model showing the number of layers and parameters in each.

The generator model learns to generate realistic images, but it needs to start from random points in the latent space. If you compare the generator architecture in Figure 5 with the discriminator architecture in Figure 4, you will notice they look almost identical. It is essential to know that it is not necessary to mirror the discriminator when building the generator network: the generator can have a different number of layers and filters and a higher overall complexity.


Figure 5: The architecture of the generator model showing each layer.

Another main difference between the discriminator and the generator is the activation function. The discriminator uses a sigmoid in its output layer: it solves a binary classification problem, and the sigmoid keeps the output between 0 and 1. The generator, on the other hand, has no loss function or optimizer of its own. It uses transposed convolution layers to upsample the low-resolution dense layer from the latent space into higher-resolution images. The trick when building the generator model is that we do not compile it. Instead, the full GAN model stacks the generator and the discriminator and is compiled as a whole. We will discuss those aspects in detail in the following section.

import tensorflow as tf
from tensorflow.keras.models import Sequential

def building_gan(generator, discriminator):
    GAN = Sequential()
    # Freeze the discriminator's weights while training the generator
    discriminator.trainable = False
    # Stack the generator and the discriminator
    GAN.add(generator)
    GAN.add(discriminator)
    # Optimization function
    opt = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5)
    # Compile the combined model
    GAN.compile(loss='binary_crossentropy', optimizer=opt)
    return GAN
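Putting the pieces together, one training step alternates between updating the discriminator on a half-real, half-fake batch and updating the generator through the combined model with flipped labels. This is a minimal sketch of our own, assuming the building_discriminator and building_generator functions shown later in this article and the x_train array loaded above:

import numpy as np

noise_dim, batch_size = 100, 128
discriminator = building_discriminator()
generator = building_generator(noise_dim)
GAN = building_gan(generator, discriminator)

for step in range(x_train.shape[0] // batch_size):
    # 1) Train the discriminator: real images labeled 1, generated images labeled 0
    idx = np.random.randint(0, x_train.shape[0], batch_size // 2)
    real_images = x_train[idx]
    noise = np.random.normal(0, 1, (batch_size // 2, noise_dim))
    fake_images = generator.predict(noise, verbose=0)
    discriminator.train_on_batch(real_images, np.ones((batch_size // 2, 1)))
    discriminator.train_on_batch(fake_images, np.zeros((batch_size // 2, 1)))
    # 2) Train the generator through the frozen discriminator: the fakes are
    #    labeled as real (1), pushing the generator to fool the discriminator
    noise = np.random.normal(0, 1, (batch_size, noise_dim))
    GAN.train_on_batch(noise, np.ones((batch_size, 1)))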

The next animation shows how the generator improves over successive epochs of training:


Figure 6: An animated image showing the progressive quality of the generated digits using a GAN model.

GAN Capabilities and Challenges


(1) Evaluation

One of the critical issues is assessing the quality of the generated data, whether it is an image, a text, or a song, along with the diversity of the produced samples. The discriminator helps us check whether generated data looks real or fake. However, a generated sample might look realistic from the discriminator’s point of view yet still be obviously fake to the human eye. Hence, we need evaluation metrics that correlate with subjective evaluation. One way to approach this problem is by comparing the distribution properties of the real and generated data.

Two evaluation metrics can statistically help measure the quality of the generated data: the Inception Score [3] and the Frechet Inception Distance (FID) [4]. Both of these objective metrics are widely adopted by the research community, especially for measuring the quality of produced images. Since this tutorial is an introduction, we will not detail how these metrics work.
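As a rough illustration only (not part of the original article), the FID compares the mean and covariance of feature activations extracted from a pretrained network such as InceptionV3. A minimal sketch, assuming act_real and act_fake are such activation matrices of shape (n_samples, n_features):

import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(act_real, act_fake):
    # Mean and covariance of each set of activations
    mu1, mu2 = act_real.mean(axis=0), act_fake.mean(axis=0)
    sigma1 = np.cov(act_real, rowvar=False)
    sigma2 = np.cov(act_fake, rowvar=False)
    # Matrix square root of the covariance product
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from numerical error
    return np.sum((mu1 - mu2) ** 2) + np.trace(sigma1 + sigma2 - 2 * covmean)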

(2) Loss Function

As we discussed earlier, a GAN model has the unique property of training the generator and the discriminator simultaneously. This requires loss functions that balance the training on one side (the discriminator) while also improving the training on the other (the generator). When building the discriminator model, we explicitly define the loss function, just like in any other neural network architecture.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Dropout, Flatten, LeakyReLU

# Defining the discriminator model
def building_discriminator():
    # The image dimensions provided as inputs
    image_shape = (28, 28, 1)
    disModel = Sequential()
    # First convolutional layer: 28x28x1 -> 13x13x64
    disModel.add(Conv2D(64, 3, strides=2, input_shape=image_shape))
    disModel.add(LeakyReLU())
    disModel.add(Dropout(0.4))
    # Second layer: 13x13x64 -> 6x6x64
    disModel.add(Conv2D(64, 3, strides=2))
    disModel.add(LeakyReLU())
    disModel.add(Dropout(0.4))
    # Flatten the output and classify real (1) vs. fake (0)
    disModel.add(Flatten())
    disModel.add(Dense(1, activation='sigmoid'))
    # Optimization function
    opt = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5)
    # Compile the model
    disModel.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    return disModel
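As a quick sanity check, Keras can print the layer shapes and parameter counts, which should match Figure 4:

disModel = building_discriminator()
disModel.summary()   # prints each layer's output shape and parameter count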

The generator model, on the other hand, does not have its loss function explicitly defined. Its weights are updated through the combined GAN model, using the loss computed from the discriminator’s predictions on the generated images.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Conv2DTranspose, Dense, LeakyReLU, Reshape

# Defining the generator model
def building_generator(noise_dim):
    genModel = Sequential()
    # Project the latent vector onto a 6x6x128 low-resolution feature map
    genModel.add(Dense(128 * 6 * 6, input_dim=noise_dim))
    genModel.add(LeakyReLU())
    genModel.add(Reshape((6, 6, 128)))
    # Second layer: upsample 6x6 -> 14x14
    genModel.add(Conv2DTranspose(128, (4, 4), strides=(2, 2)))
    genModel.add(LeakyReLU())
    # Third layer: upsample 14x14 -> 30x30
    genModel.add(Conv2DTranspose(128, (4, 4), strides=(2, 2)))
    genModel.add(LeakyReLU())
    # Final convolution: 30x30 -> 28x28x1, with pixel values in [0, 1]
    genModel.add(Conv2D(1, (3, 3), activation='sigmoid'))
    return genModel
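To inspect what the generator produces, we can sample a batch of latent vectors and plot the resulting digits. This is a minimal sketch of our own; in practice you would call it with a generator that has already been trained:

import numpy as np
import matplotlib.pyplot as plt

noise_dim = 100
genModel = building_generator(noise_dim)          # use the trained generator in practice
noise = np.random.normal(0, 1, (16, noise_dim))   # 16 random latent vectors
images = genModel.predict(noise, verbose=0)       # shape: (16, 28, 28, 1)

fig, axes = plt.subplots(4, 4, figsize=(4, 4))
for img, ax in zip(images, axes.flat):
    ax.imshow(img.squeeze(), cmap='gray')
    ax.axis('off')
plt.show()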

There are a few options for choosing the loss function, such as:

  • Least squares.
  • Wasserstein loss function (see the sketch below).
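For reference, the Wasserstein loss is simple to express in Keras. A minimal sketch, using the common convention of labeling real samples +1 and fake samples -1; note that this variant is not used in the code above:

import tensorflow as tf

def wasserstein_loss(y_true, y_pred):
    # The critic's output is not squashed; the loss is the mean of label * score
    return tf.reduce_mean(y_true * y_pred)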

(3) Determination of Convergence

One of the key issues with GAN models is determining when the model has converged. The competition between the discriminator and the generator makes the game hard to settle with a final winner. Both models want to maximize their gain and minimize their loss. Ideally, we want to reach the point where the discriminator can do no better than a random guess at whether an image is fake or real, and the generated images pass the discriminator about half of the time. This 50-50 chance is the ideal equilibrium inherited from game theory, where neither model can gain a lasting advantage over the other.

GAN models are known for slow convergence. As with other unsupervised models, the absence of true labels increases the challenge of determining when training can stop. We need to balance training time against the quality of the output. Several factors can slow down or speed up the training process, such as normalizing the inputs, batch normalization, gradient penalties, and training the discriminator well before training the combined GAN model.
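One common heuristic for monitoring this balance (our suggestion, not from the original article) is to track the discriminator’s accuracy on real and fake batches during training; values hovering around 0.5 suggest the game has reached the 50-50 region described above:

import numpy as np

# Reusing the discriminator and the real/fake batches from the training sketch above
_, acc_real = discriminator.evaluate(real_images, np.ones((len(real_images), 1)), verbose=0)
_, acc_fake = discriminator.evaluate(fake_images, np.zeros((len(fake_images), 1)), verbose=0)
print(f'Discriminator accuracy - real: {acc_real:.2f}, fake: {acc_fake:.2f}')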

(4) Produced Image Sizes

GAN models are known to have limited capabilities when it comes to the size of the generated images. The images we have seen in the MNIST example are only 28 x 28 pixels, which is pretty small for a real application. If we want to generate bigger images, say 1024 x 1024, we need a more scalable model. The research community has been actively improving GAN capabilities; for instance, in 2017, T. Karras et al. proposed a novel model called Progressive Growing GANs to solve this problem [2].

Types of GAN Models

Some of the challenges introduced in the previous sections have driven the research community to extend the original GAN idea to tackle one or more of these issues. This section covers some popular extensions and optimized GAN architectures that scale up the original GAN capabilities.


Figure 7: An overview of the types of GAN model architectures and extensions.

Deep Convolutional GAN (DCGAN): This is an extension that replaces the feed-forward neural network with a CNN architecture, proposed by A. Radford et al. [5]. The use of a CNN architecture, learning through filters, has improved the accuracy of GAN models.

Wasserstein GAN (WGAN): WGAN was designed by M. Arjovsky et al. [6]. WGAN focuses on the definition of the distance between the generated and real distributions, which determines the model’s convergence. The authors propose the Earth Mover (EM) distance to effectively approximate the difference between those distributions.
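For reference, the EM (Wasserstein-1) distance between the real distribution P_r and the generated distribution P_g is defined in [6] as:

W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}\left[\lVert x - y \rVert\right]

where \Pi(P_r, P_g) denotes the set of all joint distributions whose marginals are P_r and P_g.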

Progressive GAN: ProgressiveGAN was designed by T. Karras et al. [7] and presented at the ICLR conference. Its main contribution is letting the generator and discriminator grow progressively from lower-resolution to higher-resolution layers. The technique requires reducing the size of the mini-batches while computing the mini-batch standard deviation. ProgressiveGAN also uses an equalized learning rate and pixel-wise feature normalization.

