Generative Adversarial Networks (GANs) are a tool for generating new, “fake” samples given a set of old, “real” samples. These samples can be practically anything: hand-drawn digits, photographs of faces, expressionist paintings, you name it. To do this, GANs learn the underlying distribution behind the original dataset. Throughout training, the generator approximates this distribution while the discriminator tells it what it got wrong, and the two alternatingly improve through an arms race. In order to draw random samples from the distribution, the generator is given random noise as input. But, have you ever wondered why GANs need random input? The common answer is “so they don’t generate the same thing every time”, and that’s true, but the answer is a bit more nuanced than that.
Random Sampling
Before we continue with GANs, let’s take a detour and consider sampling from the normal distribution. Suppose you want to do this in Python, but you never read the numpy docs and don’t know that np.random.normal() exists. Instead, all you’ve got to work with is random.random(), which produces values uniformly in the interval [0, 1).
In short, we want to transform the blue distribution into the orange distribution in figure 1. Fortunately, there is a function to do this: the inverse cumulative distribution function, also called the quantile function. The (non-inverted) cumulative distribution function, or CDF, illustrated in figure 2, describes the probability that any random value drawn from the distribution in question will be equal to or less than x, for some specified x.
For instance, at the point x=0 in figure 2, y=0.5; this means that 50% of the distribution lies below zero. A handy quality of the CDF is that the output ranges from 0 to 1, which is exactly the input we have available to us from the random.random() function! If we invert the CDF (flip it on its side), we get the quantile function:
This function gives us the exact relationship between the quantile (our x, ranging from 0 to 1) and the corresponding value in the normal distribution, allowing us to sample directly from the normal distribution. That is, f(random.random()) ~ N(0, 1), where each point in the input space corresponds to a unique point in the output space.
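With the quantile function in hand, the whole trick fits in a few lines. Here is a minimal sketch using the standard library’s statistics.NormalDist for the quantile function (scipy.stats.norm.ppf would work just as well):

```python
import random
from statistics import NormalDist  # standard library, Python 3.8+

# The standard normal's quantile function (inverse CDF).
quantile = NormalDist(mu=0.0, sigma=1.0).inv_cdf

def sample_normal() -> float:
    """Draw one standard-normal value using only uniform randomness."""
    u = random.random()   # uniform in [0, 1)
    while u == 0.0:       # inv_cdf is undefined at exactly 0; re-draw (vanishingly rare)
        u = random.random()
    return quantile(u)

samples = [sample_normal() for _ in range(100_000)]
# These samples should have mean ~0 and standard deviation ~1.
```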
What does this have to do with GANs?
In the above scenario, we had the quantile function at our disposal, but what if we didn’t, and had to learn a mapping from the input space to the output space? That is exactly the problem that GANs aim to solve. In a previous article, I illustrated how GANs can be used to sample from the normal distribution if you’re in a data emergency and don’t have the quantile function available to you. In this light, I find it much more helpful to think of GANs not as tools for random sampling, but as functions that map some k-dimensional latent (input) space to some p-dimensional sample (output) space, which can then be used to transform samples from the latent space to samples from the sample space. In this view, much like the quantile function, there’s no randomness involved.
With maps on the mind, let’s consider how we might draw random samples from a 2D normal distribution with only 1D random samples between 0 and 1 as input.
How would we map the 100k samples in that blue line to the 100k samples in the orange blob? There’s no good way to do it. Sure, we could use Peano curves, but then we lose the useful property that points close together in the input space map to points close together in the output space, and vice versa. It’s for this reason that the dimensionality of the latent space of a GAN must equal or exceed the dimensionality of its sample space. That way, the function has enough degrees of freedom to map the input to the output.
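For contrast, once the input has matching dimensionality the mapping exists in closed form: the classic Box-Muller transform turns two independent uniform values into one 2D standard-normal point. The sketch below is purely illustrative of why two degrees of freedom are enough; a GAN would learn its own mapping rather than this one.

```python
import math
import random

def box_muller() -> tuple[float, float]:
    """Map two independent uniform samples to one 2D standard-normal sample."""
    u1 = 1.0 - random.random()           # shift [0, 1) to (0, 1] so log() is defined
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(u1))   # radius
    theta = 2.0 * math.pi * u2           # angle
    return r * math.cos(theta), r * math.sin(theta)

samples = [box_muller() for _ in range(100_000)]  # points from a 2D Gaussian blob
```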
But just for fun, let’s visualize what happens when a GAN with only one-dimensional input is tasked with learning multi-dimensional distributions. The results hopefully won’t surprise you, but they are fun to watch.
2D Gaussian
Let’s start out with the issue illustrated in figure 5: mapping the 1D range between 0 and 1 to the 2D normal (or “Gaussian”) distribution. We will be using a typical vanilla GAN architecture (code available at the end of the article).
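The exact architecture isn’t the point of this article, but for concreteness, here is a minimal sketch of the kind of vanilla GAN these experiments use: a small fully connected generator and discriminator trained with the standard binary cross-entropy losses. The framework (PyTorch), layer sizes, and learning rates here are assumptions for illustration; the actual code is linked at the end of the article.

```python
import torch
import torch.nn as nn

latent_dim = 1  # the 1D latent space under discussion

def mlp(in_dim, out_dim, out_activation=None):
    """Small fully connected network used for both players."""
    layers = [nn.Linear(in_dim, 64), nn.ReLU(),
              nn.Linear(64, 64), nn.ReLU(),
              nn.Linear(64, out_dim)]
    if out_activation is not None:
        layers.append(out_activation)
    return nn.Sequential(*layers)

generator = mlp(latent_dim, 2)            # latent space -> 2D sample space
discriminator = mlp(2, 1, nn.Sigmoid())   # 2D sample -> probability of being real

bce = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

for step in range(30_000):
    real = torch.randn(128, 2)             # real samples: 2D standard Gaussian
    z = torch.rand(128, latent_dim)        # uniform latent input
    fake = generator(z)

    # Discriminator step: push real toward 1, fakes toward 0.
    d_loss = (bce(discriminator(real), torch.ones(128, 1)) +
              bce(discriminator(fake.detach()), torch.zeros(128, 1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator step: try to make the discriminator output 1 on fakes.
    g_loss = bce(discriminator(fake), torch.ones(128, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```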
As you can see, the poor thing is at a loss for what to do. Having only one degree of freedom, it is hardly able to explore the sample space. What’s worse, because the generated samples are so densely-packed in that 1D manifold (there are as many grey dots in this gif as red dots!), the discriminator is able to slack off, never having to try hard to discern the real points from the fakes, and as such the generator doesn’t get very useful information (and certainly not enough to learn a space-filling curve, even if it had the capacity!).
Figure 6 shows the first 600 training steps. After 30k, this was the result:
It’s a cute little squiggle, but hardly a Gaussian distribution. The GAN completely failed to learn the mapping after 30k steps. For context, let’s consider how a GAN with the same architecture and training routine fares when given 2D, 3D, 10D, and 100D latent spaces to map to the above distribution:
The 2D latent space GAN is much better than the 1D GAN above, but is still nowhere near the target distribution and has several obvious kinks in it. The 3D and 10D latent spaces produced GANs with visually convincing results, and the 100D GAN produced what appears to be a Gaussian distribution with the right variance but the wrong mean. But we should keep in mind that the high-dimensional GANs are cheating in this particular problem, since the mean of many independent uniform samples is approximately normally distributed.
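That last point is easy to verify numerically: by the central limit theorem, averaging many independent uniform values already produces a nearly Gaussian result, so a high-dimensional uniform latent space hands the generator most of the work for free. A quick check, assuming numpy:

```python
import numpy as np

# Each row is one 100D uniform latent vector; its mean is approximately
# normal with mean 0.5 and standard deviation 1 / sqrt(12 * 100).
z = np.random.rand(100_000, 100)
means = z.mean(axis=1)

print(means.mean())   # ~0.5
print(means.std())    # ~0.029, i.e. 1 / sqrt(1200)
```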
Eight Gaussians
The eight Gaussians distribution (figure 9) is exactly as it sounds: a mixture of eight 2D Gaussians arranged in a circle about the origin, each with small enough variance that they hardly overlap, and with zero covariance. Although the sample space is 2D, a reasonable encoding of this distribution has three dimensions: the first dimension is discrete and describes the mode (numbered one through eight), and the other two describe the x and y displacement from that mode, respectively.
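For reference, the target distribution itself is easy to sample directly. The sketch below (assuming numpy; the radius and per-mode standard deviation are illustrative rather than the exact values behind the figures) makes the three-part encoding explicit: pick a mode, then add a small x and y displacement.

```python
import numpy as np

def eight_gaussians(n, radius=2.0, std=0.1):
    """Sample n points from eight small Gaussians arranged in a circle about the origin."""
    angles = 2 * np.pi * np.random.randint(0, 8, size=n) / 8      # pick a mode
    centers = np.stack([radius * np.cos(angles),
                        radius * np.sin(angles)], axis=1)
    return centers + std * np.random.randn(n, 2)                  # x and y displacement

real_samples = eight_gaussians(100_000)
```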
I trained a GAN with latent_dim=1 on the eight Gaussians distribution for 600 steps, and these were the results:
As expected, the GAN struggles to learn an effective mapping. After 30k steps, this is the learned distribution:
The GAN is clearly struggling to map the 1D latent space to this 3D distribution: The right-most mode is ignored, a considerable number of samples are being generated between modes, and samples aren’t normally-distributed. For comparison, let’s consider four more GANs after 30k steps, with latent dimensions of 2, 3, 10, and 100:
It’s hard to tell which is best without actually measuring the KL divergence between the true distribution and the learned distribution (coming soon™️ in a follow-up article!), but the low-dimensional GANs seem to produce fewer samples in the negative space between modes. Even more interesting, the 2D GAN does not show mode collapse, the 3D and 10D GANs show only slight mode collapse, and the 100D GAN failed to generate samples in two of the modes.
Spiral
The spiral distribution, illustrated in figure 13, is in some ways simpler than the eight Gaussians distribution. Having only one mode (albeit elongated and twisty), the GAN isn’t forced to discretize its continuous input. It can be described efficiently with two dimensions: one describing position along the spiral, the other describing position laterally within the spiral.
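To make that two-dimensional description concrete, here is a sketch of how such a spiral might be sampled (assuming numpy; the exact spiral parameters behind the figures may differ): one uniform value sets the position along the spiral, and a little Gaussian noise spreads points around the curve.

```python
import numpy as np

def spiral(n, turns=2.0, noise=0.05):
    """Sample n points from a noisy Archimedean spiral."""
    t = np.random.rand(n)                      # position along the spiral, in [0, 1)
    angle = 2 * np.pi * turns * t
    radius = t                                 # radius grows linearly with position
    points = np.stack([radius * np.cos(angle),
                       radius * np.sin(angle)], axis=1)
    return points + noise * np.random.randn(n, 2)   # small jitter around the curve

real_samples = spiral(100_000)
```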
I trained a GAN with latent_dim=1 for 600 steps, and these were the results:
Again, the GAN struggles to learn an effective mapping. After 30k steps, this is the learned distribution:
Similar to the case of the eight Gaussians distribution, the GAN does a poor job of mapping the spiral distribution. Two regions of the spiral are omitted, and many samples are generated in the negative space. I address this inefficient mapping problem in detail in another article, so I won’t belabour the point here; instead, let’s consider four more GANs tasked with learning this distribution after 30k steps, again with latent dimensions of 2, 3, 10, and 100:
Again, it’s hard to tell which is best without actually measuring the KL divergence, but the differences in coverage, uniformity, and amount of sampling in negative space are interesting to consider.
Closing Thoughts
It’s easy to get caught up in the GAN fervor and treat them like magic machines that use random numbers as fuel to pop out new samples. Understanding the fundamentals of how a tool works is essential to using it effectively and troubleshooting it when it breaks. With GANs, that means understanding that the generator is learning a mapping from some latent space to some sample space, and understanding how that learning unfolds. The extreme case of mapping a 1D distribution to a higher-dimensional distribution clearly illustrates how complicated this task is.
All code used in this project is available in the following GitHub repo: