One of my favorite concepts for multi-dimensional data visualization is the Chernoff Face. The idea here is that for a dataset with many dependent variables, it is often difficult to immediately understand the influences one variable may have on another. However, humans are great at recognizing small differences in faces, so maybe we can leverage that!
Tired: traditional plotting
The example from Wikipedia plots the differences between a few judges on some rating dimensions:
which already improves on the "traditional" way to present such data, which is something similar to a line chart:
or a radar chart:
Clearly, the Chernoff faces are the best way to present this data, but they leave some of the gamut unexplored.
Wired: using GANs to synthesize faces
Over the past couple of years, NVIDIA has worked on generating faces using techniques from generative adversarial networks, and has published two papers on their results and improvements to their generation techniques. This is the neural network structure behind thispersondoesnotexist.com and myriad other clones such as thiswaifudoesnotexist.net, thisfursonadoesnotexist.com, thiscatdoesnotexist.com, thisrentaldoesnotexist.com, and the list goes on and on. The faces from these networks have even been used in international espionage to create fake social media profiles.
The GAN technique is pretty easily summarized. You set up two neural networks, a generator, which tries to generate realistic images, and a discriminator, which tries to distinguish between the output of the generator and a real corpus of images. By training the networks together, adding more layers, fooling around with hyperparameters and overall "the scientific process", you manage to get results where the generator network is able to fool the discriminator (and humans) with high-quality images of faces that do not actually map to anybody in the real world. StyleGAN improves on some of the basic structure here by using a novel generator architecture that stacks a bunch of layers together and ends up with an intermediate latent space that has "nice" properties.
Basically how it works.
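Here's a minimal, purely illustrative sketch of that tug-of-war in TensorFlow 2, with toy networks that have nothing to do with the StyleGAN architecture:

```python
import tensorflow as tf

latent_dim = 512

# Toy generator: latent vector -> tiny 16x16 RGB image. Illustrative only.
generator = tf.keras.Sequential([
    tf.keras.layers.Dense(4 * 4 * 64, activation="relu", input_shape=(latent_dim,)),
    tf.keras.layers.Reshape((4, 4, 64)),
    tf.keras.layers.Conv2DTranspose(32, 4, strides=2, padding="same", activation="relu"),
    tf.keras.layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="tanh"),
])

# Toy discriminator: image -> single real/fake logit.
discriminator = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 4, strides=2, padding="same", activation="relu",
                           input_shape=(16, 16, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1),
])

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(real_images):
    z = tf.random.normal([tf.shape(real_images)[0], latent_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(z, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fake_images, training=True)
        # Discriminator: call real images real and generated images fake.
        d_loss = (bce(tf.ones_like(real_logits), real_logits)
                  + bce(tf.zeros_like(fake_logits), fake_logits))
        # Generator: fool the discriminator into calling its output real.
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
```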
A trained StyleGAN (1 or 2; the dimensions of the architecture are the same in both versions), at the end of the day, takes a 512-element vector in the latent space "Z", then sends it through some nonsense fully-connected layers to form a "dlatent" vector of size 18x512. The 18 here indicates how many layers there are in the generator proper: the trained networks that NVIDIA provides produce 1024x1024 images, and there are 2 layers for each of the resolutions from 2^2 = 4 up to 2^10 = 1024.
What part of STACK MORE LAYERS did you not get?
The upshot is that the 18x512 space has nice linearity properties that will be useful to us when we want to generate photorealistic faces that only differ on one axis of importance. The authors of the NVIDIA paper call each of the 18 layers a "style", and the observation is that copying qualities from each style gives qualitatively different results.
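To make the shapes concrete, here is a minimal sketch of how a single z vector turns into the 18x512 dlatent and what "copying a style" amounts to. The `mapping_network` below is a hypothetical stand-in for the trained fully-connected layers, not the real thing:

```python
import numpy as np

def mapping_network(z):
    # Stand-in for the trained fully-connected mapping layers; a fixed random
    # linear map, purely so the shapes below run end-to-end.
    rng = np.random.default_rng(0)
    weights = rng.standard_normal((512, 512)) / np.sqrt(512)
    return z @ weights

z = np.random.randn(1, 512)                           # a point in the input latent space Z
w = mapping_network(z)                                # a point in W, shape (1, 512)
dlatent = np.repeat(w[:, np.newaxis, :], 18, axis=1)  # (1, 18, 512): one row per generator layer

# "Copying a style": take the coarse styles (layers 0-3, say) from face A and the
# rest from face B, then synthesize from the combined dlatent.
w_a = mapping_network(np.random.randn(1, 512))
w_b = mapping_network(np.random.randn(1, 512))
mixed = np.repeat(w_b[:, np.newaxis, :], 18, axis=1)
mixed[:, :4, :] = w_a[:, np.newaxis, :]               # broadcast w_a into layers 0-3
```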
Putting it all together
What I did was take an unofficial re-implementation of StyleGAN2 and run it to generate 4096 random images (corresponding to random seeds 0-4095 in the original repository). The reason for using an unofficial implementation is that it ports everything to TensorFlow 2, which also enabled me to run it with (a) less GPU RAM (I only have a GTX 1080 at home) and (b) on the CPU if needed for a future project where I serve these images dynamically.
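I won't reproduce the repository's code, but the per-seed loop is roughly the following shape. `generate_image` is a hypothetical stand-in for loading the TF2 generator and running synthesis, and the seeding convention is my understanding of how the original repo derives its per-seed latents:

```python
import numpy as np
from PIL import Image

def generate_image(z):
    # Stand-in for the loaded TF2 StyleGAN2 generator; returns noise here just so
    # the loop runs. The real call maps the (1, 512) latent to a 1024x1024 RGB image.
    return (np.random.default_rng(0).random((1024, 1024, 3)) * 255).astype(np.uint8)

for seed in range(4096):
    rnd = np.random.RandomState(seed)   # one RNG per seed
    z = rnd.randn(1, 512)               # latent vector in Z
    img = generate_image(z)             # (1024, 1024, 3) uint8 array
    Image.fromarray(img).save(f"seed{seed:04d}.png")
```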
I was also too lazy to train a network to recognize any features, so I fed these images through Kairos's free trial, receiving API responses that roughly looked like this. (There's no code for this part; you can just do it with cURL.)
Then, somewhere in this gigantic unorganized notebook that I need to clean up, I train some linear SVMs to split our sample of the latent space; eventually I will clean this up and make it a little more automated than the crazy experimentation I was doing. After I train the SVMs and obtain the normal vector from their separating hyperplanes, I manually explore the latent space by considering only one style's worth of elements from each of the normal vectors, plotting the results, and seeing what changes about the photos (which could be different from the feature in the API response!). This could probably be automated; I should likely use the conditional entropy of the Kairos API responses given the SVM result to rank which styles are most important and then automatically return that instead of manually pruning styles.
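The SVM step itself is small enough to sketch. `dlatents` and `labels` below are hypothetical arrays you'd assemble from the generation run and the Kairos responses, and the choice of styles 4-7 is purely illustrative:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical inputs:
#   dlatents: (4096, 18, 512) array of the dlatent vectors used to render each seed
#   labels:   (4096,) binary array for one attribute (e.g. smiling) from the Kairos responses
dlatents = np.load("dlatents.npy")   # assumed filename
labels = np.load("labels.npy")       # assumed filename

clf = LinearSVC(max_iter=10000)
clf.fit(dlatents.reshape(len(dlatents), -1), labels)

# Normal vector of the separating hyperplane, reshaped so each row is one style.
normal = clf.coef_.reshape(18, 512)
normal /= np.linalg.norm(normal)

# Nudge one face along the attribute axis, restricted to a handful of styles
# (picking which styles actually matter is the manual part described above).
edited = dlatents[0].copy()
for style in range(4, 8):
    edited[style] += 10.0 * normal[style]
# Feeding `edited` back through the synthesis network renders the modified face.
```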
Before I got tired of doing this by hand, I ended up with seven usable properties, plus one that I threw out:
- yaw
- eye squint
- age
- smile
- skin tone
- gender
- hair length
- quality of photo
I decided not to use quality of photo (you can see the results in the notebook) because I didn't want half the photos to just look terrible. The good news is that I made a new notebook for generating the actual faces, one that should actually work for you if you clone the repo.
After generating a few images and using the `montage` command from ImageMagick, we get our preliminary results!
Now that's the future!
"Favorite" might be code for "useless," going with the theme of this blog.
Clearly.
It's called a dlatent in the code, but the paper calls it the space W and the vectors w. I don't know.
`` for q in `seq 0 4095`; do i=$(printf "%04d" $q); curl -d '{"image": "http://cantina.patrickxia.com/faces/seed'$i'.png"}' -H "app_id: xxx" -H "app_key: xxx" -H 'store_image: "false"' -H "Content-Type: application/json" http://api.kairos.com/detect > $i.json; sleep 1; done ``, and something similar for the `media` API, which gives you landmarks. If you care enough, I've hosted the images here, which lets you just look through them if you want.