A tool for Collaborating over GAN’s latent space



In January 2020 we finalized the development phase of Marrow. Shirin Anlen and I are sharing lessons learned during this process, and our post about optimizing and augmenting a small dataset was recently published on Towards Data Science. This post looks at how custom web-based tools can inspire a collaborative artistic workflow when working with machine learning models.

Apr 29 · 8 min read


Shadow animation from GAN’s latent space using the web explorer tool

Myself and Marrow

Marrow is a hands-on research project and an interactive theater experience by Shirin Anlen that explores the possibilities of mental disorders in machine learning. I have previously worked with Shirin on a number of projects, most notably the VR documentary Tzina: Symphony of Longing. I joined Shirin to preview Marrow as an installation at IDFA Doclab 2018. The prototype was a success, and one year later we entered, as collaborators, an intensive development phase co-produced by the National Film Board of Canada and Atlas V.

About GAN and its latent space

The Generative Adversarial Network, or GAN, was the first machine learning model we decided to research. It focuses on generative visual imagery and exhibits a very clear dissonance if you attempt to train it on complex concepts using banal stock images. In a previous post we described how we created a dataset of ‘Perfect family dinner’ images and used it to train StyleGAN V1. This particular dataset was constructed to serve the story of the experience: that of a dysfunctional family that sees itself only through the distorted data it was trained on. Because of this, we aimed for results that are imperfect and expose the glitches that emerge when the model tries to go deep into social narratives.

Our dataset was a bundle of around 6,500 images containing figures of four family members, stripped from their family dinner setting. Once StyleGAN finished training, we ended up with a vast space of possibilities for newly generated images containing four distorted familial figures. This infinite, continuous space, in which every point corresponds to one possible output image, is called the Latent Space. It is “latent” because the output image generated by GAN is determined by a seemingly hidden process of mathematical transformations, starting from a series of numbers and ending with a bitmap image. When you change any of the initial numbers in the series, the resulting image is slightly different. The transformation network is so deep that it is hard to predict what will change in the image.
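To make the "series of numbers in, bitmap out" idea concrete, here is a minimal sketch in plain Python. It is not StyleGAN: the two-layer network, its random weights, and the names `make_toy_generator`, `LATENT_SIZE` and `IMAGE_PIXELS` are all made up for illustration. It only shows that the mapping is deterministic and that nudging one input number ripples through the output.

```python
import math
import random

LATENT_SIZE = 8     # toy value; StyleGAN's latent vector has 512 numbers
IMAGE_PIXELS = 16   # a 4x4 grayscale "image", flattened into a list


def make_toy_generator(seed=0):
    """Build a fixed two-layer network with random weights (a stand-in, not StyleGAN)."""
    rng = random.Random(seed)
    hidden = 12
    w1 = [[rng.gauss(0, 1) for _ in range(LATENT_SIZE)] for _ in range(hidden)]
    w2 = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(IMAGE_PIXELS)]

    def generate(z):
        # Layer 1: weighted sums of the latent numbers, squashed by tanh.
        h = [math.tanh(sum(w * x for w, x in zip(row, z))) for row in w1]
        # Layer 2: weighted sums of the hidden values, mapped into the 0..1 pixel range.
        return [0.5 + 0.5 * math.tanh(sum(w * x for w, x in zip(row, h))) for row in w2]

    return generate


generate = make_toy_generator()
rng = random.Random(42)
z = [rng.gauss(0, 1) for _ in range(LATENT_SIZE)]

image_a = generate(z)

z_nudged = list(z)
z_nudged[3] += 0.1  # change a single latent number slightly
image_b = generate(z_nudged)

changed = sum(1 for a, b in zip(image_a, image_b) if abs(a - b) > 1e-9)
print(f"{changed} of {IMAGE_PIXELS} pixels changed")
```

Even in this tiny toy, a single nudged number touches many pixels at once, which is why it is hard to guess in advance what a change will do in a network as deep as StyleGAN.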


An animation of latent space transitions

If you have a good enough dataset and algorithm, you might be able to reach disentanglement: that is, when one of the input numbers controls one meaningful element in the resulting image. For example, one number would change the age of one generated person, while another changes their hair color. Needless to say, we were not able to achieve disentanglement with our small dataset. A change in a single number from the initial series could induce various changes in multiple family members. The same number could simultaneously control one family member’s pose, another member’s smile, and the appearance of a Christmas hat on a third figure (a repeating motif in stock images, it seems). The family members were in fact entangled.
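The difference between the two cases can be sketched with a deliberately simplified linear toy. A real GAN is nonlinear and its attributes are not labelled like this; the attribute names, the two matrices, and the helper `affected_attributes` are hypothetical and exist only to illustrate the idea.

```python
import random

ATTRIBUTES = ["pose of member 1", "smile of member 2", "hat on member 3"]


def apply(matrix, z):
    """Map latent numbers to attribute values with a plain linear map."""
    return [sum(w * x for w, x in zip(row, z)) for row in matrix]


def affected_attributes(matrix, latent_index, step=1.0):
    """Return the names of the attributes that move when one latent number is nudged."""
    z = [0.0] * len(matrix[0])
    before = apply(matrix, z)
    z[latent_index] += step
    after = apply(matrix, z)
    return [name for name, a, b in zip(ATTRIBUTES, before, after) if abs(a - b) > 1e-9]


# Disentangled: each latent number drives exactly one attribute.
disentangled = [[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]]

# Entangled: every latent number leaks into every attribute (all weights are non-zero).
rng = random.Random(7)
entangled = [[rng.uniform(0.2, 1.0) for _ in range(3)] for _ in range(3)]

print(affected_attributes(disentangled, 0))  # ['pose of member 1']
print(affected_attributes(entangled, 0))     # all three attributes
```

In the entangled case there is no single "smile" dial to hand to an artist: every dial moves the pose, the smile and the hat together.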

The Shadow Allegory

Marrow tracks the ‘thinking’ process of each of its models and questions what could go wrong. In GAN, the latent space gives us information about how input data is broken down and then reconstructed into something new. But as intriguing as visualizing the latent space is, we were looking for ways to integrate storytelling into the experience. We wanted to materialize GAN’s distorted image of the world.

When watching the ongoing training process of GAN, we started noticing things other than human emerging from the source dataset. It was like staring at Rorschach tests: flat images that appear different depending on who is watching. We realized that we were learning more about GAN not by seeing the result we expected, but by seeing its in-between spaces. Plato’s Allegory of the Cave speaks about finding meaning in the simple and flattened representation of things. The people in the allegory are stuck in a cave with a fire burning outside. The fire projects the shadows of passing objects on the cave’s walls, and that is all they can see of reality. They are so used to those shadows that once a prisoner breaks free, their eyes are burned by the glaring sun. When the prisoner’s eyes finally adjust to reality, they come back to the cave to tell the others, but now they are unable to see anything in the darkness. The other prisoners assume that something evil lies outside.

Interestingly, Plato’s Allegory of the Cave corresponds quite well with the structure and training process of GAN. GAN is in constant conflict between reality, representations of reality, and fantasy. When the algorithm generates images that are too close to the original dataset, it finds itself stuck in a simple and flat representation of the world, unable to escape to pathways of creativity. When GAN’s generations are too fantastical, they are inevitably deemed fake and wrong. GAN is in a constant struggle to find the balance between the real and the imaginary. Therefore, we decided to visualize GAN’s struggle by using the shadow representation of the distorted family outputs.


Transitions in full color vs. in shadow mode

Animating over the latent space

Marrow is an interactive theater piece where the participants play the role of machine learning models in a family dinner setting. In the experience, a participant who represents GAN tells their story about the difficulties they face in discerning memory from imagination. Both of those perceptions are in fact distorted in GAN, so at this phase we decided to explore an additional, fantastical animated layer over the world of shadows that would represent the character’s struggle between the real and the fake. We worked with the talented Paloma Dawkins, a master of hand-drawn animations and alternate dimensions. Now we had to ask ourselves: how do we orchestrate a workflow that starts in the mathematical depths of GAN but ends with hand-drawn animations that perfectly match GAN’s latent movements across the image space? The answer came in the form of our custom-developed tool: Marrow GAN Explorer.
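For readers unfamiliar with what a "latent movement" looks like in numbers, the sketch below shows one generic way to turn a few chosen latent points into a frame-by-frame path using linear interpolation. It is an illustration of the general technique only, not the code of Marrow GAN Explorer; the function names `lerp` and `latent_walk` and the two-number keyframes are assumptions made for readability (StyleGAN latent vectors have 512 numbers).

```python
def lerp(z_start, z_end, t):
    """Linearly blend two latent vectors; t=0 gives z_start, t=1 gives z_end."""
    return [(1 - t) * a + t * b for a, b in zip(z_start, z_end)]


def latent_walk(keyframes, frames_per_segment):
    """Return one latent vector per animation frame, passing through every keyframe."""
    path = []
    for z_start, z_end in zip(keyframes, keyframes[1:]):
        for i in range(frames_per_segment):
            path.append(lerp(z_start, z_end, i / frames_per_segment))
    path.append(list(keyframes[-1]))  # finish exactly on the last keyframe
    return path


# Three hypothetical keyframes, e.g. latent points an artist has picked.
keyframes = [[0.0, 0.0], [1.0, 2.0], [3.0, 2.0]]
path = latent_walk(keyframes, frames_per_segment=4)

print(len(path))  # 9 frames: 2 segments x 4 frames + the final keyframe
print(path[2])    # [0.5, 1.0], halfway between the first two keyframes
```

Feeding each vector in `path` to the generator would yield one image per frame, so an animator drawing over those frames works against exactly the same movement the model makes.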

