Transfer an artistic image style to any content image
Deep learning concepts and utilities to merge any style into your content image with Neural Style Transfer!
Have you ever wished to be painted by the revolutionary painters of the past? Have you ever been amazed by the intricate fine art those painters left behind? Let’s dive deep into how we can use machine learning to reproduce our own content in these impeccable styles. By the end of this article, you’ll be familiar with the machine learning concepts and utilities that let you transfer any style to any content image of your choice.
Convolutional Neural Networks
Convolutional Neural Networks, also referred to as CNNs, are the basic building blocks required for Neural Style Transfer. These deep networks detect patterns and strokes in an image and are generally used for image classification and generation. Below is the structure of a basic CNN.
A CNN slides filters (kernels) over the activations of each layer, convolving over one filter-sized region at a time, across many layers until it reaches the output layer, which classifies the image based on the features it has detected. You can get acquainted with the various nuances of CNNs from here.
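To make the sliding-filter idea concrete, here is a minimal single-channel sketch in NumPy of a "valid" convolution with stride 1 and no padding. Like the convolution layers in deep learning frameworks, it actually computes cross-correlation (the kernel is not flipped). The helper name conv2d_valid and the edge-detecting kernel are illustrative assumptions, not part of the original code.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Hypothetical helper: slide `kernel` over a 2-D `image` (stride 1, no padding).

    Like CNN layers, this computes cross-correlation: the kernel is not flipped.
    """
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Multiply the kernel element-wise with the patch it covers, then sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# An image that is dark on the left and bright on the right,
# filtered with a simple vertical-edge detector.
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
kernel = np.array([[-1, 1], [-1, 1]], dtype=float)
response = conv2d_valid(image, kernel)
print(response)  # every row is [0, 2, 0]: the response peaks where dark meets bright
```

A real CNN layer does the same thing with many kernels at once, summing over all input channels, and learns the kernel values during training instead of hand-picking them.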
Our main idea is to feed our content and style images into a CNN and use the activations of its intermediate layers. We will define two separate loss functions, content loss and style loss, which track how ‘far away’ our generated image is from the content image and from the style image, respectively. The goal is to minimize these loss functions, thereby improving the generated image in terms of both content and style.
Transfer Learning
Instead of training a CNN model from scratch, we’ll use a pre-trained model, which saves us all the training and computation. We’ll use VGG-19, a 19-layer CNN. The idea behind using a pre-trained model is to take a model trained on a different task and apply that learning to a new task. You can use any open-source implementation of VGG-19 with pre-trained weights. Pre-trained weights can be obtained from here. This is the implementation I used, courtesy of deeplearning.ai.
Neural network models take a lot of computation and time, so a GPU surely comes in handy. Be sure to check out Google Colab, which gives you free access to a GPU for up to 12 hours in a single session.
Neural Style Transfer
Required packages
We’ll use TensorFlow as our deep learning framework. Several other libraries are imported to enable utilities such as saving, importing, resizing, and displaying an image.
Note: nst_utils is not a library; it is the name of the Python file in our directory in which our pre-trained VGG-19 model is implemented and stored.
Content image and style image
We’ll use the Louvre museum in Paris as our content image and sandstone as our style image.
Normalizing data
We’ll need to preprocess our images into the input format expected by our VGG-19 model. An image consists of pixel values ranging from 0 to 255 across 3 RGB channels. Images also come in various resolutions, depending on how many pixels they contain.
Note: The number of channels in an input image is always 3 for colored (RGB) images and 1 for grayscale images. It is the width and height, i.e. the number of pixels, that change with resolution.
Let’s say our image has dimensions (H, W, C), where H and W are the height and width respectively and C is the number of channels. The input tensor fed into our VGG-19 model must have dimensions (m, 300, 400, 3), i.e. an image 300 pixels high and 400 pixels wide, and must be normalized. The variable m, the batch size, is a hyperparameter; here it is set to 1 by expanding the image’s dimensions with the np.expand_dims() utility, so that we process a single training example. Our function returns the preprocessed image.
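A minimal NumPy sketch of this preprocessing step follows. It assumes the image has already been resized to 300 × 400 pixels and that "normalizing" means subtracting the per-channel ImageNet mean that VGG networks were trained with; the mean values below are the commonly used VGG ones, so check them against your own nst_utils implementation. The function name preprocess_image is hypothetical.

```python
import numpy as np

# Assumption: per-channel (R, G, B) means commonly used for VGG models trained on ImageNet.
VGG_MEANS = np.array([123.68, 116.779, 103.939]).reshape((1, 1, 1, 3))

def preprocess_image(image):
    """Hypothetical helper: (H, W, 3) uint8 image -> normalized (1, H, W, 3) float batch."""
    # Add the batch axis m = 1 in front: (H, W, 3) -> (1, H, W, 3).
    batch = np.expand_dims(image.astype(np.float32), axis=0)
    # Center each channel by subtracting its mean pixel value.
    return batch - VGG_MEANS

image = np.random.default_rng(0).integers(0, 256, size=(300, 400, 3), dtype=np.uint8)
x = preprocess_image(image)
print(x.shape)  # (1, 300, 400, 3)
```

Because VGG_MEANS has shape (1, 1, 1, 3), NumPy broadcasting subtracts the right mean from every pixel of each channel without an explicit loop.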
Content cost
The generated image should be similar in content to our content image. To measure this, we choose the activations of one layer to represent our content image. Earlier layers in a CNN detect low-level features such as edges and textures, whereas deeper layers detect high-level features such as objects. We have chosen the last layer’s activations to represent our content. This choice is again a hyperparameter, which can be tuned to adjust the results.
After picking a suitable intermediate layer, we feed the content image into our VGG-19 model and obtain its activations Ac through forward propagation. We repeat the same process for the generated image to obtain its activations Ag. For clarity, Ac and Ag are multi-dimensional tensors of shape (n_H, n_W, n_C), the height, width and number of channels of the chosen layer. Following the deeplearning.ai formulation, the content loss J is defined as:
$$J_{content}(C, G) = \frac{1}{4 \, n_H \, n_W \, n_C} \sum_{\text{all entries}} \left(A_c - A_g\right)^2$$
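As a NumPy sketch of the content cost (a TensorFlow version applies the same operations to tensors), the function below takes the two activation tensors, each shaped (1, n_H, n_W, n_C) with the batch axis included. The scaling factor 1/(4 n_H n_W n_C) follows the deeplearning.ai formulation and is an assumption here; other implementations use a plain mean or sum of squares. The function name is hypothetical.

```python
import numpy as np

def compute_content_cost(a_C, a_G):
    """Hypothetical helper: content cost between activations shaped (1, n_H, n_W, n_C)."""
    _, n_H, n_W, n_C = a_G.shape
    # Assumption: deeplearning.ai-style normalization by 4 * n_H * n_W * n_C.
    return np.sum((a_C - a_G) ** 2) / (4 * n_H * n_W * n_C)

a_C = np.ones((1, 2, 2, 3))   # stand-in content activations
a_G = np.zeros((1, 2, 2, 3))  # stand-in generated-image activations
print(compute_content_cost(a_C, a_G))  # 12 / 48 = 0.25
```

The cost is zero exactly when the generated image produces the same activations as the content image at the chosen layer.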
Style cost
Intuitively, the style of an image can be seen as how the textures and pixels of different channels vary with respect to each other. Style can be quantified with the Gram matrix, which captures the inner products of a matrix with its own transpose. Put another way, style is the correlation between the activations of different channels within a single layer. To compute these correlations, we take a layer’s 3-D activation output with dimensions (n_H, n_W, n_C) and flatten it to dimensions (n_H·n_W, n_C).
The Gram matrix of this flattened matrix is an (n_C, n_C) matrix: entry (i, j) is the inner product of channel i’s activations with channel j’s activations across all spatial positions, so it measures how strongly those two channels fire together.
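Here is a NumPy sketch of the flattening and Gram matrix computation; gram_matrix is a hypothetical helper name. With the activations flattened to a matrix A of shape (n_H·n_W, n_C), each column holds one channel, and the Gram matrix is AᵀA.

```python
import numpy as np

def gram_matrix(activations):
    """Hypothetical helper: Gram matrix of one layer's activations shaped (n_H, n_W, n_C)."""
    n_H, n_W, n_C = activations.shape
    # Flatten the spatial dimensions: each column now holds one channel's responses.
    A = activations.reshape(n_H * n_W, n_C)
    # (n_C, n_H*n_W) @ (n_H*n_W, n_C) -> (n_C, n_C)
    return A.T @ A

acts = np.random.default_rng(0).random((4, 5, 3))
G = gram_matrix(acts)
print(G.shape)  # (3, 3)
```

The result is symmetric, and its diagonal entry (i, i) is simply the sum of squared activations of channel i, i.e. how active that channel is overall.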
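Once we can compute Gram matrices, we select a style layer, i.e. a layer in VGG-19 on whose activations we base the style cost. Let G denote the Gram matrix of a single style layer’s activations, with superscripts (S) and (G) indicating that the input was the style image or the generated image, respectively, and let n_H, n_W and n_C be that layer’s height, width and number of channels. Following the deeplearning.ai formulation, the style cost for a single layer l is defined as:
$$J_{style}^{[l]}(S, G) = \frac{1}{4 \, n_C^2 \, (n_H n_W)^2} \sum_{i=1}^{n_C} \sum_{j=1}^{n_C} \left(G^{(S)}_{ij} - G^{(G)}_{ij}\right)^2$$
Minimizing this cost pushes the channel correlations, and hence the style, of our generated image toward those of the style image.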
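A NumPy sketch of the single-layer style cost, assuming the deeplearning.ai normalization 1/(4 n_C² (n_H n_W)²). It works on one image’s activations of shape (n_H, n_W, n_C) and carries its own Gram helper so it runs on its own; both function names are hypothetical.

```python
import numpy as np

def gram_matrix(activations):
    """Hypothetical helper: (n_H, n_W, n_C) activations -> (n_C, n_C) Gram matrix."""
    n_H, n_W, n_C = activations.shape
    A = activations.reshape(n_H * n_W, n_C)
    return A.T @ A

def compute_layer_style_cost(a_S, a_G):
    """Hypothetical helper: style cost for one layer; a_S and a_G are (n_H, n_W, n_C)."""
    n_H, n_W, n_C = a_G.shape
    G_S = gram_matrix(a_S)  # channel correlations of the style image
    G_G = gram_matrix(a_G)  # channel correlations of the generated image
    # Assumption: deeplearning.ai-style normalization.
    return np.sum((G_S - G_G) ** 2) / (4 * n_C ** 2 * (n_H * n_W) ** 2)

print(compute_layer_style_cost(np.ones((1, 1, 2)), np.zeros((1, 1, 2))))  # 4 / 16 = 0.25
```

Because the Gram matrix sums over all spatial positions, rearranging where features appear (for example, flipping the activations upside down) leaves the style cost unchanged. That is exactly why it captures texture rather than layout.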
This cost function targets only a single layer of activations. We get better results if we “merge” style costs from several layers. To calculate the overall style cost, we iterate over multiple style layers (the choice of layers is a hyperparameter), giving each layer l a weight λ^[l] that reflects how much it contributes to the style, so that $J_{style}(S, G) = \sum_l \lambda^{[l]} J_{style}^{[l]}(S, G)$.
In this function, the variable ‘weights’ can be tuned to assign a weight to each style layer. We have kept the weights equal across all style layers, and they sum to 1.
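A sketch of merging per-layer style costs into one weighted sum. To keep it self-contained, it takes already-computed per-layer costs in a dict rather than running VGG-19. The layer names are hypothetical placeholders (the original choice of style layers is not shown), with equal weights summing to 1 as described.

```python
import math

def total_style_cost(layer_costs, weights):
    """Hypothetical helper: weighted sum of per-layer style costs.

    layer_costs: dict mapping layer name -> style cost of that layer
    weights:     dict mapping layer name -> weight of that layer (should sum to 1)
    """
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("style-layer weights should sum to 1")
    return sum(weight * layer_costs[name] for name, weight in weights.items())

# Hypothetical VGG-19 layer names with equal weights, as in the text.
style_layers = ["conv1_1", "conv2_1", "conv3_1", "conv4_1", "conv5_1"]
weights = {name: 1 / len(style_layers) for name in style_layers}
layer_costs = {"conv1_1": 1.0, "conv2_1": 2.0, "conv3_1": 3.0, "conv4_1": 4.0, "conv5_1": 5.0}
print(total_style_cost(layer_costs, weights))  # approximately 3.0, the plain average
```

With equal weights the result is just the average of the layer costs; shifting weight toward earlier layers emphasizes fine textures, toward deeper layers larger-scale patterns.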
Displaying image
To display, and preferably save, our output image, we need to undo all the changes we made to the original image during preprocessing. This means removing the batch dimension, denormalizing the pixels and clipping them so that they stay strictly within the range 0–255.
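A NumPy sketch of this post-processing, assuming preprocessing subtracted the commonly used per-channel VGG ImageNet means (an assumption; use whatever your preprocessing actually did). The function name deprocess_image is hypothetical. Pixels are rounded before the cast to uint8, since a plain cast truncates and could turn 122.9999 into 122.

```python
import numpy as np

# Assumption: the same per-channel (R, G, B) VGG means subtracted during preprocessing.
VGG_MEANS = np.array([123.68, 116.779, 103.939]).reshape((1, 1, 1, 3))

def deprocess_image(x):
    """Hypothetical helper: normalized (1, H, W, 3) floats -> displayable (H, W, 3) uint8."""
    image = x + VGG_MEANS                       # add the channel means back
    image = np.clip(image[0], 0, 255)           # drop the batch axis, keep pixels in 0-255
    return np.round(image).astype(np.uint8)     # round, then cast to 8-bit pixels

# Optimizer output can drift outside the valid pixel range; clipping fixes that.
x = np.random.default_rng(0).normal(scale=200.0, size=(1, 300, 400, 3))
img = deprocess_image(x)
print(img.shape, img.dtype)  # (300, 400, 3) uint8
```

The resulting array can be handed to any image library (for example, imageio or PIL) to display or save it.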
Building our model
We have now defined all the preprocessing and cost functions required to build our model. To build the model, we first create an interactive session with TensorFlow. An interactive session installs itself as the default session, so we don’t have to pass the session explicitly when calling run() or eval().
We then specify the paths in our directory, from where our content and style images can be fetched.
We define the final cost that our network will minimize as the weighted sum of the content cost and the style cost:
$$J(G) = \alpha \, J_{content}(C, G) + \beta \, J_{style}(S, G)$$
The weights α and β are again hyperparameters, which can be tuned to prioritize content or style in our generated image. They define the relative weighting between content and style.
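The total cost itself is a one-liner. In this sketch, the default values α = 10 and β = 40 are an assumption (they are common defaults in the deeplearning.ai exercise); the original’s values are not shown, and the function name is hypothetical.

```python
def total_cost(J_content, J_style, alpha=10, beta=40):
    """Hypothetical helper: J = alpha * J_content + beta * J_style.

    Assumption: alpha=10 and beta=40 are illustrative defaults, not the original's values.
    """
    return alpha * J_content + beta * J_style

print(total_cost(J_content=2.0, J_style=0.5))  # 10 * 2.0 + 40 * 0.5 = 40.0
```

Only the ratio β/α really matters for where the optimization ends up: raising β relative to α yields a more heavily stylized image, lowering it keeps the result closer to the original photo.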
Let’s now compile all our functions into our final model!
We load the content and style images from the content and style paths, respectively, and define the content and style layers (hyperparameters) required by our cost functions.
In our model, we initialize the generated image (the variable ‘input_image’) as a copy of our content image. This helps the model converge faster, since the optimization starts from an image that already matches the content and only has to move it toward the style.
Note: Don’t simply write input_image = content_image. In Python this binds a second name to the same array rather than copying it, so any in-place change to input_image also changes content_image. Make a copy instead, for example with np.copy().
After loading our pre-trained VGG-19 model, we compute the content cost and the style cost by passing the content image and the style image, respectively, through the network.
After defining the total cost from the content cost and style cost with parameters α and β, we pass this total cost to our optimizer to minimize. An optimizer is a gradient-descent-based algorithm that iteratively updates variables so that the cost converges to a minimum. In Neural Style Transfer, the variables being updated are the pixels of the generated image; the pre-trained VGG-19 weights stay fixed.
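A short NumPy demonstration of the pitfall described in the note: plain assignment creates an alias, while np.copy() creates an independent array.

```python
import numpy as np

content_image = np.zeros((2, 2, 3))

alias = content_image          # same array object, not a copy
alias += 1.0                   # in-place update, so content_image changes too
print(content_image[0, 0, 0])  # 1.0

content_image = np.zeros((2, 2, 3))
input_image = np.copy(content_image)  # independent copy of the data
input_image += 1.0                    # content_image is untouched
print(content_image[0, 0, 0])  # 0.0
```

Only in-place operations (such as +=, or slice assignment like input_image[...] = 0) leak through an alias; input_image = input_image + 1 would create a new array and rebind the name instead.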
In our training routine, we have used Adam as our optimizing algorithm.
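To show what the optimizer does to the pixels, here is a toy NumPy sketch of the Adam update applied directly to an array of pixel values. The cost is a stand-in quadratic, J(x) = ½‖x − target‖², whose gradient is x − target; in the real model, TensorFlow computes the gradient of the total cost through VGG-19 instead. β1 = 0.9, β2 = 0.999 and ε = 1e-8 are Adam’s usual defaults, the learning rate of 2 and 2,000 steps match the text, and the name adam_minimize is hypothetical.

```python
import numpy as np

def adam_minimize(grad_fn, x0, lr=2.0, steps=2000, beta1=0.9, beta2=0.999, eps=1e-8):
    """Hypothetical helper: minimize a cost over the array x (here: pixels) with Adam."""
    x = np.array(x0, dtype=float)  # copy, so the caller's array is not modified
    m = np.zeros_like(x)           # running mean of gradients (first moment)
    v = np.zeros_like(x)           # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Toy stand-in cost (assumption): J(x) = 0.5 * ||x - target||^2, so grad J = x - target.
target = np.linspace(50.0, 200.0, 12).reshape(3, 4)
pixels = adam_minimize(lambda x: x - target, x0=np.zeros((3, 4)))
print(np.abs(pixels - target).max())  # largest remaining per-pixel error
```

Because Adam divides by the running RMS of the gradient, each pixel initially moves by roughly the learning rate per step regardless of the gradient’s scale, which is why a learning rate as large as 2 works on a 0–255 pixel scale.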
We run our model for 2,000 iterations with a learning rate of 2. Running this model on a Google Colab GPU takes around 5–10 minutes.
Passing the pixels returned by our model to our ‘display_image’ utility produces our generated image, as shown.
We can clearly distinguish the outlines of the Louvre Museum in a sandstone style, just as we intended! Go ahead and try this model on your own photo in a style painted by Claude Monet!
Voila! With this, we have successfully implemented Neural Style Transfer. All that is left for you to do is find new content and style pairs and let your inner artist unfold!