Deep Learning Applications : Neural Style Transfer

Kumar Sanu
8 min read · Nov 28, 2020

One of the most exciting applications of Deep Learning is Neural Style Transfer. In this article, we will understand Neural Style Transfer and implement our own Neural Style Transfer algorithm using a pre-trained convnet deep learning model.

Let’s get started and understand what neural style transfer is!
Neural style transfer is an optimization technique that takes two images, a content image and a style reference image (such as an artwork by a famous painter), and blends them together so that the output image looks like the content image, but “painted” in the style of the style reference image. In neural style transfer terminology, there are three images. The image whose content we want to keep is the Content image. The style in which the content image will be redrawn comes from the Style image. The output image generated by combining these two is the result of the technique known as Neural Style Transfer. In other words, one image (the content image) is redrawn in the style of another image (the style image) to produce a new image, like the one below.

Source: LinkedIn

Here, the new image on the right was generated by combining the content of the Mona Lisa image on the left with the style image in the middle, using the Neural Style Transfer algorithm.

Transfer Learning

Neural Style Transfer (NST) uses a previously trained convolutional network, and builds on top of that. The idea of using a network trained on a different task and applying it to a new task is called transfer learning.

Following the original NST paper, we will use the VGG network. Specifically, we’ll use VGG-19, a 19-layer version of the VGG network. This model has already been trained on the very large ImageNet database, and thus has learned to recognize a variety of low level features (at the shallower layers) and high level features (at the deeper layers).

Neural Style Transfer

To generate an output image with the content of the content image drawn in the style of the style image, we will build the Neural Style Transfer (NST) algorithm in three steps:

  • Build the content cost function, Jcontent(C,G)
  • Build the style cost function, Jstyle(S,G)
  • Put it together to get total cost function
    J(G)=α Jcontent(C,G)+β Jstyle(S,G)
    where α and β are hyperparameters.

Cost Function

Given a content image C and a style image S, our goal is to generate a new image G. To implement neural style transfer, we will define a cost function J(G) that measures how well our algorithm is producing the output image, and we will use gradient descent to minimize J(G) in order to get the desired output.

This cost function will have two components.
1. The first component is called the content cost. It is a function of the content image C and the generated image G, and it measures how similar the content of the generated image is to the content of the content image C.
2. The second component is the style cost. It is a function of S and G, and it measures how similar the style of the image G is to the style of the image S.

The overall cost function is defined as follows:-

J(G) = α Jcontent(C,G) + β Jstyle(S,G)

Here α and β are hyperparameters to specify the relative weighting between the content cost and the style cost.

The algorithm runs as follows:-

  1. Initialize the generated image G randomly, say 100×100×3 or 500×500×3 or whatever dimension we want it to be.

  2. Use gradient descent to minimize the cost function defined above, updating G as:-

G := G − (learning rate) · ∂J(G)/∂G

Here we are actually updating the pixel values of the image G. As we run gradient descent, we slowly minimize the cost function J(G) through the pixel values, so we gradually get an image that looks more and more like our content image rendered in the style of our style image.
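To make the pixel-level update concrete, here is a toy NumPy illustration (a hypothetical example, not part of the NST algorithm itself): the “generated image” G is just an array of pixel values, and the cost is the squared distance to a stand-in target image. Gradient descent on the pixels drives G toward the image that minimizes the cost, exactly as NST updates G (with J(G) as the cost instead):

```python
import numpy as np

# Toy example (assumed for illustration): G is an array of pixel values,
# and the "cost" is the squared distance to a target image. In real NST
# the cost is J(G); the pixel-update mechanics are the same.
rng = np.random.default_rng(0)
target = np.ones((4, 4, 3))          # stand-in for the cost-minimizing image
G = rng.uniform(size=(4, 4, 3))      # random initialization, as in step 1

learning_rate = 0.1
for _ in range(200):
    grad = 2 * (G - target)          # dJ/dG for J(G) = sum((G - target)^2)
    G = G - learning_rate * grad     # the update rule above
```

After enough iterations, the pixel values of G converge to the target; in NST they instead converge to an image that balances the content and style costs.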

Computing the content cost (Jcontent(C,G))

Through this content cost function, we will determine how similar the generated image is to the content image. We would like the “generated” image G to have content similar to that of the input image C.

It is advised to choose a layer in the middle of the network, neither too shallow nor too deep: shallow layers tend to detect lower-level features such as edges and simple textures, while deep layers tend to detect higher-level features such as more complex textures as well as object classes. We compute the activations for both the content image C and the generated image G by setting each image, one at a time, as the input to the pretrained VGG network and running forward propagation. The content cost function is defined as follows:-

Jcontent(C,G) = 1/(4 · nH · nW · nC) · Σ (a(C) − a(G))²

  • Here nH, nW and nC are the height, width and number of channels of the hidden layer we have chosen, and appear in a normalization term in the cost.
  • a(C) and a(G) are the 3D volumes corresponding to a hidden layer’s activations.
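A minimal NumPy sketch of this content cost, assuming `a_C` and `a_G` are the (nH, nW, nC) activation volumes of the chosen hidden layer:

```python
import numpy as np

def content_cost(a_C, a_G):
    """Jcontent(C,G) = 1/(4*nH*nW*nC) * sum((a_C - a_G)^2)."""
    n_H, n_W, n_C = a_C.shape
    return np.sum((a_C - a_G) ** 2) / (4 * n_H * n_W * n_C)
```

When the two activation volumes are identical the cost is zero; the more the generated image's content diverges from C at this layer, the larger the cost grows.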

Computing the style cost (Jstyle(S,G))

First, let’s understand what is meant by style here. Style can be defined as the correlation between activations across different channels in a layer’s activation volume.
Before calculating the style cost, we need to understand one term: the Gram matrix, also called the style matrix.

  • In linear algebra, the Gram matrix G of a set of vectors (v1,…,vn) is the matrix of dot products, whose entries are Gij=np.dot(vi,vj).
  • In other words, Gij compares how similar vi is to vj: If they are highly similar, you would expect them to have a large dot product, and thus for Gij to be large.

In Neural Style Transfer (NST), you can compute the Style matrix by multiplying the “unrolled” filter matrix with its transpose.

The result is a matrix of dimension (nC,nC) where nC is the number of filters (channels). The value G(gram)i,j measures how similar the activations of filter i are to the activations of filter j.
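A short NumPy sketch of this computation, assuming `a` is the (nH, nW, nC) activation volume of a hidden layer:

```python
import numpy as np

def gram_matrix(a):
    """Gram (style) matrix of an (nH, nW, nC) activation volume."""
    n_H, n_W, n_C = a.shape
    A = a.reshape(n_H * n_W, n_C).T   # "unrolled" filter matrix, shape (nC, nH*nW)
    return A @ A.T                    # Gram matrix, shape (nC, nC)
```

Entry (i, j) is the dot product of the unrolled activations of filters i and j, so channels that tend to activate together get large Gram entries.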

Now that we understand the Gram matrix, we know that the style of an image can be represented using the Gram matrix of a hidden layer’s activations. Our goal will be to minimize the distance between the Gram matrix of the “style” image S and the Gram matrix of the “generated” image G.
The corresponding style cost for a single layer l is defined as:

Jstyle[l](S,G) = 1/(4 · nC² · (nH · nW)²) · Σi Σj (Ggram(S)ij − Ggram(G)ij)²

  • G gram(S) : Gram matrix of the “style” image.
  • G gram(G) : Gram matrix of the “generated” image.

Remember, this cost is computed using the activations for a single particular hidden layer in the network.

  • We get even better results by combining this representation from multiple different layers.
  • This is in contrast to the content representation, where usually using just a single hidden layer is sufficient.
  • Minimizing the style cost will cause the image G to follow the style of the image S.
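Putting these pieces together, here is a NumPy sketch of the per-layer style cost and the multi-layer combination. The layer weights are hyperparameters; equal weights across a handful of layers are a common choice (an assumption here, not a value taken from the article's repo):

```python
import numpy as np

def gram_matrix(a):
    n_H, n_W, n_C = a.shape
    A = a.reshape(n_H * n_W, n_C).T
    return A @ A.T

def layer_style_cost(a_S, a_G):
    """Jstyle[l](S,G) = 1/(4*nC^2*(nH*nW)^2) * sum((Ggram(S) - Ggram(G))^2)."""
    n_H, n_W, n_C = a_S.shape
    diff = gram_matrix(a_S) - gram_matrix(a_G)
    return np.sum(diff ** 2) / (4 * n_C ** 2 * (n_H * n_W) ** 2)

def style_cost(acts_S, acts_G, weights):
    """Weighted sum of per-layer style costs over the chosen layers."""
    return sum(w * layer_style_cost(aS, aG)
               for w, aS, aG in zip(weights, acts_S, acts_G))
```

`acts_S` and `acts_G` are lists of activation volumes for the style and generated images, one per chosen layer.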

Optimizing Total Cost Function

Now that we have defined both the content cost function and the style cost function, our goal is to optimize the total cost function using gradient descent, so that the generated image is created from the content image but drawn in the style of the style image.
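In code, combining the two costs is a one-liner. The defaults α = 10 and β = 40 below are commonly used values for this setup (an assumption here; any hyperparameter values can be tried):

```python
def total_cost(J_content, J_style, alpha=10, beta=40):
    """J(G) = alpha * Jcontent(C,G) + beta * Jstyle(S,G)."""
    return alpha * J_content + beta * J_style
```

Increasing α pulls the result toward the content image; increasing β pushes it toward the style.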

Implementation

Now that we understand the concepts behind the Neural Style Transfer algorithm, let’s finally put everything together and implement Neural Style Transfer using the TensorFlow deep learning framework and the pretrained VGG-19 model.

Here’s what the program is doing:

  1. Create an Interactive Session
  2. Load the content image
  3. Load the style image
  4. Randomly initialize the image to be generated
  5. Load the VGG-19 model
  6. Build the TensorFlow graph:
     i) Run the content image through the VGG-19 model and compute the content cost
     ii) Run the style image through the VGG-19 model and compute the style cost
     iii) Compute the total cost
     iv) Define the optimizer and the learning rate
  7. Initialize the TensorFlow graph and run it for a large number of iterations, updating the generated image at every step.
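The steps above can be condensed into a sketch. Note this is a modern TF2-style rewrite (tf.GradientTape in place of the TF1 InteractiveSession from step 1), and the layer names, weights, learning rate, and simplified mean-squared costs are assumptions for illustration, not the exact values from the repo:

```python
import tensorflow as tf

# Assumed layer choices: several shallow-to-deep layers for style,
# one deeper layer for content.
STYLE_LAYERS = ["block1_conv1", "block2_conv1", "block3_conv1",
                "block4_conv1", "block5_conv1"]
CONTENT_LAYER = "block5_conv2"

def build_extractor():
    # VGG-19 pretrained on ImageNet, used only as a feature extractor.
    vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
    vgg.trainable = False
    outs = [vgg.get_layer(n).output for n in STYLE_LAYERS + [CONTENT_LAYER]]
    return tf.keras.Model(vgg.input, outs)

def gram(a):
    # (1, nH, nW, nC) activations -> (nC, nC) Gram matrix.
    x = tf.reshape(a, (-1, a.shape[-1]))
    return tf.matmul(x, x, transpose_a=True)

def total_cost(feats_G, content_feat_C, gram_feats_S, alpha=10.0, beta=40.0):
    # Simplified mean-squared versions of the content and style costs.
    J_content = tf.reduce_mean(tf.square(feats_G[-1] - content_feat_C))
    J_style = tf.add_n([tf.reduce_mean(tf.square(gram(fg) - gs))
                        for fg, gs in zip(feats_G[:-1], gram_feats_S)])
    return alpha * J_content + beta * J_style / len(gram_feats_S)

def run_nst(content, style, steps=1000):
    # content/style: float tensors of shape (1, H, W, 3); VGG preprocessing
    # (tf.keras.applications.vgg19.preprocess_input) is omitted for brevity.
    extractor = build_extractor()
    content_feat = extractor(content)[-1]
    gram_feats_S = [gram(f) for f in extractor(style)[:-1]]
    G = tf.Variable(content)   # initializing from C also works and converges faster
    opt = tf.keras.optimizers.Adam(learning_rate=0.02)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            J = total_cost(extractor(G), content_feat, gram_feats_S)
        opt.apply_gradients([(tape.gradient(J, G), G)])
        G.assign(tf.clip_by_value(G, 0.0, 255.0))   # keep pixels in range
    return G
```

Each iteration updates only the pixels of G, never the VGG-19 weights, which is what makes this transfer learning rather than network training.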

Output of NST Algorithm:-

In our example, the content image C is a picture of the Louvre Museum in Paris.

Content Image

Following is the style image:

Style Image

And here is the output image generated by our NST algorithm:-

Neural Style Transfer

Note: For the code implementation and a better understanding, kindly visit my GitHub repo.

I hope this helps!!
Thanks for reading. Any feedback/suggestions will be highly appreciated.
