In the summer of 1889 Vincent van Gogh undertook a series of landscape paintings. For each of the paintings he sent a drawing to his brother Theo in Paris.
As can be seen, one of the drawings is missing its corresponding painting. The drawing is called “Wild vegetation” and presumably depicts a field full of flowers with a mountain in the background.
People noticed the missing painting and many wondered where it might be. The mystery came to a conclusion 118 years later, in 2007, when Meta Chavannes, a conservator at the Museum of Fine Arts in Boston, X-rayed the painting “The Ravine” and found the remains of Wild Vegetation underneath.
It is known that Van Gogh was low on materials throughout his life and even resorted to using towels as canvases when the urge to paint became overwhelming.
Considering this, it is not a surprise that he simply reused the wild vegetation canvas. Sadly it is hard to make out what the painting would have looked like from the X-ray, so it would seem like the painting is lost to the world. That is, until now. With the power of generative machine learning, I set out to try and reconstruct the painting.
Neural style transfer
I started out with the idea that neural style transfer might solve it.
Neural style transfer is a technique that exploits an interesting fact about the inner workings of deep neural networks.
One of the great strengths of deep learning is its ability to select features automatically. It turns out that when a deep neural network is trained on, for example, faces, each layer learns to recognize features of higher and higher abstraction: the first layers look for straight lines and edges, subsequent layers for eyes and ears, and further layers for whole faces.
If we look at two images, we would expect them to have similar features in the deeper layers if they have the same “content”, i.e. both contain a car or a face. This is independent of the exact appearance of the car or face.
The earlier layers, by contrast, will have similar features if both images contain similar “styles”, such as colors, edginess or roundness.
It is a bit more complicated than that though, since what we call style is often a combination of several features. For example, an image of a lawn might contain the features spikiness and green, and an image of a hedgehog in a forest might contain the same features. If we are using the first image as style input, presumably what we actually want is the “grassy” style, which is a combination of the green and spiky features in the same place.
To achieve this we use the Gram matrix, which tells us which features tend to be present together at the same places in the image. In the grassy image, green and spikiness are present at the same place, in contrast to the hedgehog image, where spikiness is correlated with brown. Thus, to achieve similar styles we try to maximize the similarity of the Gram matrices.
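To make the lawn and hedgehog picture a little more tangible, here is a minimal NumPy sketch. The tiny 2x2 "feature maps", the channel names and the `style_loss` helper are all made up for illustration; a real style transfer would take the feature maps from a layer of a trained convolutional network.

```python
import numpy as np

def gram_matrix(features):
    """Measure how strongly each pair of feature channels fires together.

    features: array of shape (channels, height, width).
    Returns a (channels, channels) matrix, averaged over image positions.
    """
    channels = features.shape[0]
    flat = features.reshape(channels, -1)   # one row per feature channel
    return flat @ flat.T / flat.shape[1]

# Hypothetical feature maps with three channels: green, brown, spiky.
lawn = np.array([
    [[1, 1], [0, 0]],   # green: top row
    [[0, 0], [0, 0]],   # brown: nowhere
    [[1, 1], [0, 0]],   # spiky: top row, the same place as green
], dtype=float)
hedgehog = np.array([
    [[1, 1], [0, 0]],   # green: top row (the forest)
    [[0, 0], [1, 1]],   # brown: bottom row (the hedgehog)
    [[0, 0], [1, 1]],   # spiky: bottom row, the same place as brown
], dtype=float)

GREEN, BROWN, SPIKY = 0, 1, 2
g_lawn = gram_matrix(lawn)
g_hedgehog = gram_matrix(hedgehog)

print(g_lawn[GREEN, SPIKY])      # 0.5: green and spiky coincide on the lawn
print(g_hedgehog[GREEN, SPIKY])  # 0.0: they never coincide here
print(g_hedgehog[BROWN, SPIKY])  # 0.5: spikiness goes with brown instead

def style_loss(a, b):
    """Squared difference between two Gram matrices (smaller = more similar style)."""
    return float(np.sum((gram_matrix(a) - gram_matrix(b)) ** 2))

print(style_loss(lawn, hedgehog))  # positive, although both contain green and spiky
```

Both toy images contain green and spiky in equal amounts, yet their Gram matrices differ, which is exactly the distinction the "grassy" style needs.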
The lawn and hedgehog example above is taken from Generative Deep Learning.
Then we run the backpropagation algorithm, but instead of changing the weights of the network as we usually do when training, we keep them fixed and change the image instead.
Using the library Neural-Style-Transfer, I did this on Wild Vegetation and the painting “Poppy fields”, and this was the result:
The result is far from satisfying, as the model seems to have trouble discerning what is ground and what is sky, and thus makes the ground blue in certain places. Putting sky color on the ground is of course something Van Gogh did himself in the wonderful piece “The Sower”, where he reverses the color order of ground and sky, but I feel it is probably not appropriate in the piece above, at least not in the way it is done here.
I should of course have guessed that this would be the case: why would neural style transfer know what is sky and what is ground?
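Nothing in the optimization itself labels regions; it only nudges pixel values until the features match. As a rough sketch of that step (fixed weights, changing image), here is a toy version in NumPy. The single linear layer standing in for the network, the random "target image" and the step size rule are all assumptions for illustration and not what the Neural-Style-Transfer library does internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trained network: one fixed linear layer (an assumption;
# real style transfer uses a deep convolutional network).
weights = rng.normal(size=(8, 3))      # maps 3 color channels to 8 features
weights_before = weights.copy()

target_image = rng.random((3, 16))     # 16 pixels with 3 color channels each
target_features = weights @ target_image

def loss(image):
    """Squared distance between the image's features and the target features."""
    return float(np.sum((weights @ image - target_features) ** 2))

def grad_wrt_image(image):
    """Backpropagation through the single linear layer, written out by hand."""
    return 2.0 * weights.T @ (weights @ image - target_features)

image = np.zeros((3, 16))              # start from a blank image
# Conservative step size derived from the largest singular value of the layer.
step = 1.0 / (2.0 * np.linalg.norm(weights, 2) ** 2)

initial_loss = loss(image)
for _ in range(200):
    image -= step * grad_wrt_image(image)   # only the image is updated
final_loss = loss(image)

print(initial_loss, final_loss)
print(np.array_equal(weights, weights_before))  # the weights never changed
```

In real style transfer the loss would combine a content term on deep features with Gram matrix terms on earlier layers, but the loop has the same shape: compute the loss, take the gradient with respect to the pixels, update the pixels.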
My next idea was to use CycleGAN. A GAN (generative adversarial network) consists of two neural networks that compete against each other. The first network (the Generator) tries to paint images, and the second (the Discriminator) tries to guess whether an image was painted by the first or taken from the training data. The first is trained with the goal of fooling the second, and the second is trained with the goal of not being fooled.
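The two opposing goals can be written as two loss functions over the same discriminator outputs. The sketch below uses the classic cross-entropy formulation and made-up discriminator outputs; the function names are hypothetical, and actual CycleGAN implementations often use a different adversarial loss.

```python
import numpy as np

def binary_cross_entropy(predictions, labels):
    """Average cross-entropy between predicted probabilities and 0/1 labels."""
    eps = 1e-12
    p = np.clip(predictions, eps, 1.0 - eps)   # avoid log(0)
    return float(-np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p)))

def discriminator_loss(d_on_real, d_on_fake):
    """The discriminator wants to say 1 for real images and 0 for generated ones."""
    real_term = binary_cross_entropy(d_on_real, np.ones_like(d_on_real))
    fake_term = binary_cross_entropy(d_on_fake, np.zeros_like(d_on_fake))
    return real_term + fake_term

def generator_loss(d_on_fake):
    """The generator wants the discriminator to say 1 for generated images."""
    return binary_cross_entropy(d_on_fake, np.ones_like(d_on_fake))

# Hypothetical discriminator outputs: the probability that an image is real.
d_on_real = np.array([0.9, 0.8])
fooled = np.array([0.9, 0.95])       # discriminator believes the fakes
not_fooled = np.array([0.1, 0.05])   # discriminator sees through the fakes

print(generator_loss(fooled), generator_loss(not_fooled))
print(discriminator_loss(d_on_real, fooled), discriminator_loss(d_on_real, not_fooled))
```

When the discriminator is fooled, the generator's loss is low and the discriminator's loss is high, and the other way around. Training alternates between the two networks, each one minimizing its own loss.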
I like the idea of becoming better together with your adversary through competition, constantly pushing each other's boundaries. Sort of like how the cheetah became so utterly well designed for speed while trying to keep up with antelopes, who themselves became faster while competing with the cheetah.
Vanilla GANs can only produce images similar to those in a large set of training data, and there is no way to direct them.
A CycleGAN, however, is a type of conditional GAN that takes an image as a condition and uses it to create a new image.
It consists of two GANs and two datasets. One GAN makes images similar to the first dataset and the other makes images similar to the second dataset. We also impose the additional “Reconstruction criterion” on the GANs.
This means that if the first GAN takes an image and generates another, the second GAN should be able to take the new image and generate an image that is very similar to the original starting image.
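The reconstruction criterion can be expressed as a loss on the round trip. In this sketch the two "generators" are trivial hypothetical stand-in functions rather than neural networks, and the mean absolute difference is one common choice of distance, not necessarily the one used in the code I ran.

```python
import numpy as np

def cycle_consistency_loss(image, forward, backward):
    """Mean absolute difference between an image and its round-trip reconstruction."""
    reconstructed = backward(forward(image))
    return float(np.mean(np.abs(reconstructed - image)))

# Hypothetical stand-ins for the two generators.
def to_painting(x):
    return 1.0 - x                 # pretend "painting style" just inverts brightness

def to_photo_good(x):
    return 1.0 - x                 # undoes to_painting exactly

def to_photo_bad(x):
    return np.zeros_like(x)        # ignores its input entirely

photo = np.array([[0.25, 0.75],
                  [0.5, 1.0]])

good = cycle_consistency_loss(photo, to_painting, to_photo_good)
bad = cycle_consistency_loss(photo, to_painting, to_photo_bad)
print(good)   # 0.0: the round trip returns the original photo
print(bad)    # 0.625: the original photo cannot be recovered
```

During training this term is added to the adversarial losses, so a generator is penalized if it produces a convincing painting that has nothing to do with the image it was given.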
After one night on my slow GTX 670 using the GDL_code, the result is as follows:
This is a lot better. The sky is blue and the ground is green and full of flowers. The resolution is too low though, so I tried again with higher settings.
Unfortunately the result here is actually worse. I think that one problem is the uncharacteristically small sky. Having been trained on images with larger skies, the network has a hard time knowing where it is.
Helping the machine along the way
Not being entirely satisfied, I decided to use a hybrid method. Since the network seems to have a hard time knowing what the different parts of the image are supposed to be, perhaps a human-machine collaboration would help. I, as a human, tell the machine what is sky and what is flowers, and the machine does what it does best and paints a picture in the correct style.
Well, that did not work out. Van Gogh and the other post-impressionists still painted from reality, although they put their own special interpretation on it. What I would need is not an image consisting of large colored areas but something more like a photo. If I could recreate the scene that Van Gogh saw, then style transfer could work.
GauGAN and Pix2Pix
Nvidia has created its own flavor of the Pix2Pix algorithm and calls it GauGAN (a pun on Gauguin, another post-impressionist). Here the GAN is trained on pairs of images; in the case of GauGAN, one is a photo and the other is a schematic sketch of the photo that depicts its different parts, such as grass, water and sky. In this way GauGAN learns to translate between schematic sketches and photos. I gave it my sketch from above.
Now this looks like a scene that Van Gogh might actually have seen. It reminds me a bit of the Alps, with all the flowers and herbs and the mountain in the background.
Running the reconstructed picture through style transfer yields:
Finally we have something that might pass as a reconstruction of Wild Vegetation.
We are at an interesting point in time where neither machines nor humans are superior in everything. This opens up interesting collaborations where each party can contribute to something that neither would have been able to make alone.