As you may have noticed, the internet is currently exploding with images that seem to contain an inordinate number of slugs, dogs, Japanese buildings … and eyes. This is the result of Google publishing a couple of blog entries regarding their research in training neural networks to recognise image data, and what happens if you reverse the process – letting the software “daydream” about features it sees in an image. Well, it’s not really daydreaming, and the computational aspect of it is much more prosaic – it simply feels its way over an image and enhances features that remind it of other stuff.
This is pretty similar to something most of us did as kids – watched the clouds and imagined them to be animals, faces, buildings in the sky. This is the original article that sparked my interest, and after reading this I knew I had to try it out.
Going Down the Rabbit Hole
Setting it up was, as it turned out, not all that easy. I started optimistically by reading a few guides online, thinking that, given my years of computer experience, Unix/OSX skills and luck in general, this would take an evening or so.
It took two days on a 2012 iMac running Yosemite. Profanity breaks included.
So let me tell you: unless you’re really computer-savvy and don’t mind reading countless guides, how-tos and forum threads, and downloading a few versions of Python and frameworks/libraries you’ve never heard of, don’t even bother. I lost count of how many times I rebuilt Caffe, and I still had to manually fix some of the inconsistencies when linking the libraries with countless dependencies. (And yes, the latest version of some tools will not work, so you will have to dig around to install a previous version.)
Remember – we’re talking researchers here, who basically did a brain dump of their current setup. It is not polished in the least. At all. You also need to be able to program to make use of it, even if the example Python code does make it possible, with a few changes, to create the most surreal zoom experience you’ve had so far with your favourite holiday pictures.
Things Will Change
This is just a peek through the keyhole of an incredibly exciting technology that is coming close to being useful “to anyone”. Neural networks were all the rage in the 90’s, and I remember listening impatiently through my CS classes for improvements in the field. But it appears that only recently have we gotten the data volumes and processor speed necessary to start bringing things out of the research labs.
The picture above was created by letting the default trained network “dream” over my typewriter image for about 50 iterations – I picked out three to illustrate the progress. The network settles into a steady state fairly quickly, but it is interesting to “bump” it by adding noise or twisting the image slightly. I have to get back to that.
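For the curious, here is roughly what “iterating” and “bumping” mean, as a toy sketch in Python with NumPy. This is not Google’s code and there is no trained network in it: I am assuming a single hand-made edge detector as a stand-in for a network layer, and the names (edge_response, dream_step, dream) and the step size are made up for illustration. The idea it shows is the same, though: nudge the pixels, over and over, in the direction that makes the “layer” respond more strongly to what it already sees.

```python
import numpy as np

def edge_response(img):
    """Toy stand-in for a network layer: horizontal differences (wrapping at the edge)."""
    return np.roll(img, -1, axis=1) - img

def objective(img):
    """How strongly the toy layer fires: half the sum of squared responses."""
    return 0.5 * np.sum(edge_response(img) ** 2)

def dream_step(img, step_size=0.05):
    """One gradient-ascent step that boosts whatever the layer already responds to."""
    act = edge_response(img)
    # Analytic gradient of objective() with respect to every pixel.
    grad = np.roll(act, 1, axis=1) - act
    # Normalise so step_size means roughly the same thing for any image.
    grad /= np.abs(grad).mean() + 1e-8
    # Clip so the pixels stay in a displayable 0..1 range.
    return np.clip(img + step_size * grad, 0.0, 1.0)

def dream(img, iterations=50, noise=0.0, seed=0):
    """Run several steps; optionally 'bump' the image with random noise first."""
    rng = np.random.default_rng(seed)
    if noise > 0:
        img = np.clip(img + rng.normal(0.0, noise, img.shape), 0.0, 1.0)
    for _ in range(iterations):
        img = dream_step(img)
    return img

rng = np.random.default_rng(42)
start = rng.random((32, 32))   # random greyscale stand-in for a photo
result = dream(start)
print(objective(start), "->", objective(result))
```

In this toy, the clipping is what lets the image settle: once pixels hit pure black or white they cannot be pushed any further. Calling dream(result, noise=0.1) adds a little noise before continuing, which is one way to try the “bump”.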
I imagine this will move ahead quickly now. The novelty of the original dataset will wear off (with its curious overwhelming bias for dogs and eyes) and other datasets will start to emerge, bringing new variation to the images. Also, clever programmers will incorporate this into more user-friendly tools (if I were head of Adobe, I would do anything I could to get some of this technology into Photoshop filters) and as the datasets grow, the results will be more interesting and possibly less trippy. It’s also not too far-fetched to imagine this being possible in real time down the road. I used to set my computer working overnight to generate a Mandelbrot image in the 80’s (and later an animated “zoom” into the fractal) and today I can zoom and pan fractals instantly on my phone. Imagine having this technology as AR – looking at the world through a pair of “dreamy” or “enhancing” goggles. It could be scary as hell, or extremely useful, for instance filling in details that we can’t see.
The picture above was created by letting the “lower levels” of the network dream. They detect edges, contrast, structure in images. What happens when we let “higher levels” take control? That’s when we end up with pictures like the one to the right, where high-level representations of images are found and emphasised. Naked Lunch, anyone?