All Alone Now - Useful Python Libraries for Pointless Websites
Over the last couple weeks, I made a little website called All Alone Now for FermiJam - a game/digital art jam inspired by Fermi's Paradox.
I consider the site a great success, and I think it struck the right note with the handful of people who looked at it:
@tripofmice What type of site is this? I don't get it.
— Jordan Samulaitis (@jsamulaitis) September 11, 2016
I'm sure, after seeing such a useful and eminently monetizable website, you would like to replicate my success. These are some of the Python tools I used to make it.
Scrapy
To get the images that appear in the telescope, I scraped a real estate aggregator's listing photos of homes in Houston, TX using Scrapy.
Scrapy is a web scraping framework that makes it easy to set up a spider that extracts specific text or files from websites. In this case, I used an image pipeline to download every image with the appropriate CSS class.
The code for that is all on GitHub.
Python OpenCV
OpenCV is an open source computer vision library. It's a real hassle to install, but easy to use once you finally get it running.
Image histograms
The first thing I needed computer vision for was quality control. I had over 5,000 images of empty rooms and lonely houses, but mixed in were a number of floor plans.
Floor plans look very different from photographs of rooms. Most notably, they have a very different tone profile.
Even photos of rooms with white walls and black trim have shadows, depth, and color variety that a scanned floor plan does not. As a result, this problem is a good match for histogram comparison.
An image histogram counts how many pixels in an image fall at each intensity level. Comparing histograms isn't a very subtle way of comparing images, but for this dataset it differentiates between photos and floor plans well enough.
This article describes different types of histogram comparisons and how to implement them in OpenCV. I used correlation.
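To make the idea concrete, here is a rough sketch of the correlation comparison written in plain NumPy rather than OpenCV, so the arithmetic is visible. In OpenCV the equivalent calls are `cv2.calcHist` and `cv2.compareHist` with `cv2.HISTCMP_CORREL`. The function names and the 0.9 threshold are my own assumptions for illustration, not values from the site's scripts.

```python
import numpy as np


def intensity_histogram(gray_image, bins=256):
    """Count how many pixels fall into each intensity bin (0-255)."""
    hist, _ = np.histogram(gray_image, bins=bins, range=(0, 256))
    return hist.astype(np.float64)


def histogram_correlation(image_a, image_b):
    """Pearson correlation between the two images' intensity histograms."""
    hist_a = intensity_histogram(image_a)
    hist_b = intensity_histogram(image_b)
    return float(np.corrcoef(hist_a, hist_b)[0, 1])


def looks_like_floor_plan(image, reference_plan, threshold=0.9):
    """Flag an image whose histogram closely matches a known floor plan.

    The threshold is a guess; it would need tuning against real data.
    """
    return histogram_correlation(image, reference_plan) >= threshold
```

A floor plan is mostly pure white with a little pure black, so its histogram is two spikes. A photo spreads its pixels across the whole range, which is why the correlation between the two tends to be low.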
K-means clustering
In order to pull out the dominant color in an image, I used k-means clustering, a vector quantization algorithm that can be used to simplify an image to a given number of colors. For example:
From opencv.org docs
With that, the Fermi quote above the telescope can be rendered in the dominant color of the current image.
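Here is a small sketch of the dominant-color idea: cluster the pixels with k-means and take the center of the biggest cluster. It is a bare-bones NumPy version written for illustration, not the code behind the site. OpenCV ships its own implementation as `cv2.kmeans`. The function name and the defaults for `k`, `iterations`, and `seed` are assumptions.

```python
import numpy as np


def dominant_color(pixels, k=3, iterations=10, seed=0):
    """Return the center of the largest k-means cluster as an (R, G, B) tuple."""
    pixels = np.asarray(pixels, dtype=np.float64).reshape(-1, 3)

    # Start from k distinct colors so no two clusters begin at the same spot.
    unique = np.unique(pixels, axis=0)
    k = min(k, len(unique))
    rng = np.random.default_rng(seed)
    centers = unique[rng.choice(len(unique), size=k, replace=False)]

    def nearest_center(points):
        # Distance from every pixel to every center; pick the closest center.
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        return distances.argmin(axis=1)

    for _ in range(iterations):
        labels = nearest_center(pixels)
        for i in range(k):
            members = pixels[labels == i]
            if len(members):  # leave a center alone if its cluster is empty
                centers[i] = members.mean(axis=0)

    # The dominant color is the center that ends up with the most pixels.
    counts = np.bincount(nearest_center(pixels), minlength=k)
    return tuple(int(round(c)) for c in centers[counts.argmax()])
```

This builds a pixels-by-centers distance table on every pass, so for real photos it helps to shrink the image first.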
Sorting colors
I didn't want the colors to jump around in a jarring way, so I needed to sort the images by color. This is a nontrivial problem, and Alan Zucconi has an excellent guide to a few different approaches.
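I used Hilbert sorting. A Hilbert curve is a fractal space-filling curve: if we treat the three color values (red, green, and blue) as coordinates, each color can be ordered by its position along the curve, which tends to keep similar colors next to each other.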
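For the curious, here is one way to compute a color's position on a 3D Hilbert curve in pure Python, using the transpose method described by John Skilling. It is a sketch of the technique and not necessarily how the site's scripts do it. The names `hilbert_index` and `sort_colors` are my own.

```python
def hilbert_index(point, bits=8):
    """Distance along a Hilbert curve for integer coordinates below 2**bits."""
    x = list(point)
    n = len(x)
    top = 1 << (bits - 1)

    # Undo the rotations and reflections the curve applies at each level.
    q = top
    while q > 1:
        p = q - 1
        for i in range(n):
            if x[i] & q:
                x[0] ^= p  # invert the low bits of the first axis
            else:
                t = (x[0] ^ x[i]) & p  # swap low bits between axis 0 and axis i
                x[0] ^= t
                x[i] ^= t
        q >>= 1

    # Gray-encode the result.
    for i in range(1, n):
        x[i] ^= x[i - 1]
    t = 0
    q = top
    while q > 1:
        if x[n - 1] & q:
            t ^= q - 1
        q >>= 1
    for i in range(n):
        x[i] ^= t

    # Interleave the bits of all axes, highest bit first, into one integer.
    index = 0
    for bit in range(bits - 1, -1, -1):
        for i in range(n):
            index = (index << 1) | ((x[i] >> bit) & 1)
    return index


def sort_colors(colors):
    """Sort (R, G, B) tuples of 0-255 values by their position on the curve."""
    return sorted(colors, key=hilbert_index)
```

Neighbors on the curve are always neighbors in color space, which is what keeps the transitions gentle. The reverse isn't guaranteed, so two similar colors can still end up far apart in the sorted list.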
Luminance
The final step in creating All Alone Now was adding the wonderfully eerie music, created by Emily E. Meo. She gave me 12 loops, so I divided the images into 12 roughly equal-sized groups in order of luminance.
Luminance is a measure of how bright a color appears, and it's simple to calculate from an RGB color value, typically as a weighted sum of the three channels.
When the current music track is complete, the next track played is the one that corresponds to the luminance of the current image displayed in the telescope.
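As a rough sketch of the luminance step, the snippet below computes a luminance value per image and splits the images into equal-sized groups, darkest first. The Rec. 709 weights are an assumption on my part (there are several common formulas, and strictly speaking these weights are meant for linear RGB), and `assign_tracks` is a hypothetical name.

```python
def luminance(rgb):
    """Approximate luminance of an (R, G, B) color using Rec. 709 weights."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def assign_tracks(colors, track_count=12):
    """Give each color a track number from 0 to track_count - 1, darkest first.

    Colors are ranked by luminance and the ranking is cut into roughly
    equal-sized groups, one group per track.
    """
    order = sorted(range(len(colors)), key=lambda i: luminance(colors[i]))
    tracks = [0] * len(colors)
    for rank, i in enumerate(order):
        tracks[i] = rank * track_count // len(colors)
    return tracks
```

Picking the next loop is then just a lookup: take the track number stored for whatever image is currently in the telescope.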
After all this, the images are associated with their dominant RGB color, luminance, and musical track, and sorted by color similarity. All the scripts I used are, of course, on GitHub.
Is all this work noticeable? Not really.
¯\_(ツ)_/¯