Text analysis is cool. Image analysis is cooler. And given that every year our digital world becomes more and more visual, images are becoming even cooler!
This morning, I decided to play around with Google’s open-source Vision Transformer (ViT) model. I wanted to see if “it” could “learn” patterns in some old data that I had lying around. These data were hundreds of avatars from the notorious white nationalist forum, Stormfront.
Google’s model is pre-trained on ImageNet-21k, a dataset of 14 million labeled images at resolution 224x224, and fine-tuned on the ImageNet 2012 dataset, which is 1 million images at the same resolution. What does this mean? Mostly, the model has seen enough labeled images that, given a grid of pixels, it is pretty good at recognizing what those pixels depict. A dumb example might be that, given two dots and a nose, the model’s internal representation would strongly suggest a face, with a mouth somewhere beneath.
What this meant in practice is that I fed this “pre-trained” model a few hundred more images of interest to whet its appetite—my avatars from Stormfront—and took the numerical representation (an “embedding”) it produces for each one. I then ran a “cluster analysis” on those embeddings, i.e., asked a statistical model to group the images based on similarity. You can ask it to make 2, 3, or 100 groups. People often call the number of clusters “K.” After trying a few options, it seemed to me that 3 groups told the best story.
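To make the clustering step concrete, here is a bare-bones sketch of k-means, the classic algorithm for this kind of grouping. This is an illustration, not the exact code I ran: in the real pipeline the vectors would be embeddings pulled from the ViT model (e.g., via the Hugging Face transformers library), whereas here I stand in random numbers for them.

```python
import numpy as np

def kmeans(X, k, n_iters=50, seed=0):
    """Bare-bones k-means: assign each vector in X to the nearest of k centroids."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k random points from the data.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Distance from every point to every centroid, then nearest-centroid labels.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Stand-in for the avatar embeddings: 300 fake images as 768-dim vectors
# (768 is the hidden size of ViT-Base; the values here are just random noise).
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(300, 768))

labels, _ = kmeans(embeddings, k=3)
print(labels.shape)  # one cluster label per image
```

Picking K=3 versus K=2 or K=100 is exactly the judgment call described above: rerun with a different `k` and see which grouping tells the best story.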
The above image tells the story of three separate styles of avatars within the white nationalist community on Stormfront. In the top left (yellow), we see folks who express their identity through symbols. In the bottom right (blue), we see those who select nature-y images that also seem to contain a disproportionate amount of blue. In the top right corner (pink), we see a very distinct cluster of portraits, the most distinct of which are black and white. I think there is a case to be made that, in this basic analysis, the algorithm might be disproportionately categorizing based on color. If I were to push this analysis further, perhaps I would convert all the images to black and white and rerun the same code.
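That grayscale rerun would be a cheap check on the color hypothesis: strip the color before embedding and see if the clusters survive. A minimal sketch, assuming each avatar arrives as an (H, W, 3) RGB array, using the standard ITU-R BT.601 luminance weights:

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an (H, W, 3) RGB array to grayscale via BT.601 luminance weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb @ weights

# A toy 2x2 "avatar": pure red, green, blue, and white pixels.
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=float)
gray = to_grayscale(img)
print(gray.shape)  # (2, 2): one intensity value per pixel
```

If the blue-heavy nature cluster dissolves after this preprocessing, that would be decent evidence the model was leaning on color rather than content.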
The estuary-like middle ground between the three poles is also interesting. This is the zone where the machine model has a hard time categorizing the images. Here we find a lot of neo-classical imagery of Greek busts and shit, that is, in some ways, a combination of our three categories: symbol, portrait, and nature. Also, my favorite little thing here is the repeated image in the bottom right part of the middle ground of cosmic white nationalist William Pierce with a cat on his shoulder. He’s grouped closely with Dorothy from The Wizard of Oz.
Anyway, we might think here of avatars as a mundane version of Goffman’s “presentation of self in everyday life.” Does much thought go into these image choices by white nationalists? Maybe not. Is it still interesting? Maybe.