John Resig: “Implementing Image Similarity Matching: Comparing Open Source and Commercial Solutions”


– Thank you everyone. So, I’m John Resig, I’m an independent researcher. I wanted to talk through a number of different projects
that I’ve been working on. All relating to art history
and computer vision. A lot of it tied to The Frick here, and my work with them. I just wanna step back
and talk a little bit about how I came to this area. My day job is very different, I’m a computer programmer. I work at Khan Academy. But there, I’m not doing
any computer vision stuff or any art history stuff. This is all my side passion, hobby. It all came through my
love of Japanese prints. This is an art form that
I’ve loved for years. There was one thing I struggled with, which is that identifying the prints and understanding them is
actually really challenging. You have to know Japanese, you have to know classical Japanese. You have to know all these sort of things. One thing I just wanted, more than anything else, was to be able to take
a picture of a print and be able to find other
copies of that print in institutions around the world. Because, there are prints, there are multiples of them, they exist somewhere. This is what I set out to build, and I did this back in 2011 and 2012. Let me skip back a
second real quick, sorry. I built this database called Ukiyo-e.org. And it’s a database of currently about 220,000 Japanese prints from about 20 some odd
institutions around the world. This includes both public institutions (universities, museums, and such) and private dealers and auction houses. When I was building this, I guess one of the things
I should state, as well, is that I’m not really
doing any original research. Unlike the very awesome
people from universities who are doing incredible new research, I don’t have time for that, so I’m just taking things off the shelf and asking, can I make this work? So, with that caveat, what I’m gonna talk about today are the different technologies that I tried to use for my various projects, and whether or not they succeeded. One of the technologies I ended up using for this particular
goal was doing a simple image similarity analysis. A couple of speakers have
already talked about this, but this works really well for flat images that are somewhere within another image. At the time, again, this is 2011, 2012. The only open source thing that existed was this thing called imgSeek, where you could find
things that looked similar. In practice, however, I found that it just did
not work very well at all. That it was never able to find
things within another image. It was looking at the complete image to the complete image and
trying to find that similarity. So, like this case looks pretty great. It’s all finding buses. But if that bus was a tiny
part of another image, it would never find it. I had to discard that immediately. It was around this time, that I reached out to TinEye and they have a commercial
service called MatchEngine. I told them about the
project that I wanted to do and they let me use their service. So, this is a commercial service, but it’s really, really good. In that it was able to
find image fragments and things like that. This is what I ended
up using for this site. To show some of the
examples here: this is a print
by Hokusai, a waterfall. And you have many different copies in different institutions, but you can tell the quality
of the image is different, some are in black and white, some are surrounded by color
bars and all sorts of stuff. But it’s still able to find the matches. I guess one thing that
I want to emphasize here is that it’s not using
any metadata whatsoever. Some of these images are Japanese, some are European, using
different languages. But all it cares about is the images and grouping them together. Also, here is another example. This is a case where an
institution actually put the images of this diptych backwards and there it is correctly down below. Just to show where
MatchEngine didn’t quite work as I expected it would, where it’s actually in this case matching the color bars,
rather than the image itself. That’s because probably the
image is relatively delicate, there isn’t as much detail in it. Whereas, the color bars have a lot of detail and a lot of color. It was around this time that the Frick Art Reference
Library reached out to me and they were interested in doing some computer vision
analysis in their collection. They had seen my website. As I described earlier, in an art historical archive you end up with these photographs of art. And the Frick has
over a million of them. These all have the
descriptions and things, so this seemed like an opportunity to expand what I was doing. I started to do this similar analysis. In this case, also using MatchEngine. I guess one thing
we weren’t sure about going into it was that I’d only been analyzing Japanese prints, which are graphic, with very hard lines and all that sort of stuff. Would this work for photographs of art, where it’s all fuzzy? It actually seemed to work pretty well. We set about analyzing the collection of anonymous Italian art. So, these are all
photographs of paintings, and frescoes and things like that. But they don’t have
really any good metadata because the artist isn’t known, nor is the title known. It’s all just anonymous
artist, 14th century, anonymous artist, 15th century. And then you’re just like
there’s really no way to match them together using metadata. This is where I started
to do this analysis and you’re able to start
to cluster things together. And you see that, yes in fact, there are multiple photos
of the same artwork. And sometimes, able to
fill in missing gaps. Just wanted to show a
few of these examples. Also able to find fragments. So, taking a photograph
of a larger fresco, or something like that. And handling cases of
black and white versus color. Before and after conservation. Then, additionally, you
start getting into copies. Cases where you have the same work, but obviously it’s changed. There’s a different
version, a later version. Then you get into many different copies. Again, these are all obviously influenced by the same work here,
but they’re different, all different unique works. Stemming from that work, we started to look at what happened. So, this is one archive, and actually one part of an archive. What happens if we look at many art history photo archives? This is where the PHAROS
project came from. Which was a consortium of all these different
art history photo archives that are all around the world. And their goal was to
combine their resources, all their images, into a single place, so that way you can start to
find these new connections. I started to work with
them on this project. And I built this database called
The Pharos Images Project. This is a database of
currently about 97,000 images, representing about 60,000 artworks, from a number of different institutions. I think about six institutions, right now. What we decided to do at the start is just to focus on Italian art images. Just to have some common
basis that we can work with. But, this works very similarly
to my Japanese print site. In that, you’re able to
browse through the images, and search for them, and things like that. The one major distinction here
is that when I built this, I used an open source
image analysis solution, one called, I think it’s pronounced, I’m calling it Pastec. I don’t know what other people call it, but that’s what I call it. This is open source. I built this much later, released this in 2016. By that point, open source solutions had started to catch up
with commercial ones. This is one that I’ve been using. I’ve also contributed to it a little bit, fixing some bugs and adding some features. This one is actually pretty good. It holds up pretty well to MatchEngine. This is to show the database you’re able to browse through all
the different artworks. You can search by the name
of a particular artist. But then you can get
into the image analysis starting to group these
different records together into a single source. So, you can start to see that
this is the same painting, by the same artist, but these are two different records, at two different institutions,
brought together. Then you start to get into
some different ones here. These are probably copies of an original, but you have some anonymous copies that have been cataloged
at different institutions and then combined together. You end up with a whole bunch of these. What was discussed earlier, about finding surprising cases
of metadata not matching. This comes up a lot. And again, it’s no fault
of the institutions who are doing the cataloging. It’s a needle in a haystack problem, you can’t possibly know
what some researcher stuck in a random drawer 40 years ago. It’s the sort of thing that computers are uniquely designed to help with. I think this is one of the
things I’ve been trying to do a lot of: I don’t wanna replace researchers, I wanna give them tools to help them. I just wanna show an example where Pastec doesn’t really work quite as well as I’d hoped. It’s started to pull in other things where it’s obviously not the
same painting anymore. But you might argue it’s
stylistically similar, but I don’t know. I’ve started to experiment with
a couple other technologies. And I just wanted to show my
initial results with them. I’ve started to use the
commercial API, Clarifai. I’ve been struggling to get it to work well. Just to show an example, this is with some American art, paintings that were
cataloged here in the Frick. This is a case where there
was this same exact piece, two different photographs. It was able to find
that they were similar, but the thing is it’s mixed in with a whole bunch of other stuff. It is finding the details,
these are all trees, and there’s water and stuff like that. Then here’s another case
where there’s some buffalo, and it’s not finding the connection. It’s starting to group
it again with other water and trees and stuff. So, I don’t feel
like it’s quite there yet. Or maybe I need to
configure it differently. I just wanted to give
an example of tagging. I know there’s some discussion
on tagging later on. Sometimes, the tagging’s straight on, it’s like, oh yeah, trees,
landscape, fog, nature, yeah, this is pretty
much like this painting. But then you end up with retro, vintage, antique, painting, picture frame. And that doesn’t help me at all. None of that is really describing what the contents are. I’ve also been experimenting with writing my own image similarity
analysis using TensorFlow. It’s still very early on, I just wanted to show some of the clusters that have been generated so far. Again, these are Japanese prints, and these are all very similar Utagawa school, 1850s or
so, clustered together. And this is all beautiful
women, probably 1840s. Then you have Noh actors,
this is 1890s, 1900s. Then finally, they’re much more graphic. It definitely seems to be promising. This is something that I’m exploring. I just wanted to wrap up there and provide the links
to the ukiyo-e database, the pharos images database. I think one of the things that’s important is that everything that I’ve talked about today is open source. I’ve written all of it, it’s freely available. You can use it for your things. A lot of my work,
especially with the Frick and Pharos, is funded
by the Kress Foundation. So, I’m very happy to give this back to the community at large. So, you can find it up
on my GitHub, as well. And I provide some
research that I’ve written up on my website. Please use it for your things, and just let me know if
you have any questions. But yeah, thank you. (audience clapping)
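
The talk describes grouping records from different institutions once an image matcher reports that two photographs show the same artwork. A minimal sketch of that grouping step follows, assuming the matcher has already produced a list of matching record pairs. It uses a generic union-find approach and is not taken from the Ukiyo-e.org or Pharos code; the function name `group_matches` and the record IDs are hypothetical.

```python
from collections import defaultdict


def group_matches(record_ids, match_pairs):
    """Group records into clusters from pairwise image matches (union-find).

    Every ID that appears in match_pairs must also appear in record_ids.
    """
    # Each record starts out as the root of its own one-element group.
    parent = {rid: rid for rid in record_ids}

    def find(rid):
        # Walk up to the group's root, shortening the path along the way.
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    # A match between two records merges their groups.
    for a, b in match_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(list)
    for rid in record_ids:
        clusters[find(rid)].append(rid)

    # Keep only groups that link at least two records.
    return sorted(sorted(group) for group in clusters.values() if len(group) > 1)


# Hypothetical record IDs from three archives, and hypothetical matcher output.
records = ["a-001", "a-002", "b-417", "c-093"]
matches = [("a-001", "c-093"), ("c-093", "b-417")]
groups = group_matches(records, matches)
print(groups)
```

Because matches are merged transitively, "a-001" and "b-417" land in the same group even though the matcher never compared them directly. This is also how one false match, such as two photographs that share only color bars, can pull unrelated works into a single group.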
