How to Use the Metropolitan Museum of Art’s Collection Listing API

The Met has an Open Access policy: the museum makes images of the public domain artworks in its collection easily available for download. That’s great if you’re browsing their website.

An example of the web view of an Open Access item.

But that’s not great if you want an automated way to get at the data and images. There are two ways to fetch the data: one is downloading the full dataset from a regularly updated GitHub repo, and the other is a web API. Notably, the GitHub data doesn’t include references to the images, and using it would mean manually updating a local copy of the data at regular intervals.

The web API does exist, but as far as I can tell it’s completely undocumented. The only mentions of its exact syntax I could find were in comments on a few social media posts like this.

I ended up stumbling around the API until I figured it out, so I’m documenting it here for anyone else with a similar need.

Calling the API

The Met’s API returns the results of a search, much like performing a search on the normal website. It even uses the same url structure as the website. If you do a simple search of the collection on the Met website for the word “test”, you’ll get a url that looks something like this:

https://metmuseum.org/art/collection/search#!?q=test&offset=0&pageSize=0&sortBy=Relevance&sortOrder=asc&perPage=20

The part of the url after the question mark is a query string, which passes along the search parameters. In this case the values are mostly defaults, except for q=test, which holds the text we’re actually searching for.
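If you’re scripting this, Python’s standard library can pull those parameters out of a search url. A minimal sketch: the website puts its query string after `#!`, so it technically lands in the url’s fragment rather than its query, which is why this splits on the first `?` by hand instead of using `urlsplit(...).query`.

```python
from urllib.parse import parse_qs

search_url = (
    "https://metmuseum.org/art/collection/search#!?q=test&offset=0"
    "&pageSize=0&sortBy=Relevance&sortOrder=asc&perPage=20"
)

# Everything after the first "?" is the query string. Because of the "#!"
# before it, urlsplit() would file it under the fragment, not the query.
query_string = search_url.split("?", 1)[1]

# parse_qs maps each parameter name to a list of its values.
params = parse_qs(query_string)
print(params["q"])        # ['test']
print(params["perPage"])  # ['20']
```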

You can play around with this url pretty easily by removing fields from the url by hand or by using the UI on the website to create a different search.

If we check the box for “Show Only: Public Domain Artworks”, we get a new parameter, showOnly:

https://metmuseum.org/art/collection/search#!?q=test&offset=0&pageSize=0&sortBy=Relevance&sortOrder=asc&perPage=20&showOnly=openaccess

And if we click the box for “Object / Material: Musical Instruments”, we get another parameter, material:

https://metmuseum.org/art/collection/search#!?q=test&offset=0&pageSize=0&sortBy=Relevance&sortOrder=asc&perPage=20&material=Musical%20instruments&showOnly=openaccess

One search parameter that isn’t obvious is offset. It encodes which page of search results you’re on: if your perPage is 20, offset is 0 on page one, 20 on page two, 40 on page three, and so on.
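Put another way, offset = (page - 1) * perPage. A tiny helper for that arithmetic (the name `offset_for_page` is just for illustration):

```python
def offset_for_page(page: int, per_page: int = 20) -> int:
    """Return the offset value for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Page 1 starts at offset 0, page 2 at per_page, page 3 at 2 * per_page, ...
    return (page - 1) * per_page


print(offset_for_page(1))  # 0
print(offset_for_page(2))  # 20
print(offset_for_page(3))  # 40
```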

Note that none of the parameters are required. They’ll be added with default values if they’re missing. For example, you don’t need to search for any text at all. You could perform a search for any item with the Armor material.

https://metmuseum.org/art/collection/search#!?offset=0&pageSize=0&sortBy=Relevance&sortOrder=asc&perPage=20&material=Armor

So how does all this carry over to the API? All you have to do is take the query string and append it to metmuseum.org/api/collection/collectionlisting? like so:

https://metmuseum.org/api/collection/collectionlisting?q=test&offset=0&pageSize=0&sortBy=Relevance&sortOrder=asc&perPage=20&material=Musical%20instruments&showOnly=openaccess

Go ahead and try out that url in your browser. You should get a page of JSON back.
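To do the same from code, here’s a minimal Python sketch that builds that url from a dict of parameters and fetches the JSON. It assumes the endpoint still behaves as described here (it’s undocumented and could change). The browser-like User-Agent header is also an assumption on my part, in case the server refuses Python’s default one. `build_listing_url` and `fetch_listing` are hypothetical helper names.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Undocumented endpoint described above; it may change or disappear.
API_BASE = "https://metmuseum.org/api/collection/collectionlisting"


def build_listing_url(params: dict) -> str:
    """Append a query string built from params to the listing endpoint."""
    # quote_via=quote encodes spaces as %20 (like the website) instead of "+".
    return API_BASE + "?" + urlencode(params, quote_via=quote)


def fetch_listing(params: dict) -> dict:
    """Call the endpoint and parse the JSON body (needs network access)."""
    # Assumption: a browser-like User-Agent; the default Python one may be refused.
    request = Request(build_listing_url(params), headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        return json.load(response)


params = {
    "q": "test",
    "offset": 0,
    "pageSize": 0,
    "sortBy": "Relevance",
    "sortOrder": "asc",
    "perPage": 20,
    "material": "Musical instruments",
    "showOnly": "openaccess",
}
print(build_listing_url(params))
# data = fetch_listing(params)  # uncomment to make the real request
```

Printing the url for the parameters above should reproduce the example url exactly, which is an easy way to check the encoding before making any real requests.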

An example of the JSON response from a call to the Met API.

At this point I’d recommend playing around with some different calls by editing the query string parameters by hand. To better understand the results, paste the JSON into a visualizer like https://jsonvisualizer.com/. It’ll help you see the structure of the data: the different objects and the fields on each.

A visualization of the JSON response from the Met API.

At a high level, the response JSON is broken up into a request object, a results array, a facets array and some assorted simple fields on the top level object.

The request object contains the parameters used to make the request. I didn’t find it useful, but it could help if you wanted to validate your requests.

The results array is the main body of the data: the items returned by the search. Its length is at most the value of the perPage parameter, and it will be empty if your search had no results.

The facets array contains a lot of information about the possible values of fields on the results objects. I haven’t found a use for this.

The last piece of useful data is the totalResults field on the top-level object. It holds the total number of matches for your search query, not just the ones in the current results array. For example, a very specific query will have very few total results.
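Combined with perPage, totalResults tells you which offsets you’d need to walk through every result. A small sketch; the response dict here is a made-up stand-in with only the one field it uses, not real API output, and `page_offsets` is a hypothetical name.

```python
def page_offsets(total_results: int, per_page: int) -> list[int]:
    """One offset per page: 0, per_page, 2 * per_page, ... below total_results."""
    return list(range(0, total_results, per_page))


# Hypothetical stand-in for a parsed response; only totalResults is filled in.
response = {"totalResults": 45}
offsets = page_offsets(response["totalResults"], per_page=20)
print(offsets)       # [0, 20, 40]
print(len(offsets))  # 3 pages: 20 + 20 + 5 items
```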

Using the Data From an Item

An element in the results array looks like this:

Example Item JSON Visualization

Most of it is pretty self-explanatory, but there are a few non-obvious bits.

The url field has to be appended to https://metmuseum.org to form a valid url. You also don’t need its query string for the url to work; the query string just adds a “Back to Search Results” link to the item’s page. I usually print the item url to a log after picking a random item, and I find it much more readable with the query string chopped off.
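In Python that comes down to dropping everything from the `?` on and joining the rest onto the site root. A sketch, assuming the url field is a path starting with `/`; the sample value below is hypothetical, not copied from a real response.

```python
from urllib.parse import urljoin

SITE_ROOT = "https://metmuseum.org"


def item_page_url(relative_url: str) -> str:
    """Turn an item's url field into a full link, minus the query string."""
    # Dropping the query string only loses the "Back to Search Results" link.
    path = relative_url.split("?", 1)[0]
    return urljoin(SITE_ROOT, path)


# Hypothetical url value; the real field's exact shape may differ.
print(item_page_url("/art/collection/search/12345?searchField=All"))
# https://metmuseum.org/art/collection/search/12345
```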

The image field is a valid url, but the regularImage and largeImage fields are not. To get full urls out of them, use the first few characters of the largeImage or regularImage string to find where to cut the image string, then append the largeImage or regularImage string at that point. In the example above, image is https://images.metmuseum.org/CRDImages/rl/mobile-large/SLP0129.jpg and largeImage is rl/web-large/SLP0129.jpg, so we find where rl appears in image and substitute the ending from largeImage to get the final result, https://images.metmuseum.org/CRDImages/rl/web-large/SLP0129.jpg.

Note that the image url does NOT always have the same format. For example, you can’t rely on every url containing the phrase CRDImages. You have to use the first few characters of the large and regular image urls to find where to substitute.
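Here’s one way to do that substitution in Python. It’s a sketch with a small variation on “first few characters”: it takes the whole first path segment of the partial string (e.g. `rl`) and searches for it wrapped in slashes (`/rl/`), so it can’t accidentally match those letters inside some other part of the url. `full_image_url` is a hypothetical name, and it returns `None` when it can’t find a splice point.

```python
from typing import Optional


def full_image_url(image: str, partial: str) -> Optional[str]:
    """Combine the image url with a regularImage or largeImage string."""
    # The first path segment of the partial string (e.g. "rl") marks the splice point.
    prefix = partial.split("/", 1)[0]
    cut = image.find("/" + prefix + "/")
    if cut == -1:
        return None  # The formats differ in a way this sketch doesn't handle.
    # Keep everything up to and including the "/" before the prefix, then append.
    return image[: cut + 1] + partial


image = "https://images.metmuseum.org/CRDImages/rl/mobile-large/SLP0129.jpg"
print(full_image_url(image, "rl/web-large/SLP0129.jpg"))
# https://images.metmuseum.org/CRDImages/rl/web-large/SLP0129.jpg
```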

Randomly Selecting an Item From a Category

At this point your usage will vary based on your own needs. I’ll explain how I used it for Appraisal Bot.

I needed to pull random public domain images from the collection to process with Appraisal Bot. I also decided that I only wanted to pull in art from hand-picked categories. That way I could curate the results to be mostly items that could show up on an Antiques Roadshow-style show. Unfortunately, the API doesn’t provide an endpoint for picking a random item, so I came up with my own method.

First, we randomly pick a material to search for from a curated list that I made. Then we call the API with that material, an offset of 0 and a perPage of 1. We don’t actually care about the results at this point; all we want is the value of totalResults, which is the total number of items with that material.

Next, we roll a random number from 0 up to, but not including, totalResults (offsets start at 0, so the last item sits at totalResults - 1). Then we call the API with the material, that random number as the offset and a perPage of 1. Essentially we’re making a search with one item per page and randomly choosing which page we’re on. That returns one item in the results array, and that’s our item!
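The two steps above can be sketched in Python like this. To keep it testable offline, the function takes a `fetch` callable (params dict in, parsed JSON out) instead of calling the API directly, and `fake_fetch` is a made-up stand-in for it. The material list, the helper names and the `showOnly=openaccess` filter are all assumptions for illustration.

```python
import random
from typing import Callable, Optional

# Hypothetical curated list; substitute whatever categories you want.
MATERIALS = ["Armor", "Musical instruments"]


def pick_random_item(fetch: Callable[[dict], dict], rng: random.Random) -> Optional[dict]:
    """Pick one random item from a randomly chosen material, or None if none found."""
    material = rng.choice(MATERIALS)
    # Assumption: restrict to public domain items with showOnly=openaccess.
    base = {"material": material, "showOnly": "openaccess", "perPage": 1}

    # Step 1: one item per page at offset 0, just to read totalResults.
    total = fetch({**base, "offset": 0}).get("totalResults", 0)
    if total == 0:
        return None

    # Step 2: offsets are zero-based, so pick one in the range [0, total - 1].
    offset = rng.randrange(total)
    results = fetch({**base, "offset": offset}).get("results", [])
    return results[0] if results else None


def fake_fetch(params: dict) -> dict:
    """Offline stand-in for the API: three made-up items per material."""
    items = [{"title": f"{params['material']} #{i}"} for i in range(3)]
    start = params["offset"]
    return {"totalResults": len(items), "results": items[start:start + params["perPage"]]}


item = pick_random_item(fake_fetch, random.Random(0))
print(item)
```

To use it for real, pass a function that requests the collectionlisting url with those parameters and returns the parsed JSON. Injecting `fetch` and `rng` also makes it easy to reproduce a particular pick by seeding the random generator.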

Limitations of the API

As far as I can tell, there is no way to perform a boolean OR search on any search parameter. For example, I can’t search for Paintings OR Prints in the same search. This limits me because the quality of material tagging varies widely. I’d love to be able to search for Drinking Vessels OR Vessels, because they’re basically the same sort of item and not every applicable item is tagged with both. Since I can’t, achieving my intended distribution would require a more complicated weighted random table for materials, one that accounts for the fact that most items are tagged as both Drinking Vessels and Vessels but not every one is.
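As a starting point for such a table, and not a full fix, you could weight each material by its totalResults so that bigger categories get picked more often. The sketch below does only that. It does not correct for overlap: if you pick a material this way and then a random offset within it, an item tagged with both Drinking Vessels and Vessels is twice as likely to come up as an item with only one tag, which is exactly the complication described above. The counts are hypothetical placeholders.

```python
import random


def pick_weighted_material(counts: dict[str, int], rng: random.Random) -> str:
    """Choose a material with probability proportional to its totalResults."""
    materials = list(counts)
    weights = [counts[m] for m in materials]
    # random.choices draws using relative weights; k=1 returns a one-item list.
    return rng.choices(materials, weights=weights, k=1)[0]


# Hypothetical counts; in practice each would come from a perPage=1 call's totalResults.
counts = {"Drinking Vessels": 120, "Vessels": 300}
print(pick_weighted_material(counts, random.Random(42)))
```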
