Metacanon uses an algorithm to calculate the relative canonicity of each book in the database. The following is a simple explanation of how this algorithm works.
Metacanon stores six independent pieces of data for each book in the database:
1. Pulitzer Prizes and Nominations
2. National Book Awards and Nominations
3. Total citations in Google Scholar
4. Total number of results returned from a targeted JSTOR search
5. Total number of results returned from a targeted American Literary History search
6. Total number of results returned from a targeted American Literature search
The data for the two literary awards are interpreted fairly straightforwardly. Any book that has received the Pulitzer Prize is assigned 1 point, while any book that has been a runner-up for the Pulitzer Prize is assigned 0.5 points. The same interpretation is applied to the National Book Award data.
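The award rule can be sketched in a few lines of Python. This is only an illustration, not Metacanon's actual code: the names `AWARD_POINTS` and `award_score` and the status labels are hypothetical, and only the point values (1 and 0.5) come from the description above.

```python
# Hypothetical names; only the point values (1 and 0.5) come from the text.
AWARD_POINTS = {"winner": 1.0, "runner_up": 0.5}

def award_score(status):
    """Return the points for one award.

    `status` is "winner", "runner_up", or None (no prize, no nomination).
    """
    return AWARD_POINTS.get(status, 0.0)

# Example: a book that won the Pulitzer Prize and was a runner-up
# for the National Book Award.
pulitzer_points = award_score("winner")
national_book_award_points = award_score("runner_up")
print(pulitzer_points, national_book_award_points)
```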
However, the data from Google Scholar, JSTOR, American Literary History, and American Literature are interpreted using more elaborate formulae. For example, each raw Google Scholar score is put through the following formula:
If X = the raw number of Google Scholar citations, and Y = the adjusted Google Scholar score, then
Y = ((2X + 1)^{1/4} - 1) / 0.7
This looks more complicated (at least to my humanities-person eyes) than it actually is. Here is what this function looks like when it is put through an online graphing calculator:
As you can see, the higher the number of citations (corresponding here to the X axis) the gentler the slope of the function. Put plainly, the first few citations count for a lot of points, but the next few count for fewer points, and the next few count for even fewer and so on. Let’s plot a few points in order to make this a bit clearer:
A book with 1 Google Scholar citation receives 0.45 points.
A book with 2 Google Scholar citations receives 0.71 points.
A book with 5 Google Scholar citations receives 1.17 points.
A book with 10 Google Scholar citations receives 1.63 points.
A book with 15 Google Scholar citations receives 1.94 points.
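For anyone who would like to check these figures, here is a minimal Python sketch of the Google Scholar formula. The function name `adjusted_score` and its `divisor` parameter are my own labels for illustration; the formula itself, with the exponent read as a fourth root, is the one given above.

```python
def adjusted_score(citations, divisor=0.7):
    """Adjusted score Y = ((2X + 1)^(1/4) - 1) / divisor.

    `citations` is the raw count X; 0.7 is the Google Scholar divisor.
    """
    return ((2 * citations + 1) ** 0.25 - 1) / divisor

# Print the adjusted score for the citation counts listed above.
for x in (1, 2, 5, 10, 15):
    print(x, round(adjusted_score(x), 2))
```

Rounded to two decimal places, the results match the five points listed above (0.45, 0.71, 1.17, 1.63, and 1.94).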
As the total number of citations grows, each additional citation is worth just a bit less. To use a metaphor drawn from economics, the relationship between the number of citations and the final adjusted score is subject to the “law of diminishing returns.”
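One way to see the diminishing returns directly is to compute how many points the n-th citation adds on its own. The sketch below assumes the Google Scholar divisor of 0.7; `marginal_gain` is a hypothetical helper name, not part of Metacanon.

```python
def adjusted_score(citations, divisor=0.7):
    """Adjusted score Y = ((2X + 1)^(1/4) - 1) / divisor."""
    return ((2 * citations + 1) ** 0.25 - 1) / divisor

def marginal_gain(n, divisor=0.7):
    """Points added by the n-th citation alone (hypothetical helper)."""
    return adjusted_score(n, divisor) - adjusted_score(n - 1, divisor)

# The gain from one more citation shrinks as the running total grows.
for n in (1, 2, 10, 100):
    print(n, round(marginal_gain(n), 3))
```

Each printed gain is smaller than the one before it, which is the diminishing-returns pattern described above.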
The same goes (more or less) for the data from JSTOR, ALH, and American Literature, whose formulas are as follows:
JSTOR: Y = ((2X + 1)^{1/4} - 1) / 1
ALH: Y = ((2X + 1)^{1/4} - 1) / 0.4
American Literature: Y = ((2X + 1)^{1/4} - 1) / 0.6
The only difference between these and the Google Scholar formula is that each is weighted slightly differently. Individual JSTOR and Google Scholar citations count for less than ALH and American Literature citations merely because there are so many more of them overall. This helps to strike a balance between data from publications across (more or less) the entirety of academia and scholarship specifically within the field of American literary studies.
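To make the weighting concrete, the four formulas can be written as one function with a per-source divisor. This is a sketch under assumptions: the dictionary keys and function name are hypothetical, and the pairing of divisors with JSTOR, ALH, and American Literature assumes the formulas are listed in the same order as the sources are named.

```python
# Divisors taken from the four formulas in the text. The pairing for JSTOR,
# ALH, and American Literature assumes they are listed in that order.
# A smaller divisor means each citation from that source counts for more.
DIVISORS = {
    "google_scholar": 0.7,
    "jstor": 1.0,
    "american_literary_history": 0.4,
    "american_literature": 0.6,
}

def adjusted_score(citations, source):
    """Adjusted score Y = ((2X + 1)^(1/4) - 1) / divisor for one source."""
    return ((2 * citations + 1) ** 0.25 - 1) / DIVISORS[source]

# The same raw count (5) is worth a different number of points per source.
for source in DIVISORS:
    print(source, round(adjusted_score(5, source), 2))
```

With the same raw count in every source, the JSTOR score comes out lowest and the ALH score highest, as the divisors imply.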
The reason I use this formula (rather than, say, a linear function) is to keep any one data point from counting for too much, especially given the centripetal nature of literary canons. It seems better to me for works that receive a small number of citations in all four categories to score higher than works that receive a high number of citations in only a single category.
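A small comparison may help illustrate the point. The sketch below rests on two assumptions that the explanation above does not state: that the four adjusted scores are simply added together, and that the linear alternative would be X divided by the same divisor. The two example books, and all of the names used, are invented for illustration.

```python
# Divisors from the formulas in the text (source pairing assumed from listing order).
DIVISORS = {"google_scholar": 0.7, "jstor": 1.0, "alh": 0.4, "american_literature": 0.6}

def curved_total(citations):
    """Sum of ((2X + 1)^(1/4) - 1) / divisor over all sources.

    Adding the four scores together is an assumption made for this sketch.
    """
    return sum(((2 * x + 1) ** 0.25 - 1) / DIVISORS[s] for s, x in citations.items())

def linear_total(citations):
    """Hypothetical linear alternative (X / divisor), shown only for contrast."""
    return sum(x / DIVISORS[s] for s, x in citations.items())

# Two invented books with the same raw total of 20 citations.
spread = {"google_scholar": 5, "jstor": 5, "alh": 5, "american_literature": 5}
concentrated = {"google_scholar": 0, "jstor": 0, "alh": 20, "american_literature": 0}

print("curved:", round(curved_total(spread), 2), "vs", round(curved_total(concentrated), 2))
print("linear:", round(linear_total(spread), 2), "vs", round(linear_total(concentrated), 2))
```

Under these assumptions, the book cited modestly in all four sources outscores the book cited heavily in only one, while the hypothetical linear version would rank them the other way around.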
Because this algorithm is fairly arbitrary, a future version of Metacanon (probably version 0.6) will allow users to alter this formula as they choose. Eventually, I will also add more data points (perhaps from the MLA International Bibliography or from more individual journals like Callaloo and PMLA). The more data there is to be manipulated, the more flexible the canon generator becomes.