The newest metacanon update goes live today, Sept. 20th, 2016. Here’s a quick rundown of the improvements and added features available with version 0.7:
1. The website is now more consistent across browsers and has greatly improved compatibility for mobile devices and tablets. (So, basically, it’s now a lot easier to use on your phone.)
2. The database now includes preliminary collections of nineteenth- and twenty-first-century American fiction. Users can now generate canons limited to any span of years between 1800 and 2016. At this point, these collections are less comprehensive than the twentieth-century list, but I’ll continue to add to them frequently.
3. I’ve changed the standard algorithm slightly by replacing the JSTOR data with data drawn only from JSTOR journals categorized as “Language and Literature” journals. You can still access the global JSTOR data using the custom menu.
4. I’ve also added a dataset drawn from the New York Times archive. Check it out here.
5. Finally, you can now create a user account, thereby unlocking two additional features. First, this allows you to save your custom canons as “presets” that can then be accessed from your user page. Second, it also allows you to catalogue which books from the collection you have read and to see statistics about your own reading habits in relation to the collection as a whole.
Ability to sort by sub-genre or ethnic group in the “Advanced” generator
A user login system that will allow users to sign up for accounts and mark books as “read” or “unread”
Visualized statistics, including user specific statistics (for example, user X has read 37% of the 100 most canonical works in the database; or, 80% of the books user X has marked as “read” are in the top 250 works in the database; etc.)
More data-points included in the “Advanced” generator, allowing for greater flexibility in generating custom lists
Beta versions of nineteenth century and twenty-first century American fiction lists, as well as the ability to view a list of American fiction from 1800 to the present
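The user-specific statistics mentioned in the list above (for example, the share of the top 100 works a user has read) could be computed along these lines. This is only a sketch under stated assumptions: the function names and the sample titles are hypothetical, and nothing here is taken from Metacanon's actual code.

```python
def percent_of_top_read(ranked_titles, read_titles, top_n):
    """Percentage of the top_n ranked works that the user has marked as read."""
    top = ranked_titles[:top_n]
    if not top:
        return 0.0
    read = set(read_titles)
    return 100 * sum(1 for title in top if title in read) / len(top)


def percent_of_read_in_top(ranked_titles, read_titles, top_n):
    """Percentage of the user's read works that fall within the top_n ranked works."""
    read = set(read_titles)
    if not read:
        return 0.0
    top = set(ranked_titles[:top_n])
    return 100 * len(read & top) / len(read)


# Hypothetical ranking (most canonical first) and a hypothetical reading list.
ranked = ["Title A", "Title B", "Title C", "Title D", "Title E"]
read = ["Title A", "Title C", "Title E"]

print(percent_of_top_read(ranked, read, 4))     # 50.0
print(percent_of_read_in_top(ranked, read, 4))  # two of three read works are in the top 4
```

Both functions guard against empty input so that a new account with no books marked as read does not trigger a division by zero.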
I recently added a new feature to the Metacanon statistics page that calculates the total points received by each author in the database. The result is the following list of twenty-five top-scoring authors:
William Faulkner: 177.85
Willa Cather: 112.58
Philip Roth: 111.35
Toni Morrison: 105.59
Ernest Hemingway: 95.77
Saul Bellow: 84.22
John Steinbeck: 75.06
Vladimir Nabokov: 69.93
Don DeLillo: 69.31
F. Scott Fitzgerald: 65.70
Sinclair Lewis: 64.93
Thomas Pynchon: 63.75
Alice Walker: 60.74
John Barth: 54.96
Henry James: 53.44
Cormac McCarthy: 52.41
Edith Wharton: 51.48
Kurt Vonnegut: 49.34
Theodore Dreiser: 48.54
James Baldwin: 47.12
Richard Wright: 45.83
Walker Percy: 44.79
Jack London: 41.35
Zora Neale Hurston: 41.17
Flannery O’Connor: 40.73
As you can see, Faulkner is ahead by a wide margin. This is primarily the result of decades upon decades of scholarship on Faulkner’s sizable oeuvre. Faulkner is often thought of as one of the “greatest” American writers, but as Lawrence Schwartz points out in his Creating Faulkner’s Reputation (1990), this has at least as much to do with the political context of post-WWII America as it does with any aesthetic value inherent to Faulkner’s works. Faulkner’s 177.85 score is a reflection of this critical history.
After Pierre Bourdieu’s La Distinction (1979), this caveat also goes for literary canons and aesthetic value more generally. Far from merely reflecting inherent value, canons reflect a socially constructed consensus. While it is tempting to think that the objectivity of a number like 177.85 represents something more concrete than the social construction of literary value, it would be a mistake to assume so.
So what does this list mean then? While I’m willing to venture that it is a fairly accurate measurement of the “canonicity” of these authors, the catch is that here “canonicity” means only the degree to which an author has been mentioned frequently in scholarship, giving the concept an awkward circularity. The authors in the canon are canonical because they are written about a lot, i.e. because they are canonical. It’s best not to venture beyond this humble assertion. These numbers tell us something that we already knew: there are a few authors who have been talked about more than the rest. They do not tell us why.
They do, however, give us a somewhat more precise representation of this distribution of values that we call “the canon.”
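For readers curious how per-author totals like those above might be produced, here is a rough Python sketch that sums book scores by author and sorts the result. The data, names, and structure are all hypothetical placeholders; the actual Metacanon statistics page may work quite differently.

```python
from collections import defaultdict

# Hypothetical (author, title, score) rows; not real Metacanon data.
books = [
    ("Author A", "Book 1", 10.5),
    ("Author B", "Book 2", 7.25),
    ("Author A", "Book 3", 4.0),
    ("Author C", "Book 4", 9.0),
]


def author_totals(rows):
    """Sum the scores of each author's books, highest total first."""
    totals = defaultdict(float)
    for author, _title, score in rows:
        totals[author] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = author_totals(books)
for author, total in ranking:
    print(f"{author}: {total:.2f}")
```

Because the total is a plain sum, an author with many moderately scored books can outrank an author with a single highly scored one, which is part of why prolific authors sit at the top of the list.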
Metacanon uses an algorithm to calculate the relative canonicity of each book in the database. The following is a simple explanation of how this algorithm works.
Metacanon stores six independent pieces of data for each book in the database:
1. Pulitzer Prizes and Nominations
2. National Book Awards and Nominations
3. Total citations in Google Scholar
4. Total number of results returned from a targeted JSTOR search
5. Total number of results returned from a targeted American Literary History search
6. Total number of results returned from a targeted American Literature search
The data for the two literary awards is interpreted fairly straightforwardly. Any book that has received the Pulitzer Prize is assigned 1 point, while any book that has been a runner-up for the Pulitzer Prize is assigned 0.5 points. The same interpretation is applied to the National Book Award data.
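The award rule just described can be sketched in a few lines of Python. The function name and the example book are hypothetical, and the sketch assumes the two awards' points are simply added together.

```python
def award_points(won: bool, runner_up: bool) -> float:
    """Points for a single award: 1 for a win, 0.5 for a runner-up, otherwise 0."""
    if won:
        return 1.0
    if runner_up:
        return 0.5
    return 0.0


# Hypothetical book: Pulitzer winner and National Book Award runner-up.
pulitzer = award_points(won=True, runner_up=False)
national_book_award = award_points(won=False, runner_up=True)
print(pulitzer + national_book_award)  # 1.5
```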
However, the data from Google Scholar, JSTOR, American Literary History, and American Literature are interpreted using more elaborate formulae. For example, each raw Google Scholar score is put through the following formula:
If X = the raw number of Google Scholar citations, and Y = the adjusted Google Scholar score, then
Y = ((2X + 1)^{1/4} - 1) / 0.7
This looks more complicated (at least to my humanities-person eyes) than it actually is. Here is what this function looks like when it is put through an online graphing calculator:
As you can see, the higher the number of citations (corresponding here to the X axis) the gentler the slope of the function. Put plainly, the first few citations count for a lot of points, but the next few count for fewer points, and the next few count for even fewer and so on. Let’s plot a few points in order to make this a bit clearer:
A book with 1 Google Scholar citation receives 0.45 points.
A book with 2 Google Scholar citations receives 0.71 points.
A book with 5 Google Scholar citations receives 1.17 points.
A book with 10 Google Scholar citations receives 1.63 points.
A book with 15 Google Scholar citations receives 1.94 points.
As the total number of citations grows, each additional citation is worth just a bit less. To use a metaphor drawn from economics, the relationship between number of citations and the final adjusted score is subjected to the “law of diminishing returns.”
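The Google Scholar formula can be checked with a short Python sketch that takes the fourth root of 2X + 1, subtracts 1, and divides by 0.7. The function name is hypothetical; only the arithmetic comes from the formula above.

```python
def adjusted_google_scholar_score(raw_citations: int) -> float:
    """Adjusted score Y = ((2X + 1)^(1/4) - 1) / 0.7 for X raw citations."""
    return ((2 * raw_citations + 1) ** 0.25 - 1) / 0.7


for citations in (1, 2, 5, 10, 15):
    print(citations, round(adjusted_google_scholar_score(citations), 2))
# Prints 0.45, 0.71, 1.17, 1.63, and 1.94, matching the points listed above.
```

Note that a book with zero citations receives exactly zero points, since the fourth root of 1 is 1.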
The same goes (more or less) for the data from JSTOR, ALH, and American Literature, whose formulas are as follows:
JSTOR: Y = ((2X + 1)^{1/4} - 1) / 1
ALH: Y = ((2X + 1)^{1/4} - 1) / 0.4
American Literature: Y = ((2X + 1)^{1/4} - 1) / 0.6
The only difference between these and the Google Scholar formula is that each is weighted slightly differently. Individual JSTOR and Google Scholar citations count for less than ALH and American Literature citations merely because there are so many more of them overall. This helps to strike a balance between data from publications across (more or less) the entirety of academia and scholarship specifically within the field of American literary studies.
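To see the effect of the different weights, the following Python sketch applies the same curve with each source's divisor. The dictionary keys and function name are hypothetical, and the pairing of divisors with sources assumes the formulas are listed in the order JSTOR, ALH, American Literature.

```python
# Divisor per source; a smaller divisor means each citation is worth more.
DIVISORS = {
    "google_scholar": 0.7,
    "jstor": 1.0,
    "alh": 0.4,
    "american_literature": 0.6,
}


def source_score(source: str, raw_count: int) -> float:
    """Adjusted score ((2X + 1)^(1/4) - 1) / divisor for one data source."""
    return ((2 * raw_count + 1) ** 0.25 - 1) / DIVISORS[source]


# The same raw count of 10 yields a different score in each source.
for source in DIVISORS:
    print(source, round(source_score(source, 10), 2))
```

With an identical raw count, ALH yields the highest score and JSTOR the lowest, which is the balance between field-specific and general scholarship described above.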
The reason I use this formula (rather than say, a linear function) is in order to keep any one data point from counting for too much, especially given the centripetal nature of literary canons. It seems better to me for works that receive a small number of citations in all four categories to score higher than works that receive a high number of citations in only a single category.
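A small Python comparison illustrates this design choice. It assumes the four per-source scores are simply summed, and it uses a hypothetical linear alternative (raw count divided by the same divisor) purely for contrast; the book data is invented.

```python
DIVISORS = {"google_scholar": 0.7, "jstor": 1.0, "alh": 0.4, "american_literature": 0.6}


def root_total(counts: dict) -> float:
    """Sum of ((2X + 1)^(1/4) - 1) / divisor over all sources."""
    return sum(((2 * x + 1) ** 0.25 - 1) / DIVISORS[s] for s, x in counts.items())


def linear_total(counts: dict) -> float:
    """Hypothetical linear alternative: sum of X / divisor over all sources."""
    return sum(x / DIVISORS[s] for s, x in counts.items())


# Two invented books with 20 citations each, distributed differently.
spread = {"google_scholar": 5, "jstor": 5, "alh": 5, "american_literature": 5}
concentrated = {"google_scholar": 0, "jstor": 0, "alh": 20, "american_literature": 0}

print(round(root_total(spread), 2), round(root_total(concentrated), 2))
print(round(linear_total(spread), 2), round(linear_total(concentrated), 2))
```

Under the fourth-root curve the evenly cited book scores higher, while under the linear alternative the book with all of its citations in one journal would come out ahead.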
Because this algorithm is fairly arbitrary, a future version of Metacanon (probably version 0.6) will allow users to alter this formula as they choose. Eventually, I will also add more data points (perhaps from the MLA International Bibliography or from more individual journals like Callaloo and PMLA). The more data there is to be manipulated, the more flexible the canon generator becomes.
I’ve added a few new features for Metacanon 0.4, including a statistics page and the ability to sort books by gender and genre. Also, you can now click on any individual book to see a drop-down menu containing a breakdown of its current score.
I have several other new features in the works that will be included in versions 0.5 and 0.6, including:
Nineteenth and Twenty-first century fiction lists
An “advanced” page that will allow users to alter the scoring algorithm to produce more flexible custom lists
a. A list of authors with more than one book on the list ordered by total number of works.
b. A graph showing a breakdown of the list by genre (novel, short story collection, etc.).
c. A graph showing a breakdown of the list by the gender of the author.
2. The ability to view a version of the list containing only works written by women.
3. The ability to view a version of the list containing only novels or only short story collections.
4. A breakdown of the score received by each work showing how many points it has received from each factor. This information will appear in a pop-up box when users hover their mouse over the name of each book.
I’m planning to add the following new features during the coming weeks for v0.3.
1. Customizable Number of entries – Users will be able to decide how many novels are listed per page.
2. Improved display menu – I will replace the current menu with a more intuitive sidebar that allows users to decide how to order the list (by rank, title, author, etc.) and allow them to cycle through the list more efficiently (by groups of 10, 25, or 100 titles).
3. An option for displaying only one novel per author – Users will be able to check a box that will produce a list that limits each author to a single entry, in effect producing an alternative list. This will give users the option to see what the canon would look like if it wasn’t oversaturated with novels by James and Faulkner.
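The one-novel-per-author option could work roughly as follows: walk the ranked list from the top and keep only the first (highest-ranked) entry for each author. This Python sketch uses hypothetical names and placeholder data, and is not the site's actual implementation.

```python
def one_per_author(ranked_books):
    """Keep only each author's highest-ranked book, preserving rank order."""
    seen_authors = set()
    filtered = []
    for author, title in ranked_books:
        if author not in seen_authors:
            seen_authors.add(author)
            filtered.append((author, title))
    return filtered


# Placeholder ranking, most canonical first.
ranked = [
    ("Author A", "Novel 1"),
    ("Author A", "Novel 2"),
    ("Author B", "Novel 3"),
    ("Author A", "Novel 4"),
    ("Author C", "Novel 5"),
]

for author, title in one_per_author(ranked):
    print(author, title)
```

Because the input is already sorted by rank, a single pass is enough and the relative order of the remaining entries is unchanged.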