What is Metacanon?

Metacanon is an interactive canon generator. You can use this as a more flexible and inclusive alternative to the various "greatest books" lists published by entities like Modern Library. The current version of Metacanon only includes American fiction, but future versions will be expanded to include other genres, periods and national literatures. The current list of 1000 works is based on a master list of over 2500 American novels and other works of fiction.

How do I use it?

By default, Metacanon displays a list of the 500 highest-scoring American novels and short story collections of the twentieth century. However, you can use the left-hand menu to change the parameters of the canon in order to generate other custom lists. For example, you could create a list of the top works published between 1950 and 1959 according to citation frequency in the New York Times archive. Alternatively, you could create a list of novels written by women between 1880 and 1980.

What does it measure?

Because this list is based primarily on the number of citations that works have received in scholarly journals, it would be a mistake to presume that it measures "greatness" or aesthetic value. It would be more accurate to say that it measures "significance," i.e. the extent to which a particular work has been the object of scrutiny by scholars and awards committees. A place on this list indicates nothing more than that a work has been discussed frequently, and the books toward the top of the list have been discussed far more often than the rest. Their merits as works of art, however, remain a matter of ongoing debate. Thus, while many of the works on this list are also likely to appear on "greatest books" lists, others are not. For example, few would argue that Thomas Dixon's racist novel The Clansman is a great work of art, but since it has been discussed frequently (if unfavorably) in literary scholarship, it currently has a score high enough to place it within the top 200 books. For better or for worse, Metacanon does not distinguish between different forms of critical significance, whether they be aesthetic, historical, political, etc.

How are the scores calculated?

In the default algorithm:

  1. Each work is given a weighted score based on the number of citations it has received in Google Scholar.
  2. A second score is given for the number of citations it has received in journals categorized as "language and literature" by JSTOR.
  3. Each work is also given a weighted score based on the number of citations it has received in two major journals of Americanist literary criticism: American Literature and American Literary History.

Each of these scores is calculated using the following simple formula:
score = sqrt(c) * w
where c is the raw count of citations and w is an arbitrary weight applied to the data point. Using the square root of the raw count creates a less extreme spread between heavily cited works and less frequently cited works, and it also prevents any one data source from overwhelming the others. The default weights are as follows:
JSTOR Language and Literature: 1.0
Google Scholar: 0.5
American Literary History (ALH): 1.0
American Literature: 1.0
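The per-source formula can be sketched in a few lines of Python. This is a hypothetical illustration rather than Metacanon's own code, and the function name `source_score` is an assumption. It shows how the square root compresses the spread: a work with 100 times as many citations from the same source earns only 10 times as many points.

```python
import math


def source_score(c: int, w: float) -> float:
    """Points for one data source: sqrt(c) * w (hypothetical helper)."""
    if c < 0:
        raise ValueError("citation count cannot be negative")
    return math.sqrt(c) * w


# 100x the citations yields only 10x the points.
print(source_score(100, 1.0))     # 10.0
print(source_score(10_000, 1.0))  # 100.0

# A lower weight scales a source down; Google Scholar's default weight is 0.5.
print(round(source_score(4385, 0.5), 2))  # 33.11
```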

Here is a typical example:

Beloved by Toni Morrison receives 33.11 points for having 4385 Google Scholar citations, 49.34 points for having 2434 search results on JSTOR among language and literature journals, 5.66 points for having 32 search results in American Literature, and 5.83 points for having 34 search results in American Literary History. When these points are added together, they create a combined score of 93.93. (Due to rounding, the displayed sum may be slightly different from one calculated by hand.) This places Beloved at number 1 on the standard list.
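The Beloved example can be reproduced with a short Python sketch. It is illustrative only: the dictionary keys and the `score_work` name are assumptions, not part of Metacanon. The JSTOR count used here, 2434, is the count that yields 49.34 points at a weight of 1.0, and with it the four sources sum to 93.93. The individually rounded figures add up to 93.94, which is the rounding discrepancy mentioned in the example.

```python
import math

# Default weights; the keys are illustrative labels for each data source.
DEFAULT_WEIGHTS = {
    "Google Scholar": 0.5,
    "JSTOR language and literature": 1.0,
    "American Literature": 1.0,
    "American Literary History": 1.0,
}


def score_work(counts, weights=None):
    """Return (points per source, total) using sqrt(c) * w for each weighted source.

    Sources with no count in `counts` contribute 0 points.
    """
    if weights is None:
        weights = DEFAULT_WEIGHTS
    points = {src: math.sqrt(counts.get(src, 0)) * w for src, w in weights.items()}
    return points, sum(points.values())


beloved_counts = {
    "Google Scholar": 4385,
    "JSTOR language and literature": 2434,
    "American Literature": 32,
    "American Literary History": 34,
}

points, total = score_work(beloved_counts)
for src, p in points.items():
    print(f"{src}: {p:.2f}")
print(f"Total: {total:.2f}")  # Total: 93.93
```

Passing a different weight map mimics the custom menu described next: for instance, `score_work(beloved_counts, {**DEFAULT_WEIGHTS, "Google Scholar": 1.0})` raises Beloved's total to about 127.04.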

However, you can also use the custom menu on the left-hand side of the home page to alter this algorithm, either by changing the weight accorded to any of these sources or by adding data from other sources (for example, the New York Times archive). Doing so will change the score each work receives as well as the overall order of the list. You can then save your newly altered canon as a custom preset.

Exceptions

There are a couple of edge cases where citation counts are known to be inaccurate:

  1. When the title of the work is an extremely common word or sequence of words. In this case citation counts will be mistakenly inflated because the search will come up with a number of false positives. Stephen King's It is the most extreme example of this, given that "it" is one of the most common words in English.
  2. When an author's name is not sufficiently unique because another well-known person shares the same name. Citation counts for works by such authors tend to be inflated for similar reasons. So far, the only example of this problem we have encountered is The Crisis by the American novelist Winston Churchill. Given that the British prime minister of the same name is referenced frequently, often in relation to various "crises," false positives are difficult to avoid in this case.

We don't yet have a good solution to these problems. As a stopgap measure, we currently artificially deflate scores for works that are known to fall into cases 1 and 2 described above.

For case 1, we apply the following function:
adjusted_score = raw_score * (1000 / ((f / 50) + 1000))
where f is the frequency with which the work's title occurs in the Corpus of Contemporary American English. Works that have had corpus frequency correction applied are marked with an asterisk.
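A minimal Python sketch of this correction, assuming f is a raw frequency count from the corpus (the function name `frequency_adjusted` is hypothetical). The deflation factor is 1 when f is 0 and falls to one half when f reaches 50,000; it keeps shrinking for more common titles but never reaches zero. The raw score of 80 and the frequencies below are arbitrary illustrative inputs, not actual Metacanon or corpus figures.

```python
def frequency_adjusted(raw_score: float, f: float) -> float:
    """Deflate a score by title frequency f: raw_score * (1000 / ((f / 50) + 1000))."""
    if f < 0:
        raise ValueError("corpus frequency cannot be negative")
    factor = 1000 / ((f / 50) + 1000)
    return raw_score * factor


# Illustrative inputs only: the same raw score at increasing title frequencies.
for f in (0, 5_000, 50_000, 500_000):
    print(f"f={f:>7}: {frequency_adjusted(80.0, f):.2f}")
```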

Case 2 is even more problematic, given that there is no single objective measurement of "non-uniqueness" that could be applied across the board. Since this only affects one author (so far), we artificially deflate scores for Winston Churchill's works using an arbitrary function:
adjusted_score = raw_score / 100
Works that have been adjusted in this way are marked with a cross.

How does Metacanon define "fiction"?

For the purposes of this project "fiction" refers to novels, collections of short stories, novellas, and any other work of book-length prose commonly designated as fiction. Individual short stories are not included.

Metacanon also leaves out memoirs, autobiographies, and other works of creative nonfiction. Since the boundary between "fiction" and "nonfiction" will always be blurry, this is a somewhat arbitrary decision. At least some definitions of fiction would include books like Let Us Now Praise Famous Men and The Woman Warrior, both of which are excluded here. Equally problematic is the genre of the "nonfiction novel," often used to categorize books like In Cold Blood and The Executioner's Song.

For now, I've decided (again, somewhat arbitrarily) to leave all of these works off the list. This is only because it would be much harder to be sure of including every nonfiction-but-actually-sort-of-like-fiction book written by an American than simply to narrow the definition of "fiction" so that it excludes these borderline cases.

In a later version of Metacanon, users will be able to decide whether they want to see a list of 20th century prose literature (including The Woman Warrior, etc.) or just "20th century fiction." This still won't get rid of the problem of working with some pretty contestable categories, but it will at least allow for more flexibility.

How does Metacanon define "American"?

"American" is arguably an even more problematic category than "fiction," laden as it is with heavy nationalist and imperialist baggage. How do we decide if a novel is "American"? Does the author have to be American? If so, how do we actually know who counts as "American"? Do they have to have been born in the United States? Or is having lived in the U.S. for a certain period of time enough? If so, how long? What about works written by authors living in the U.S. territory of Puerto Rico? What if those works are written in Spanish? What about works written in Canada or Mexico?

Rather than try to solve this problem, Metacanon merely replicates the common practices of scholars in the field of American literary studies. Generally speaking, works of fiction are called "American" if they are written by someone who was born in the United States (usually including Puerto Rico) or by an immigrant to the United States. Following this loose consensus (and it is quite loose), the novels of William Faulkner and Toni Morrison are "American," as are the novels of Henry James (even the ones he wrote after crossing the Atlantic) and the novels that Vladimir Nabokov wrote in the United States (but not his early works). While being written in English isn't a strict qualification, most of the works on this list meet it.

There is a certain imperialism to this definition. Americanness is constantly being exported around the globe by expatriates even as the works of any author who comes to the United States are immediately claimed as products of American culture. The writings of Canadians, Central Americans, and South Americans are generally excluded, unless, of course, these authors happen to spend a significant amount of time in the United States, at which point their works are almost magically transmuted into products of the American experience.

This list doesn't explicitly challenge this definition—as problematic as it is. However, I hope that having a list like this does at least make its contours slightly more legible.

Future versions of Metacanon will allow users to customize this definition dynamically, for example, by including only novels actually written in the United States (sorry, James) or by including novels written anywhere on the North American continent or anywhere in the Americas more generally. In addition, users will have the option to build custom lists based on criteria other than national belonging.

Who is behind this project?

Metacanon is developed and maintained by Nathaniel Conroy.

What if I notice something that's missing?

If you think there's a book that should be included here but isn't, you can contact me by sending an email to nathanielaconroy@gmail.com or by tweeting me @ConroyNathaniel.