Is William Faulkner the Most “Canonical” American Novelist of the Twentieth Century?

According to the data, the answer is yes; but canonical status shouldn’t be confused with aesthetic value.

Metacanon measures the “canonicity” of individual works of literature by counting the number of times they’re mentioned in scholarly journals and converting this data into a uniform score for each work. Right now, it only covers twentieth-century American fiction, but I’m working on expanding it.

I recently added a new feature to the Metacanon statistics page that calculates the total points received by each author in the database. The result is the following list of twenty-five top-scoring authors:

William Faulkner: 177.85
Willa Cather: 112.58
Philip Roth: 111.35
Toni Morrison: 105.59
Ernest Hemingway: 95.77
Saul Bellow: 84.22
John Steinbeck: 75.06
Vladimir Nabokov: 69.93
Don DeLillo: 69.31
F. Scott Fitzgerald: 65.70
Sinclair Lewis: 64.93
Thomas Pynchon: 63.75
Alice Walker: 60.74
John Barth: 54.96
Henry James: 53.44
Cormac McCarthy: 52.41
Edith Wharton: 51.48
Kurt Vonnegut: 49.34
Theodore Dreiser: 48.54
James Baldwin: 47.12
Richard Wright: 45.83
Walker Percy: 44.79
Jack London: 41.35
Zora Neale Hurston: 41.17
Flannery O’Connor: 40.73

As you can see, Faulkner is ahead by a wide margin. This is primarily the result of decades upon decades of scholarship on Faulkner’s sizable oeuvre. Faulkner is often thought of as one of the “greatest” American writers, but as Lawrence Schwartz points out in his Creating Faulkner’s Reputation (1990), this has at least as much to do with the political context of post-WWII America as it does with any aesthetic value inherent to Faulkner’s works. Faulkner’s 177.85 score is a reflection of this critical history.
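The author totals on the statistics page appear to be sums of each author’s per-book scores, which would explain why a large, heavily studied oeuvre like Faulkner’s pulls so far ahead. The sketch below assumes exactly that (a plain sum; the post doesn’t spell out the aggregation), and its authors, titles, and numbers are made-up placeholders, not Metacanon data.

```python
from collections import defaultdict

# Placeholder rows (author, title, book score). These are hypothetical
# values for illustration, NOT actual Metacanon scores.
book_scores = [
    ("Author A", "Novel 1", 12.5),
    ("Author A", "Novel 2", 8.0),
    ("Author B", "Novel 3", 15.25),
]


def author_totals(rows):
    """Sum per-book scores into one total per author (assumed aggregation).

    Returns (author, total) pairs sorted from highest to lowest total.
    """
    totals = defaultdict(float)
    for author, _title, score in rows:
        totals[author] += score
    ranked = [(author, round(total, 2)) for author, total in totals.items()]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


for author, total in author_totals(book_scores):
    print(f"{author}: {total}")
```

Author B has the single highest-scoring book here, but Author A ranks first because two moderately scored books add up to more. Under this assumption, author totals reward the size of an oeuvre as well as the attention paid to any one book.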

After Pierre Bourdieu’s La Distinction (1979), this caveat also applies to literary canons and aesthetic value more generally. Far from merely reflecting inherent value, canons reflect a socially constructed consensus. While it is tempting to think that the objectivity of a number like 177.85 represents something more concrete than the social construction of literary value, it would be a mistake to assume so.

So what does this list mean then? While I’m willing to venture that it is a fairly accurate measurement of the “canonicity” of these authors, the catch is that here “canonicity” means only the degree to which an author has been mentioned frequently in scholarship, giving the concept an awkward circularity. The authors in the canon are canonical because they are written about a lot, i.e. because they are canonical. It’s best not to venture beyond this humble assertion. These numbers tell us something that we already knew: there are a few authors who have been talked about more than the rest. They do not tell us why.

They do, however, give us a somewhat more precise representation of this distribution of values that we call “the canon.”

The Metacanon Algorithm Explained

Metacanon uses an algorithm to calculate the relative canonicity of each book in the database. The following is a simple explanation of how this algorithm works.

Metacanon stores six independent pieces of data for each book in the database:

1. Pulitzer Prizes and Nominations
2. National Book Awards and Nominations
3. Total citations in Google Scholar
4. Total number of results returned from a targeted JSTOR search
5. Total number of results returned from a targeted American Literary History search
6. Total number of results returned from a targeted American Literature search

The data for the two literary awards is interpreted fairly straightforwardly. Any book that has received the Pulitzer Prize is assigned 1 point, while any book that has been a runner-up for the Pulitzer Prize is assigned 0.5 points. The same interpretation is applied to the National Book Award data.
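A minimal sketch of the award scoring, assuming a book’s Pulitzer and National Book Award points are simply added together (the post describes each award separately but not how the two combine). The outcome labels "winner" and "finalist" are hypothetical names.

```python
# Points per award outcome: a win is 1 point, a runner-up 0.5, nothing 0.
# The outcome labels are hypothetical; None means no win or nomination.
AWARD_POINTS = {"winner": 1.0, "finalist": 0.5, None: 0.0}


def award_score(pulitzer=None, national_book_award=None):
    """Add the points for one book's Pulitzer and National Book Award outcomes."""
    return AWARD_POINTS[pulitzer] + AWARD_POINTS[national_book_award]


print(award_score(pulitzer="winner"))
print(award_score(pulitzer="finalist", national_book_award="winner"))
```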

However, the data from Google Scholar, JSTOR, American Literary History, and American Literature are interpreted using more elaborate formulae. For example, each raw Google Scholar score is put through the following formula:

If X = the raw number of Google Scholar citations, and Y = the adjusted Google Scholar score, then

$$Y = \frac{(2X+1)^{1/4} - 1}{0.7}$$

This looks more complicated (at least to my humanities-person eyes) than it actually is. Here is what this function looks like when it is put through an online graphing calculator:

[Figure: graph of the Google Scholar formula. X axis = total number of citations; Y axis = total adjusted score.]

As you can see, the higher the number of citations (corresponding here to the X axis) the gentler the slope of the function. Put plainly, the first few citations count for a lot of points, but the next few count for fewer points, and the next few count for even fewer and so on. Let’s plot a few points in order to make this a bit clearer:
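Differentiating the formula makes the flattening slope explicit. Applying the chain rule to the Google Scholar formula gives:

$$\frac{dY}{dX} = \frac{1}{0.7}\cdot\frac{1}{4}(2X+1)^{-3/4}\cdot 2 = \frac{(2X+1)^{-3/4}}{1.4}$$

Because the exponent -3/4 is negative, the slope shrinks as X grows: it is about 0.71 at X = 0 but only about 0.07 at X = 10.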

A book with 1 Google Scholar citation receives 0.45 points.
A book with 2 Google Scholar citations receives 0.71 points.
A book with 5 Google Scholar citations receives 1.17 points.
A book with 10 Google Scholar citations receives 1.63 points.
A book with 15 Google Scholar citations receives 1.94 points.
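These figures can be reproduced with a few lines of Python. This is a sketch of the formula as written above, not Metacanon’s own code; the function and constant names are illustrative.

```python
def adjusted_score(raw_count, divisor):
    """Diminishing-returns transform: ((2X + 1) ** (1/4) - 1) / divisor.

    Zero citations give exactly 0 points, since (2*0 + 1) ** 0.25 == 1.
    """
    return ((2 * raw_count + 1) ** 0.25 - 1) / divisor


GOOGLE_SCHOLAR_DIVISOR = 0.7  # the divisor in the Google Scholar formula

for citations in (1, 2, 5, 10, 15):
    points = adjusted_score(citations, GOOGLE_SCHOLAR_DIVISOR)
    print(f"{citations:>2} citations -> {points:.2f} points")
```

The printed values round to the same 0.45, 0.71, 1.17, 1.63, and 1.94 listed above. Going from 10 to 15 citations adds only about 0.31 points, while the first citation alone is worth 0.45.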

As the total number of citations grows, each additional citation is worth just a bit less. To use a metaphor drawn from economics, the relationship between number of citations and the final adjusted score is subjected to the “law of diminishing returns.”

The same goes (more or less) for the data from JSTOR, ALH, and American Literature, whose formulas are as follows:

JSTOR:
$$Y = \frac{(2X+1)^{1/4} - 1}{1}$$

ALH:
$$Y = \frac{(2X+1)^{1/4} - 1}{0.4}$$

American Literature:
$$Y = \frac{(2X+1)^{1/4} - 1}{0.6}$$

The only difference between these and the Google Scholar formula is that each is weighted slightly differently. Individual JSTOR and Google Scholar citations count for less than ALH and American Literature citations merely because there are so many more of them overall. This helps to strike a balance between data from publications across (more or less) the entirety of academia and scholarship specifically within the field of American literary studies.
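To see how the four divisors interact with the award points, here is a sketch of a complete book score. It assumes the final score is the plain sum of the six components (which the per-book score breakdown suggests but the post doesn’t state outright), and the source keys are hypothetical names.

```python
# Divisors from the four formulas; a larger divisor means each citation
# counts for less. The dictionary keys are hypothetical source names.
DIVISORS = {
    "google_scholar": 0.7,
    "jstor": 1.0,
    "american_literary_history": 0.4,
    "american_literature": 0.6,
}


def adjusted_score(raw_count, divisor):
    """((2X + 1) ** (1/4) - 1) / divisor: diminishing returns per extra citation."""
    return ((2 * raw_count + 1) ** 0.25 - 1) / divisor


def book_score(citation_counts, award_points=0.0):
    """Return (total, breakdown) for one book.

    Assumes the total is a plain sum of the award points and the four
    adjusted citation scores; missing sources count as zero citations.
    """
    breakdown = {
        source: adjusted_score(citation_counts.get(source, 0), divisor)
        for source, divisor in DIVISORS.items()
    }
    breakdown["awards"] = award_points
    return sum(breakdown.values()), breakdown


total, breakdown = book_score({source: 10 for source in DIVISORS})
for source, points in breakdown.items():
    print(f"{source}: {points:.2f}")
print(f"total: {total:.2f}")
```

Because every source here gets the same ten citations, the ranking of the four components follows the divisors alone: American Literary History contributes 2.5 times as much as JSTOR (1.0 / 0.4), with American Literature and Google Scholar in between.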

The reason I use this formula (rather than, say, a linear function) is to keep any one data point from counting for too much, especially given the centripetal nature of literary canons. It seems better to me for works that receive a small number of citations in all four categories to score higher than works that receive a high number of citations in only a single category.
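The effect of choosing a concave formula over a linear one can be checked directly. In this sketch, a hypothetical book with 5 citations in each of the four sources is compared with one that has all 20 of its citations in a single source (American Literary History, the most heavily weighted). The linear alternative, where each citation is worth a flat 1/divisor points, is a hypothetical comparison, not something Metacanon uses.

```python
# Divisors from the four formulas (source keys are hypothetical names).
DIVISORS = {
    "google_scholar": 0.7,
    "jstor": 1.0,
    "american_literary_history": 0.4,
    "american_literature": 0.6,
}


def concave_total(counts):
    """Sum ((2X + 1) ** (1/4) - 1) / divisor over the four sources."""
    return sum(((2 * counts.get(s, 0) + 1) ** 0.25 - 1) / d for s, d in DIVISORS.items())


def linear_total(counts):
    """Hypothetical linear alternative: every citation worth 1/divisor points."""
    return sum(counts.get(s, 0) / d for s, d in DIVISORS.items())


spread = {source: 5 for source in DIVISORS}          # 20 citations, evenly spread
concentrated = {"american_literary_history": 20}     # 20 citations, one source

for name, counts in [("spread", spread), ("concentrated", concentrated)]:
    print(f"{name}: concave {concave_total(counts):.2f}, linear {linear_total(counts):.2f}")
```

Under the linear rule the single-source book wins; under the concave formula the evenly cited book wins, which is the behavior the paragraph above argues for.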

Because this algorithm is fairly arbitrary, a future version of Metacanon (probably version 0.6) will allow users to alter this formula as they choose. Eventually, I will also add more data points (perhaps from the MLA International Bibliography or from more individual journals like Callaloo and PMLA). The more data there is to manipulate, the more flexible the canon generator becomes.
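One possible shape for the planned user-adjustable formula is a small configuration object whose exponent and divisors can be changed, and to which new sources can be added. Everything here is hypothetical: the class, its fields, and the "mla_bibliography" key with its 0.5 divisor are illustrative assumptions, not Metacanon features. The defaults reproduce the current formulas.

```python
from dataclasses import dataclass, field


def _current_divisors():
    # Divisors from the current formulas (source keys are hypothetical names).
    return {
        "google_scholar": 0.7,
        "jstor": 1.0,
        "american_literary_history": 0.4,
        "american_literature": 0.6,
    }


@dataclass
class ScoringConfig:
    """Hypothetical user-adjustable scoring settings."""
    # 1/4 in the current formula; values closer to 1 are closer to linear,
    # i.e. weaker diminishing returns.
    exponent: float = 0.25
    divisors: dict = field(default_factory=_current_divisors)

    def citation_score(self, counts):
        """Sum ((2X + 1) ** exponent - 1) / divisor over every configured source."""
        return sum(((2 * counts.get(source, 0) + 1) ** self.exponent - 1) / divisor
                   for source, divisor in self.divisors.items())


default_config = ScoringConfig()
flatter_config = ScoringConfig(exponent=0.5)
# A new data source is just another divisor entry (the 0.5 value is made up).
with_mla = ScoringConfig(divisors={**_current_divisors(), "mla_bibliography": 0.5})

counts = {"jstor": 10, "mla_bibliography": 3}
for name, config in [("default", default_config), ("flatter", flatter_config), ("with MLA", with_mla)]:
    print(f"{name}: {config.citation_score(counts):.2f}")
```

Sources a configuration doesn’t list are simply ignored, so the default configuration scores the MLA citations above as zero.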

Metacanon 0.4 is now live!

I’ve added a few new features for Metacanon 0.4, including a statistics page and the ability to sort books by gender and genre. Also, you can now click on any individual book to see a drop-down menu containing a breakdown of its current score.

I have several other new features in the works that will be included in versions 0.5 and 0.6, including:

  1. Nineteenth- and twenty-first-century fiction lists
  2. An “advanced” page that will allow users to alter the scoring algorithm to produce more flexible custom lists
  3. Individual author pages
  4. Search capability

Stay tuned!