Wednesday, November 28, 2012

The plural of anecdote [is | is not] data

I like stories, and many people I know like stories. You can make a story by gluing together a bunch of related observations. Polish spurious details out of a story with a little context and it can become a metaphor.

Anecdotes are stories that are either “real and interesting” or “hearsay and unreliable”. Recently I have noticed the term “anecdote” being used both ways in the same talk. Calling a story an anecdote somehow tells you something about it, but what it tells you depends on context external to the word “anecdote” itself.

The telling and understanding of stories has been a defining element of human culture, possibly dating back to before the emergence of Homo sapiens. How a person handles the information in a story can be a defining element of personality, and so the telling of stories stitches a person to a culture.

Anthropologists have used the collected stories of selected cultures to describe important aspects of those cultures. We look to collected stories about happenings in our own society to understand its culture, and we often call those stories anecdotes. In this freshly bedded election, candidates threw anecdotes around like facts to persuade listeners to invest in their understanding of our culture.

As human civilization has developed, our ability to handle information and facts has matured. Though there is still a need for anecdotes, we now have “data” and “theories” that are themselves part of a self-referential context for our cultural understanding.

“The plural of anecdote is data” -- Raymond Wolfinger 1969

The idea that collected anecdotes can make up a data set is enticingly simple. This definition places stories into a modern context. In much the same way that we can define the information content of a particular datum, we might be able to quantitatively assess the cultural relevance of an anecdote. This definition even puts the competing definitions of anecdote in their place: there is good data and bad data, just as there are real and unreliable anecdotes.

I particularly like Wolfinger’s definition because it bridges the divide between a modern understanding of culture and a primitive sitting-around-the-campfire-eating-partially-burnt-gazelle approach to culture. I cannot immediately picture most English majors I have known sitting around a campfire ripping chunks of crispy meat off a freshly killed gazelle haunch, but the definition also helps bridge an understanding of cultural relevance between the humanities and the sciences. I suppose the definition could be viewed as a goodwill gesture between mutually misunderstanding branches of study, equivalent to the “welcome to our fire” handshake still slippery with the fat of the successful hunt.

Wolfinger is a well-known political scientist who first crafted the definition as a rejoinder to a student of his who was dismissing a piece of data as “just an anecdote”.

“I said ‘The plural of anecdote is data’ some time in the 1969-70 academic year while teaching a graduate seminar at Stanford. The occasion was a student's dismissal of a simple factual statement -- by another student or me -- as a mere anecdote.” -- Raymond Wolfinger

Many of my readers will immediately identify Wolfinger’s definition as being the opposite of the popular saying that:

“The plural of anecdote is not data” -- Frank Kotsonis or Roger Brinner

It is interesting to note that both sayings may have been in informal use long before any reliably recorded instance of their usage. In other words, there are anecdotal accounts of their use that predate the cited first uses (and the citations I have found for first use are sketchy).

Though I like Wolfinger’s definition because it is simply true, the saying attributed to Kotsonis or Brinner is more accurate.

Data is information that can be used for a purpose. Usually this purpose is the formal representation of something. By formal I mean quantitative, through the application of specific measures of relevance, though in practice these measures can be abstracted to an informal description of a representative connection.

At best, anecdotes can be representative data elements within a population that describe something similar to the mode or median in elementary statistics. They can also describe other statistical elements, such as outliers or even unrelated elements. At worst, they are stories designed to misrepresent the population from which they are pulled.

This can be illustrated well by using “anecdotal” as an adjective. Picture a report of a “median income” figure for a state. The idea of a median tells you something about the population of the state. Now imagine a report describing an “anecdotal income”. The larger the population the anecdotal figure is drawn from, the less actual information it conveys about that population.

Now picture the word “random” used similarly as an adjective. A random income figure from a population says more about the population than an anecdotal figure does. The plural of random information is data. Anecdotes are not randomly derived, so having many of them does not make a data set, or at least not a useful one.
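To make the median/random/anecdotal comparison concrete, here is a small Python simulation. It is only a sketch under assumptions of my own: the incomes are drawn from a log-normal distribution (a common rough model, not real data), and “being interesting” is modeled by a hypothetical rule that only incomes in the top 5% ever get retold. The helper names are illustrative.

```python
import random
import statistics


def make_population(n, seed=0):
    """Hypothetical income population drawn from a log-normal distribution."""
    rng = random.Random(seed)
    return [rng.lognormvariate(10.5, 0.8) for _ in range(n)]


def random_sample(population, k, rng):
    """Random selection: every member has the same chance of being picked."""
    return rng.sample(population, k)


def anecdotal_sample(population, k, rng, interesting_fraction=0.05):
    """Hypothetical 'interesting' filter: only the top slice of incomes gets retold."""
    ranked = sorted(population, reverse=True)
    cutoff = max(k, int(len(ranked) * interesting_fraction))
    return rng.sample(ranked[:cutoff], k)


population = make_population(100_000)
true_median = statistics.median(population)
rng = random.Random(1)

# Collecting more anecdotes does not pull their median toward the true one;
# collecting more random observations does.
for k in (10, 100, 1000):
    random_median = statistics.median(random_sample(population, k, rng))
    anecdotal_median = statistics.median(anecdotal_sample(population, k, rng))
    print(f"k={k:5d}  true={true_median:9.0f}  "
          f"random={random_median:9.0f}  anecdotal={anecdotal_median:9.0f}")
```

Under these assumptions the random median should settle close to the population median as k grows, while the anecdotal median stays several times too high no matter how many anecdotes are collected, which is the sense in which their plural is not data.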

Because of the cognitive biases of the human mind, an anecdote over-represents whatever population it is pulled from. In essence, the anecdote is multiplied in the mind until the data set is reconstructed from this clonally derived plural. An anecdote is selected on the basis of some bias (even one as seemingly innocuous as “being interesting”), and multiplying the anecdote amplifies that selection bias.
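The “clonally derived plural” can also be sketched in a few lines of Python. Again the setup is an assumption for illustration: a hypothetical log-normal income population, and a listener who hears about 20 random incomes but remembers only the most remarkable (largest) one. Cloning that single anecdote to stand in for the population leaves its bias untouched while making the reconstructed data set look perfectly certain, with zero spread.

```python
import random
import statistics

# Hypothetical income population (log-normal), not real data.
rng = random.Random(0)
population = [rng.lognormvariate(10.5, 0.8) for _ in range(10_000)]
true_median = statistics.median(population)

# Hypothetical selection bias: of 20 incomes heard about, only the most
# "interesting" (here, the largest) is remembered and retold.
heard = rng.sample(population, 20)
anecdote = max(heard)

# The mind multiplies the single anecdote until it fills the population.
reconstructed = [anecdote] * len(population)

print(f"true median:          {true_median:9.0f}  "
      f"spread: {statistics.pstdev(population):9.0f}")
print(f"reconstructed median: {statistics.median(reconstructed):9.0f}  "
      f"spread: {statistics.pstdev(reconstructed):9.0f}")
```

The reconstructed set has exactly the anecdote's value as its median and no variation at all, so it looks more certain than the real population while being biased in whatever direction the selection pushed it.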

“Anecdotes are not selected at random or they would just be data”

