After two very inspiring talks on integrating sound into technologies and on intermedial argument structures, and after a quick coffee to go, we dive right into an essential question in the humanities. Today’s second round of long paper presentations, by Mike Kestemont, Maciej Eder and Joseph Rudman, introduces us to the many problems of authorship attribution and to three particular approaches to solving them.
The first speaker, Mike Kestemont, chooses an approach similar to the work of Koppel et al. He looks at the structural and topical features that distinguish texts by one specific author. His analysis shows that texts by the same author differ little in structure, whereas texts by different authors show many distinguishing features. So far, so good. What no one in the room would have come up with is his application of these findings: we could, for instance, statistically analyse suicide letters to identify fakes written by murderers. The problem with that, according to Kestemont: „One text is probably the only suicide letter a person has ever written.“ The audience laughs. What seems hilarious points to a real problem. If a suicide letter is the only text of its kind a person ever writes, it has to be compared with that person’s texts in other genres. So is authorship identification possible across genres?
„What about cross-language?“, one of the attendees wonders. With the given structural features, it might be hard to compare novels in English and French by the same author. Speaking of language differences, a discussion on the nature of „genre“ becomes very juicy: can „theatre“, with its wide variety of forms, be called a „genre“? Some of the conference participants disagree. German, however, distinguishes two kinds of classification: „Gattung“ (meaning class, category, etc.), which is broader, and „Genre“, which is more specific.
The next talk, by Maciej Eder, focuses on potential sources of error that researchers have to deal with in authorship attribution. Errors in corpus work can arise either because the corpora are carelessly prepared or because of problems inherent in the texts themselves. His experiment shows how contaminated texts (containing wording or spelling errors) affect research results in different languages. As Eder walks through detailed graphs showing the impact in each language, the audience seems to enjoy itself quite a bit. I have a hard time following everything, but I do understand what is so funny. „Polish: boring again“, Eder remarks. As he expected, the more contaminated the corpus, the worse the results, and the attendees laugh. Then comes a surprise, the Latin graph: „This is the picture that made me apply for the conference: in Latin, even replacing 40% of all words has no impact on the results!“
Last but not least, Joseph Rudman’s lecture on „The Twelve Disputed Federalist Papers“ gives us a good practical example of statistical authorship attribution. It also shows how collaboration and non-traditional research can help establish authorship reliably. Since several of the papers have been attributed to the wrong authors, every text must first be unedited, de-edited and edited. This includes removing quotations and anything else that alters the author’s original style, so that the author of each essay can be identified.
Kestemont’s, Eder’s and Rudman’s presentations have made it clear that, however many problems remain to be dealt with, computers can be used to attribute texts from anonymous sources to specific authors. Is it too soon to proclaim the „Rebirth of the Author“?