The Early Novels Database: a Case Study

by Rachel Sagner Buurma, Assistant Professor of English Literature, Swarthmore College, Anna Tione Levine, junior Honors English major, Swarthmore College, and Richard Li, senior Honors English major, Swarthmore College.

(Originally Posted April 30th, 2011)

Project description1

The Early Novels Database (END) is a bibliographic database based on the University of Pennsylvania’s Rare Book & Manuscript Library’s extensive collection of fiction in English published between 1660 and 1830. Produced by the collaborative effort of Penn librarians, information technology specialists, faculty from Swarthmore College and Penn, and Swarthmore College undergraduate researchers, the completed database will include richly descriptive records of more than 3,000 novels and fictional narratives, from the very canonical to the almost unknown, from fictions that clearly announce themselves to be novels to the works of fiction (fable, travel narrative, romance) that formed part of that genre’s notoriously murky origins. Users will be able to perform both keyword and faceted searches across bibliographic records containing both edition-specific and copy-specific information about each novel. END seeks to unite twenty-first-century search technologies and twentieth-century descriptive bibliography with the sensibility of eighteenth-century indexing practices in ways that enable researchers to write new histories of the novel.

We have designed END to complement the extensive existing full-text facsimile archives that contain early novels (such as ECCO, Google Books, the Internet Archive, and HathiTrust, to name a few). One of the major problems with recent large-scale book digitization projects has been the loss of edition-specific and copy-specific structured metadata–of information about and describing the book–of the kind often available in library card catalogs. The absence of this data can make it difficult for scholars and other researchers to find particular novels or sets of novels they are interested in, because even as our archive of digital texts from the seventeenth, eighteenth, and nineteenth centuries has expanded exponentially, our ability to access them in precise, controlled, and complex ways has diminished. While recent projects have begun to take on this challenge–Brian Geiger’s (University of California, Riverside) and Ben Pauley’s (Eastern Connecticut State University) Google-sponsored effort to automatically match ESTC (English Short Title Catalogue) records to Google Books items is a notable recent example–our project seeks to use human eyes and brains and hands to create and control bibliographic descriptions in ways that computers cannot. For example, we tag each noun, adjective, person name, place name, and object mentioned in the title of each novel; the resulting information can be keyword searched but also appears as a set of “facets” that display how often a given word in each category appears. Therefore, researchers can not only perform traditional keyword searches of the title field to turn up relevant items, but can also see the entire array of nouns appearing on all title pages sorted alphabetically or by frequency. We also include in-depth information on other aspects of the novel’s paratexts, describing the prefaces, introductions, dedications, indexes, tables of contents, and copyright statements in both controlled and more discursive vocabularies.

As a relatively slow-moving project–because of the inherently slow and careful nature of the catalog work, the need to train students thoroughly before they can begin creating records, and the limited amount of time our student researchers have to spend each year on the project–we continue to think through how we can create value that is complementary to and not soon to be substituted by faster and more automated modes of computer indexing and searching. So for us, the very subjective nature of many of our detailed bibliographic descriptions–often perceived as a problem by traditional cataloging and bibliography–has become a strength, particularly because these descriptions can be used alongside more objective and standardized modes of description from both within and outside our database.
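The title tagging and facet counts described above can be illustrated with a minimal sketch. It is not END's actual implementation: the record layout, the field names (title_nouns, title_adjectives), the function facet_counts, and the three sample titles are all assumptions invented for illustration.

```python
from collections import Counter

# Hypothetical records. The titles, field names, and tags are invented
# for illustration and are not drawn from END's schema or data.
records = [
    {"title": "The Young Widow: A Novel",
     "title_nouns": ["widow", "novel"],
     "title_adjectives": ["young"]},
    {"title": "The History of a Young Lady",
     "title_nouns": ["history", "lady"],
     "title_adjectives": ["young"]},
    {"title": "The Fair Orphan: A Novel",
     "title_nouns": ["orphan", "novel"],
     "title_adjectives": ["fair"]},
]


def facet_counts(records, category):
    """Count how many records carry each tag in one category."""
    counts = Counter()
    for record in records:
        # set() ensures a tag repeated within one title counts once per record.
        counts.update(set(record.get(category, [])))
    return counts


noun_facet = facet_counts(records, "title_nouns")

# The same facet, sorted alphabetically and by descending frequency.
alphabetical = sorted(noun_facet.items())
by_frequency = sorted(noun_facet.items(), key=lambda item: (-item[1], item[0]))

for tag, count in by_frequency:
    print(f"{tag}: {count}")
```

Counting records rather than raw word occurrences is a design choice in this sketch; a facet could equally report total occurrences.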

Figure 1. Early Novels Database (END) search interface

An example of how END might be used by an individual researcher will make things clearer. A scholar interested in when the types of works we now think of as novels first began calling themselves “novels,” to take a hypothetical example, can not only instantly call up all 189 records of works of fiction with the noun “novel” or “a novel” in the title; she also, at the click of a button, can see that of the records of novels with “novel” in the title, 27 of them also include the adjective “young”; that 56 of them have prefaces; that the majority of them are written in the third rather than the first person; and that eight of them profess to be written “by a lady” but were in fact penned by men. She can sort and unsort them by year and decade of publication, and notice that most of them are published in London, but that after 1775 many of them are also published in Dublin; she can pull up records of all novels that contain prefaces, and click on each record to see the individual idiosyncratic titles of each one; she will also find detailed cataloger notes quoting interesting passages from the prefaces, passages which may either tell her something she needs to know or indicate to her that she needs to take a closer look at a particular novel herself. She can also find out instantly that 134 of her set of novels have epigraphs on the title pages, and by looking at the authors of those epigraphs she can determine at a glance how many are by “ancient” and how many by “modern” authors. And she can do all of this work in seconds, rather than in the weeks or even months it would take for her to generate this information herself.

So while as a bibliographic tool END does not itself make a claim about literary history, or even represent to its users the “insides,” or texts, of the novels it includes, it helps enable the writing of new, alternative histories of the novel.2 Using the well-worn digital technology of the electronic card catalog–a technology that is the result of a few centuries of changes in the tools we create to locate books–END seeks to offer students and researchers a set of new and more flexible ways to locate and learn about early novels.
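The kind of narrowing the hypothetical scholar performs (select the records with “novel” in the title, then look at prefaces and at place and decade of publication within that subset) might be sketched as follows. This is an illustrative sketch only: the catalog entries, the field names, and the helpers with_title_noun and decade are assumptions, and the counts it produces have no relation to the figures quoted above.

```python
from collections import Counter

# Hypothetical catalog entries. Every value below is invented for
# illustration and does not describe a real END record.
catalog = [
    {"year": 1752, "place": "London", "title_nouns": ["novel", "widow"], "has_preface": True},
    {"year": 1778, "place": "Dublin", "title_nouns": ["novel", "orphan"], "has_preface": False},
    {"year": 1771, "place": "London", "title_nouns": ["history", "lady"], "has_preface": True},
    {"year": 1779, "place": "London", "title_nouns": ["novel"], "has_preface": True},
]


def with_title_noun(entries, noun):
    """Keep only the entries whose title was tagged with the given noun."""
    return [entry for entry in entries if noun in entry["title_nouns"]]


def decade(year):
    """Map a publication year to the first year of its decade."""
    return year - year % 10


# Step 1: narrow the catalog to works calling themselves novels.
novels = with_title_noun(catalog, "novel")

# Step 2: within that subset, count prefaces and tally place by decade.
preface_count = sum(1 for entry in novels if entry["has_preface"])
places_by_decade = Counter((decade(entry["year"]), entry["place"]) for entry in novels)

print(len(novels), "records;", preface_count, "with prefaces")
for (start, place), count in sorted(places_by_decade.items()):
    print(f"{start}s, {place}: {count}")
```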

The Undergraduate Researcher: Classroom, Library, Database

END relies on undergraduate researchers–so far students from Swarthmore College and Bryn Mawr College–as the primary creators of the records that populate our database.3 Recruited mainly from college classes on the history of the novel, the students usually have at least a little background in the history of the novel in English and descriptive bibliography before joining the project team. Nevertheless, if the detailed, painstaking investigation of each novel and transfer of information into the proper record fields necessary to create database records is to be a meaningful and interesting task, and if the student is to be capable of making observations about noteworthy aspects of the novel, each student needs a certain amount of general background on eighteenth-century literature and culture and the material form of the novel. We therefore run an informal week-long training on the eighteenth-century novel, descriptive bibliography, and the design of the database each summer before work begins. Further, if the work is to become meaningful in the context of the student’s own ongoing education, it is important that she develop a personal project related to the database work; in informal blog posts and more formal papers, students have developed their ongoing interest in topics ranging from narrative form to the representation of dialect to the quantitative study of the link between the novel’s representation of time and the novel’s length in page numbers.

To us, one of the most valuable aspects of our database project is the fact that it offers undergraduates an opportunity to work with librarians, programmers, other students, and professors in a collaborative environment.4 Also valuable is the way that the project teaches students to begin to think like researchers as they work to puzzle out what kinds of information researchers will want to know. This skill is important not because of the specific content involved–few students who work on this project will go on to research in English literature professionally (and in fact our team’s first “graduate” is heading off to an excellent law school in the fall of 2011). Rather, the critical thinking skills that the ongoing attempt to think like a researcher develops–the development of what we might call the “research imagination”–are what is important in the context of the student researchers’ liberal arts educations. Also important is the way working on the database helps students develop a very concrete understanding of the difference between the canon–that small subset of books that have been carefully preserved, regularly edited, and (most importantly) routinely taught in the classroom–and the library or the archive in which a much wider array of texts is preserved. For example, when I (if Rachel may interject in the first person for a moment) teach my mid-level survey class The Rise of the Novel, students read canonical works like Daniel Defoe’s Robinson Crusoe, Samuel Richardson’s Pamela, and Frances Burney’s Evelina; the syllabus does not contain, for example, John Battersby’s Tell-tale Sophas: an Eclectic Fable in Three Volumes, Mary Walker’s Munster Village, the anonymous The Example; or, the History of Lucy Cleveland, or any one of the several thousand eighteenth- and early nineteenth-century novels that have survived but not become canonical.

To see these texts, to turn their pages and skim their chapters, is necessarily to grasp an entirely different history of the novel; or, perhaps I should say, to realize that the history of the novel we teach depends upon the few texts we choose to assign. This isn’t to say that the canon isn’t valuable or that databases like END should replace the Penguin Classics, but merely that there are kinds of learning that undergraduates can do in the library and not in the classroom, and vice versa. And at the same time that working on END enables students to live and grasp this difference–a difficult one to teach as an abstract concept–it also enables students to live and grasp the ongoing tension between the particularity of the book’s material form and the database’s attempt to categorize and capture a certain set of fixed and more-or-less objective characteristics. Again, the ultimate goal is not that students learn a lot of things about eighteenth-century novels–though they certainly do–but that the sustained examination of books, creation of database records, collaborative working environment, and library context make it possible for students to learn the kinds of things that they can’t learn in the traditional classroom, that they engage in a kind of learning that isn’t possible in the context of the course and the delimited class meeting.

Potential Futures of END

While this project is potentially endless–we’ve completed only about 200 records of the 3,000 Penn novels we plan to include and are currently piloting the inclusion of French novels in a partnership with Bryn Mawr’s Canaday Library as we continue to seek new partner libraries and institutions–we are currently performing user testing and preparing to seek peer review from the 18thConnect group5 before embarking on the task of streamlining our cataloging protocol, training more undergraduate cataloger-researchers, and adding more records. We’ve written an article about the project, forthcoming in a collection titled Past is Portal: Teaching Undergraduates Using Special Collections and Archives.6 And we hope that some part of the value of END lies in its potential to inspire other forms of collaborative humanities research that cross institutional lines in order to engage undergraduates in the process of producing new knowledge in the humanities.

Rachel Sagner Buurma is Assistant Professor of English Literature at Swarthmore College. Anna Tione Levine is a junior Honors English major at Swarthmore College. Richard Li is a senior Honors English major at Swarthmore College.

Distributed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Notes

1. END would not have been possible without the unwavering support and concerted efforts of the following individuals: Lynne Farrington, Curator of Printed Books, Rare Book and Manuscript Library, University of Pennsylvania; Michael Gamer, Associate Professor of English, University of Pennsylvania; Heather Glaser, Curator and Assistant Fine Arts Librarian, Fisher Fine Arts Library, University of Pennsylvania; Marianne Hansen, Special Collections Librarian, Canaday Library, Bryn Mawr College; David McKnight, Director, Rare Book and Manuscript Library, University of Pennsylvania; Dennis Mullen, Web Developer and Designer, Van Pelt Library, University of Pennsylvania; Jon Shaw, Head, Research, Training and Quality Management, Van Pelt Library, University of Pennsylvania; Laurie Sutherland, Metadata Specialist, Van Pelt Library, University of Pennsylvania; Eric Pumroy, Director of Library Collections, Canaday Library, Bryn Mawr College; Leslie Vallhonrat, Web Managing Editor, Van Pelt Library, University of Pennsylvania. View the database at http://hdl.library.upenn.edu/1017/88396.
2. While END is in many ways a database of information designed to give researchers a “middle distance” view of the novel (as opposed to enabling the kind of “distant reading” of visualized large-scale sets of information about the novel which Franco Moretti and others are interested in), some of the types of macroscopic information included may eventually lend themselves naturally to graphical representation. (See Franco Moretti, Graphs, Maps, Trees: Abstract Models for a Literary History (London; New York: Verso, 2005).) Eventually, for example, END may be able to map the frequency of epigraphs against a timeline, or even more specifically, the frequency of quotations from Shakespeare used as epigraphs against a timeline. Even more important than building our own data visualization tools, however, will be making our database and data compatible with digital tools that others create; for one example, we are working to make sure that END is as compatible as possible with the Zotero citation manager.
3. The database construction itself–a complicated endeavor–has been expertly overseen by staff involved with the University of Pennsylvania’s Digital Library Architecture, with whom we meet regularly to discuss questions that cross database structure and record creation matters. See http://dla.library.upenn.edu/dla/staff/ancillary.html?id=dla/poweredbythedla for more detail.
4. For a look at our project’s internal blog, which gives a sense of what day-to-day learning and work is like, see http://transatlanticfictionproject.blogspot.com.
5. 18thConnect is a group dedicated to the aggregation and peer review of digital resources relating to the long eighteenth century; see http://www.18thconnect.org.
6. Co-edited by Eleanor Mitchell, Peggy Seiden, and Suzy Taraba, to be published by the American Council of Research Libraries.