The Virtual Observatory and the Roman de la Rose: Unexpected Relationships and the Collaborative Imperative
Do most faculty in the humanities see collaborative research and engagement with large datasets in their future?
Two projects led by Johns Hopkins University that seem to come from opposite
ends of the spectrum--the Virtual Observatory (VO) and the Roman de la Rose
Digital Library--reveal some unexpected relationships that point to a shift in
scholarly practices that cyberinfrastructure may bring.
The Virtual Observatory represents one of the quintessential
cyberinfrastructure projects--large, complex datasets shared, visualized and
analyzed by a distributed group of astronomers. The Rose project features a
digital library (along with related services) comprising digital surrogates of
Old French manuscripts copied from the late-13th to the mid-16th centuries,
many of which are richly illuminated. How could the Rose project offer any
insight into data-driven scientific investigation? Even a completely digitized
corpus of all extant Rose manuscripts (some 250 are known worldwide) would not
approach the scale of the VO datasets. But, upon reflection, we suggest there
may be an important relationship between "data mining" and collaborative
scholarly practices in the sciences and humanities.
Greater collaboration among humanities scholars in the discovery and
production of knowledge is cited by the ACLS cyberinfrastructure report, Our
Cultural Commonwealth, as one of the goals and characteristics of an
effectively implemented cyberinfrastructure for the humanities and social
sciences. But the report sets this goal against the entrenched,
traditional academic culture of the "individual genius" working in
isolation.[1] Referencing a recent study on American literary
scholarship online, the ACLS report commented that:
Despite the demonstrated value of collaboration in the sciences, there are relatively few formal digital communities and relatively few institutional platforms for online collaboration in the humanities. In these disciplines, single-author work continues to dominate. Lone scholars, the report remarked, are working in relative isolation building their own content and tools, struggling with their own intellectual property issues, and creating their own archiving solutions.
While this may be true, we have reason to believe that a change is at hand; what is more, when one considers the evolving historical
relationship between the humanities and sciences,
the picture becomes more complex.
Rudolphine Tables = Open Content Alliance?
Scientists were not always good collaborators. In
discussing the Rudolphine Tables (Johannes Kepler's 1627 star catalog and planetary tables that
radically improved the ability to calculate planetary positions), computer
scientist Michael Nelson made the startling suggestion that the Tables might
be considered on a par with today's Google Book Search or the Open Content
Alliance in their power to inspire a new generation of scholarship.[2]
But he goes on to note that a host of issues stood in the way of the Tables'
publication, including "significant infrastructure costs (in the form of
purpose-built observatories), professional jealousy, intellectual property
restrictions, and political and religious instability." This suggests that at
the time, astronomy was a discipline defined by lone practitioners who guarded
their data with great secrecy; in the "data-poor" environments of the early
scientific era, scientists did not readily share data or collaborate.
In contrast, by 1627, when the Rudolphine Tables were published, the Roman
de la Rose had been written, re-written, re-purposed, recast, illuminated,
and shared many times over. In that era, before the development of scientific
instrumentation, when "data" consisted of the spoken word, the written word,
and illuminations, this body of manuscripts represented a "data-rich"
environment where humanists did collaborate in the creation of new knowledge.
Perhaps it is not a set of inherent
characteristics within specific disciplines that defines their mode of
scholarship or communication, but rather the relative ease or difficulty with
which practitioners of those disciplines can generate, acquire or process
data. While many may think that humanities materials are comparatively
data-poor, we suggest they can be data-rich in numerous ways. A single Rose
manuscript, for example, contains a tremendous amount of textual, visual, and
semantic content that is sometimes difficult to extract in meaningful ways,
and nearly impossible to represent adequately in a printed edition. As our
ability to move these data into digital formats improves, we believe that
humanists will be drawn into new forms of collaboration that will inspire new
kinds of scholarship: large-scale digitization might bring the humanities into
a new age of "data-driven scholarship," much as the Rudolphine Tables inspired
astronomers.
Data-Driven Convergence?
The NSF's 2007 report, Cyberinfrastructure Vision for 21st Century Discovery, cites 27 recent cyberinfrastructure studies and reports from across the sciences, engineering, social sciences, and humanities.[3] This surely represents an unprecedented convergence of interest across C.P. Snow's "Two Cultures"[4] in the promise of cyberinfrastructure and of data-driven research. There is no doubt that the sciences and engineering are leading the way for data-driven scholarship in our current environment, but many areas of humanities research are increasingly data-driven as well. As our digital library group at Johns Hopkins has learned more about the data curation needs of projects from a variety of disciplines, we have realized that we are facing a data deluge--not only relating to the Virtual Observatory, but also to the ever-increasing size and number of data files that humanities projects such as the Roman de la Rose Digital Library are now generating.
Manuscripts, so evidently data-rich in the era in which they were created,
today retain their former value and meaning while they inspire a new
generation of humanists to create new sets of data. This includes the metadata
needed to encode, organize, and understand the texts, annotations, and the
visual art embodied in the manuscripts. Not only does this demonstrate the
parallel need for data curation and preservation in the humanities and the
sciences (for at the level of storage infrastructure, a byte is a byte and a
terabyte a terabyte), but it underscores the fact that humanities scholars and
scientists are increasingly converging on the same object of analysis: data.
In addition, there is an increasing overlap between the two communities in the
tools needed for storing, accessing, and manipulating these data. Let us
propose, then, that putting aside obvious aesthetic differences, scientific
datasets are a modern "equivalent" of medieval manuscripts.
In fact, one could argue that manuscripts such as the Rose represented the
richest sets of data and information available in their day and were stored
for subsequent examination, analysis and
repurposing. Additionally, they contained multiple types of data such as
integrated texts and images, user annotations, and intertextual allusions and
references. These intertextual references frequently pointed the reader to
other texts available in the same monastic or university libraries. Thus the
early codex, situated in a library of other codices to which it was linked in
a semantic web of intertextuality, was a collection of active links,
hyperlinks if you will, that simultaneously informed the reader how to
navigate the text at hand and pointed outward to other relevant documents. The
library was the "web" before the Web existed.
Digital tools are allowing us to capture, manipulate, and examine books and
their data in ways that are revolutionizing the humanities. Entire libraries
are now being digitized, linking their components in unforeseen ways.
Libraries that have been dispersed by auction, theft or the vagaries of time
may be virtually reassembled. And new libraries, whether a collection of all
extant Rose manuscripts (which of course has never been, and could never have
been, assembled) or something on the immense scale of the Google Books
project, are emerging, bringing with them powerful tools and possibilities for
research that have barely been realized. Finding themselves in new kinds of
data-rich, multi-media environments, created by the mass digitization projects
as well as the continuing projects of libraries, museums and archives to
digitize their special collections, image, moving-image and sound files,
humanists are increasingly considering the potential for
cyberinfrastructure-related research and teaching.
Lessons Learned?
Digital media provide an opportunity to reflect more accurately forms of
medieval textuality and transmission that disappeared during the print era.
The routine integration of text and image on computer screens, the recombinant
nature of electronic texts, and the idea that anyone can copy, alter, edit,
and retransmit a document (much to the chagrin of those with the most to lose
from the potential collapse of traditional copyright laws), all have strong
parallels in medieval texts and acts of textual transmission.
Print culture played a formative role in creating the notion of a single,
authoritative text, as well as the expectation of an individual genius working
alone. Technology in the form of the printing press shifted the scholarly
landscape. Old models of
collaboration, as well as the attendant mechanisms of creating, publishing,
and transmitting works of scholarship, were replaced by a new world of
large-scale publishing whose aim was the production of multiple, identical
copies of a single authoritative text. The mechanization of production,
copying, and transmission led to the virtual extinction of a scribal culture
that produced unique versions of texts in which the roles of author, scribe,
editor, and publisher were inextricably blurred.
Technology, both in its processes and tools, will always influence and shape
a culture. But how do we ensure that the evolving cyberinfrastructure supports
but doesn't overly define the new forms of emerging data-driven scholarship?
One of the imperatives for the humanities community is to define its own needs
on a continuous basis and, from those needs, to create the specifications for,
and build, many of its own tools.[5]
At the same time, it will be worthwhile to discover whether new
cyberinfrastructure-related tools, services, and systems from one discipline
can support scientists, engineers, social scientists, and humanists in others.
NSF (perhaps in collaboration with the NEH and IMLS) might help track the
portability of such resources.
Finally, we want to point out that we can apply a historical lens
to this issue today because of earlier commitments to the preservation of
our heritage. However, as highly coveted manuscripts and other valuable
physical objects are digitized, the resultant datasets are often not as highly
regarded by libraries. We believe this represents a shortcoming of vision. For
while the curation of physical codices will remain an essential role for
libraries, the collection and curation of digital objects will assume greater
importance for libraries of the future, and the infrastructure, budgetary
priorities, and strategic plans of library organizations would do well to
account for this sooner rather than later. In the digital age, data can become
at risk in as little as five years, and we have already irrevocably lost important datasets. The importance of curating datasets to ensure
long-term, persistent access cannot be overstated. Imagine the loss to science
and scholarship if we had not preserved the Rudolphine Tables or the Roman
de la Rose manuscripts.
[1] American Council of Learned Societies' Commission on Cyberinfrastructure for Humanities and Social Sciences, Our Cultural Commonwealth (2006), p. 28. See also p. 48 on how "traditional scholarly work, in the form of a single-authored, printed book or article published by a university press or scholarly society, is the currency of tenure and promotion, and work online or in new media, especially work involving collaboration, is not encouraged." http://www.acls.org/cyberinfrastructure/acls.ci.report.pdf
[2] Michael L. Nelson, "I Don't Know and I Don't Care," NSF/JISC Repositories Workshop, April 2, 2007 http://www.sis.pitt.edu/~repwkshop/papers/nelson.html. Retrieved September 2, 2007.
[3] National Science Foundation Office of Cyberinfrastructure, Cyberinfrastructure Vision for 21st Century Discovery, 3:46 (2007): Appendix B, "Representative Reports and Workshops." http://www.nsf.gov/od/oci/CI_Vision_March07.pdf. Retrieved August 8, 2007.
[4] The term, coined by British scientist and novelist C.P. Snow in his 1959 Rede Lecture "The Two Cultures and the Scientific Revolution," became a shorthand for the rift between the sciences and humanities in approaches to problems. See C.P. Snow, The Two Cultures (Cambridge University Press, 1959; reprinted 1993).
[5] See, for example, the effort to articulate this in the 2005 report from the University of Virginia's Institute for Advanced Technology in the Humanities, Summit on Digital Tools for the Humanities. http://www.iath.virginia.edu/dtsummit/SummitText.pdf Accessed October 6, 2007.
How to cite this work
Sayeed Choudhury and Timothy L. Stinson. "The Virtual Observatory and the Roman de la Rose: Unexpected Relationships and the Collaborative Imperative." Academic Commons Issue Name (Spring 2008): 15 March 2010. <http://www.academiccommons.org/>.