Time Will Tell, But Epistemology Won't: Part II

Christine Borgman explained how she was initially hesitant to speak at the Rorty Archive conference, which convened for a daylong celebratory event yesterday. Borgman joked that perhaps UCI actually wanted her UCLA colleague humanist Johanna Drucker. Nonetheless, she was eventually persuaded that her talk about "The Digital Archive: The Data Deluge Arrives in the Humanities" would be very relevant to an audience interested in the larger significance of the born digital and the lessons for the humanities that could be learned from scholars in other disciplines already grappling with very large collections of electronic data.

Her slide presentation opened with an actual image of a deluge raining down like a flood upon an orderly assembly of philosophers and literary theorists and classicists. She also described her amusement at the fact that the user agreement for those requesting access to the Rorty online collection still contained anachronistic stipulations like "The reading room is a no-pen zone. Only pencils and laptop computers may be used to take notes." and "Handle and read materials only at the reading room tables. Limit handling of items to the minimum necessary for your research and exercise all possible care to prevent damage to materials."

Now that the Library of Congress is undertaking to archive all of Twitter, an archive growing by 50 million tweets a day, Borgman worried that simple Boolean searches will soon not be good enough for managing data. She suggested that possible prototypes might be developed by those like Lev Manovich, whose UCSD team was developing software to manage twenty million art images. She also expressed hope that the sciences could provide some models, since they have had more experience grappling with unprecedented data availability covering vast sets of information about water, the environment, and bird life. She noted that this "data deluge" could prove "useful for learning as well," since students must serve as active collaborators working with more material than a single scholar can possibly process. As one example, she pointed to work being done in astronomy by students and interested amateurs scanning the night sky. She told the story of how a "strange blue object in the sky" observed in the Netherlands became a major scientific discovery.

Borgman claimed this "data deluge" could also shape a number of public policy issues, as individuals figure out how to use extremely large data sets for art and civil disobedience. She observed that mainstream media outlets are beginning to take notice and pointed to a recent Economist report on "Data, data everywhere" as an example. She explained how Johns Hopkins was taking on large scientific data archives with large synoptic data sets that showed dramatic growth just between the Sloan Digital Sky Survey and the Panoramic Survey Telescope & Rapid Response System.

Next, she introduced the notion of a "knowledge mash-up," which she conceded might seem like an alien approach to the archivists in the audience, because it represented "adding context and taking things out of context" and even putting in "your own politics and interpretation." With the proliferation of off-the-shelf tools from places like IBM, the mantra of an early Apple ad "Rip. Mix. Burn." has become "Unlock data. Remix content. Unleash productivity."

One might have questioned how comparable a Google map of Denver foreclosures and a photoshopped image of Obama in sheep's clothing were in Borgman's section on "media mashups," but the fact that these tools were being used by the business community, the scientific community, and groups like the Open Mashup Alliance was difficult to deny.

Then Borgman shifted to the question of "common platforms" that make such mashups possible. She also credited Amy Friedlander with informing her concept of a "scholarly information infrastructure" for either "e-science" or "eResearch." With some bemusement, she also cited Our Cultural Commonwealth as an example of the "me too" sentiments coming from the humanities, although she expressed her wish that humanists would continue asking questions and building a research agenda that seriously engages with 1. "digital scholarship," 2. "scale," 3. "language and communication," 4. "space and time" (with the timelines, maps, and layers of Perseus and Rome Reborn cited as examples), and 5. "social networking."

By introducing the question "what are data?" Borgman was able to further develop her theme of disciplinary difference. She enumerated four specific types: 1) observational, 2) computational (like modeling or simulation), 3) experimental (like lab experiments or field work), and 4) records. She referred the audience to a 2005 NSF report on Long-Lived Data for other policy work on research and education in this area.

She also reminded participants about her own background in the social studies of science to caution against conflating data with facts. Indeed, she said that too many researchers had a "tendency to reify facts" and rely on "alleged evidence" without thinking about "context and framing." At this point she inserted another citation to "Buckland, 2006" on "Metadata as infrastructure: What, where, when and who."

She wondered aloud about the future of the humanities as an "inward looking" field that was "archive-centric" rather than "data-centric." She argued that the humanities had a poor record on "rights to use and reuse" and on intellectual property more generally, which she characterized as "rights in the modern university."

She closed by asserting that "the data deluge is real," even if the Rorty archive only represents the "tiny trickle" that humanists are becoming aware of. She described how Peter Young at the Library of Congress served as an example of an archivist who would have to meet those twenty-first-century challenges. She insisted that the "value of data lays in scale, aggregation, analytical tools, and distributed access" and that the delayed access model in the humanities, symbolized by the five-day approval period for the Rorty digital archive, raised troubling issues.

In the question and answer session, Borgman was asked to review a slide that contrasted information practices in the humanities with those in the sciences. In the humanities, Borgman saw "data usage" driven by "manual mining" in which the scholar must "inspect the individual archive" for "deep interpretation." "Tools" were largely "collection-level finding aids" and "content specific methods." The model for "intellectual property" was aimed at "registered scholars" who were "licensed for local use" and "on-site access."

In contrast, data usage in "data-centric science" privileged "instant access," "robotic mining," "mashups" with other data sets, and "multi-variable analysis." Tools were designed for "open standards," "common platforms," and "mining and visualizations." Intellectual property was determined largely by an "open access license" model, a "share and share alike" ethos, and an environment in which "derivative works" are "encouraged."

Borgman also warned against declaring that the data deluge foretold the "end of theory." With an example about the difference between "weather data" and "climate data," she emphasized the value of making long-term comparisons and told of how Harvard was digitizing one hundred years of glass plates created for astronomy records. She also preached against a "one-size-fits-all policy," particularly when there are bad actors who might misuse information, for instance by "releasing locations of endangered species."

