Tuesday, September 08, 2009

The Spirit of 1899

In "Google Books: A Metadata Train Wreck," Berkeley library science professor and NPR commentator Geoffrey Nunberg gives a number of examples of the flawed metadata in the current iteration of Google Book Search. Bloggers have noted the humorous presence of images of digitizers' hands appearing in page facsimiles, but the errors Nunberg catalogs seem to indicate the absence of any human minder.

Start with dates. To take GB's word for it, 1899 was a literary annus mirabilis, which saw the publication of Raymond Chandler's Killer in the Rain, The Portable Dorothy Parker, André Malraux' La Condition Humaine, Stephen King's Christine, The Complete Shorter Fiction of Virginia Woolf, Raymond Williams' Culture and Society, Robert Shelton's biography of Bob Dylan, Fodor's Guide to Nova Scotia, and the Portuguese edition of the book version of Yellow Submarine, to name just a few. (You can find images of most of these on my slides, here — I'm not giving the url's since I expect Google will fix most of these particular errors now that they're aware of them).

And while there may be particular reasons why the 1899 date comes up so much, these misdatings are spread out all over the place. A book on Peter Drucker is dated 1905, a book of Virginia Woolf's letters is dated 1900, Tom Wolfe's The Bonfire of the Vanities is dated 1888, and an edition of Henry James's 1897 What Maisie Knew is dated 1848.

It might seem easy to cherry-pick howlers from a corpus as extensive as this one, but these errors are endemic. Do a search on "internet" in books written before 1950 and Google Scholar turns up 527 hits.
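Many of the misdatings quoted above could be caught by a very simple sanity check. What follows is only a minimal sketch, not anything Google or Nunberg describes: the `BookRecord` fields and the `implausible_dates` helper are hypothetical names invented for illustration, and the sample records pair titles from the quotation with their authors' well-known birth years.

```python
from dataclasses import dataclass


@dataclass
class BookRecord:
    # Hypothetical record shape, invented for this illustration.
    title: str
    author_birth_year: int
    catalog_year: int


def implausible_dates(records):
    """Return records whose catalog year precedes the author's birth year."""
    return [r for r in records if r.catalog_year < r.author_birth_year]


records = [
    BookRecord("Christine", 1947, 1899),           # Stephen King, born 1947
    BookRecord("Killer in the Rain", 1888, 1899),  # Raymond Chandler, born 1888
    BookRecord("What Maisie Knew", 1843, 1897),    # Henry James, born 1843
]

for record in implausible_dates(records):
    print(record.title, record.catalog_year)
```

A check this crude flags only the Stephen King title. The Chandler misdating slips through because 1899 falls after his birth, which suggests why any automated rule would still need a human minder.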


Thanks to Michael Thomas for the link.


Friday, June 12, 2009

Miscellaneously Yours



I met the Berkman Center's David Weinberger last month when I gave a talk there and was very grateful that he provided a lengthy live blogging post that translated the group's discussion into text, so that the lively exchanges about what constituted the public record in the age of experiments with e-government could be disseminated with more keywords and metadata than a simple video recording alone would have provided. I tend to be leery of live blogging, for reasons that I explain here, but I know from experience that even blogging with a slight rigor mortis has been appreciated by workshops and conferences [1], [2], [3], [4].

That's why I was especially pleased to realize afterwards that he was the same David Weinberger whose book I had been carrying around before my trip to Boston, Everything is Miscellaneous, an extended meditation upon what he calls "the power of the new digital disorder" that cites Plato, Aristotle, Heidegger, and Wittgenstein, as well as theorists of the digital age such as Vannevar Bush and John Seely Brown.

Not that I'd necessarily agree with everything in the book. When Weinberger enthuses about tagging I remember hearing research from Hugh C. Davis on folksonomy in 2007, which showed that many Del.icio.us tags were actually not useful for categorization purposes by others, since only about 5% of tags were usefully descriptive. For example, he said 34% of them could be categorized as personal references (“toHugh,” “myBlog,” “toRead,” etc.), and many used the redundant “SaveThis.” Others used evaluative terms like “cool” and “kickass” that were interesting in aggregate but difficult to work with. Weinberger's examples, like "SF" and "London," often suggested a much more limited range of meaning. Of course, with enough people tagging digital files, that 5% could aggregate a large number of useful tags, and it could be argued that users will eventually be trained in good digital behavior over time, much as people had to learn to say "hello" when they first picked up the telephone in the analog age.
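To make the kind of breakdown Davis described concrete, here is a rough sketch of how tags might be sorted into categories and counted. It is not his method: the category word lists are simply the examples mentioned above, the sample tag list is invented, and `classify_tag` and `tag_breakdown` are hypothetical helper names.

```python
from collections import Counter

# Word lists taken from the examples in the post; a real study would need
# far larger lists or human coders.
PERSONAL = {"tohugh", "myblog", "toread"}
REDUNDANT = {"savethis"}
EVALUATIVE = {"cool", "kickass"}


def classify_tag(tag):
    """Assign a tag to one rough category, defaulting to 'descriptive'."""
    key = tag.lower()
    if key in PERSONAL:
        return "personal"
    if key in REDUNDANT:
        return "redundant"
    if key in EVALUATIVE:
        return "evaluative"
    return "descriptive"


def tag_breakdown(tags):
    """Return the share of tags that falls into each category."""
    counts = Counter(classify_tag(t) for t in tags)
    return {category: n / len(tags) for category, n in counts.items()}


# Invented sample, for illustration only.
sample = ["toRead", "cool", "London", "myBlog", "SaveThis", "SF", "kickass", "toHugh"]
print(tag_breakdown(sample))
```

The default branch is the weak point: anything not on a list counts as "descriptive," so a sketch like this would overstate the useful share unless the lists were much more complete.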

As someone who specializes in "scandal, disaster, miscommunication, and mistakes," I would have liked to have seen more analysis of cases in which digital disorder seems to be . . . well . . . disorderly. For example, on page 100, Weinberger mentions the Wikipedia article on "elephant" without discussing Stephen Colbert's famous prank with that page.

Given his work as a corporate consultant, I also felt possible conflicts of interest could come into play, since his analysis was sometimes remarkably uncritical about how privacy could be compromised by commercial data mining that exploits unwitting or unwilling user behavior with new search and cloud-computing technologies.

Finally, by asserting that "[w]hen our kids become teachers, they're not going to be administering tests to students sitting in a neat grid of separated desks with the shades drawn" (144-145), Weinberger sounds inclined to accept the idea of a "digital generation" a little too uncritically without considering the issues raised by Siva Vaidhyanathan in "Generational Myth" or the research of Diane Harley that shows that younger scholars aren't necessarily innovators with instructional technology.

But this is a useful book that contains real epistemological and rhetorical insights. For example, Weinberger recognizes that a profile on a social network site actually constitutes "a complex social artifact that results from my goals, self-image, and anticipations of how other people will interpret my list" (155) and that "length is a symbol of importance" in the Britannica, while it is "a manifestation of interest and importance" in Wikipedia (208). His definitional work on "knowledge" and "understanding" is particularly good and shows off both his Ph.D. training in philosophy and his pragmatic experience as a bestselling author.


Wednesday, October 01, 2008

Hate Male

Fellow Irvine blogger Scott Kaufman has talked about possible rationales for the deletion of misogynistic comments on feminist blogs. I've been thinking about this issue again, since I've been receiving a particularly nasty batch of comments from particularly hateful readers who want to stage violent or sexual scenarios that involve my actual person.

As I've said, I suspect that male bloggers don't get these kinds of comments and that having their masculinity questioned is as ad hominem as their reader responses get. However, I know the "Don't Feed the Troll" rule well enough, so I won't reproduce the actual comments here to demonstrate my point. Trust me, they are bad.

Of course, my first reaction tends to be the impulse of the police officer at the scene of a murder: it must have been someone who knew the victim well. So I find myself going through the rolls of former students, ex-boyfriends, and disgruntled neighbors to generate lists of possible suspects and compare prose styles with the evidence in my inbox.

Rationally I know that this is a faulty approach. Google Analytics has shown me that most people reach the entries on this blog through keywords typed into search engines. Top terms like "exhibitionism," "porno," and "virtual rape" commonly bring visitors to my site. Such computer users must be disappointed to realize that I am treating voyeuristic subject matter with an ironic or critical distance. I would guess that arrival on a feminist site results in an urge to retaliate, although unfortunately just giving some samples of the luring metadata in this posting may lure even more of these misogynistic readers.


Sunday, September 14, 2008

Hearing Test

Virtualpolitik friend Nick Diakopoulos has just released Audiopuzzler, a game that responds to the tendency in many videogames to privilege sight over the other senses. Although the altitude of the high scores may be intimidating to first-time players, the game is designed to be user-friendly. Unfortunately, players who have unpleasant memories of typing tests and dictation drills (a.k.a. any woman who was born before 1970 in the era in which girls had to learn secretarial skills in school separate from boys) may be more resistant to the game. Audio snippets include an anti-Obama political ad about gas prices and a clip from An Inconvenient Truth. Diakopoulos explains the game's rationale as follows:

In the process of playing the game, players contribute transcriptions of snippets of video - the better the transcription, the more points the player earns. These transcription snippets contribute to the enrichment of the video for other users and can facilitate things like close captioning among other things. It's a way to have fun with a meaningful by-product - one especially valuable considering the difficulty of achieving high accuracy video transcriptions using automatic methods.
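The post does not say how Audiopuzzler actually awards points, so the following is only a guess at the general idea of scoring a transcription by its closeness to a reference. The `transcription_score` function, the 100-point scale, and the existence of a reference transcript (which in practice might be a consensus of other players' attempts) are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def transcription_score(attempt, reference, max_points=100):
    """Score an attempt by its character-level similarity to a reference."""
    matcher = SequenceMatcher(None, normalize(attempt), normalize(reference))
    return round(matcher.ratio() * max_points)


reference = "gas prices are rising fast"
print(transcription_score("Gas prices  are rising fast", reference))
print(transcription_score("gas prices are rising", reference))
```

Normalizing case and spacing before comparing means players are not penalized for typing style, only for missing or wrong words.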

Diakopoulos also uses crowd sourcing in a game about tagging photos with metadata.

Although I am intrigued by this example, I also wonder about the implications of giving multiple parties access to audio files from news and entertainment sources in this way, because of rights clearance questions. Too often projects intended to improve digital materials for the disabled -- such as electronic readers for the blind -- have encountered problems with copyright restrictions. In this case the snippets may be minuscule enough and the cause so worthy that intellectual property issues may not trump other concerns.

More on Water Cooler Games.


Monday, September 01, 2008

The P-Word

The absence of the word "peace" from the Democratic National Convention is all the more remarkable, given its prominent place on the website for the Republican National Convention, which puts "Service, Reform, Prosperity, Peace" in the very metadata for the title of the page. Now that opening festivities have been postponed because of Hurricane Gustav, the splash page for the site refers those who would normally be political partisans to www.causegreater.com, which contains a number of donation websites for states in the hurricane's path, such as www.servealabama.gov and http://www.mississippirelief.com/. However, it is interesting to note the role of AidMatrix on the convention's website, which has a donation button to FEMA, along with corporate sponsor UPS. In this way the ideological messages about private charity and corporate stewardship rather than a publicly supported safety net are reinforced in their digital rhetoric. McCain also signaled his support for faith-based initiatives by appearing at a Gustav event at ISOH/IMPACT, which was "formerly known as International Services of Hope & Impact With God Crusades, Incorporated" and sponsors several ministries, where he was photographed next to a young lady wearing an "i Tune into God" t-shirt.


Friday, August 29, 2008

The More the Merrier

Often online exhibits from a library’s special collections emphasize discrete “gems” or “treasures” and recreate a display-case culture of spectatorship for those who visit websites that display historical materials rather than the nitty-gritty discovery activities known to those who open actual Hollinger boxes full of files to answer complicated research questions in archival detective work.

A different approach was emphasized in today’s panel on “digitizing entire collections” at the annual meeting of the Society of American Archivists. Presenters emphasized case studies in which entire collections were put online, although doing this cost-effectively often meant making sacrifices by only providing minimal metadata to keep costs at around the dollar-a-page mark to which many institutions aspire. “EAD” or “encoded archival description” was one of the most commonly used acronyms at the panel and at the conference in general. The other big acronym bandied about at the table was for the funding agency NHPRC or National Historical Publications and Records Commission.

Unfortunately, with Google searches that may land Internet users in the center of an archive with no context or navigation back to content descriptions or finding aids, such minimal metadata strategies also risk reinforcing the fragmentation already experienced by those haphazardly searching for documents in a web search. Furthermore, all three collections that were showcased on the panel consisted largely of hand-written materials that had not been transcribed and were therefore not extensively searchable.

Civil War documents from the Archives of Michigan posed a number of challenges to digitizers, particularly since they vary in size. Although the process of digitization is often depicted as an “automagical” transubstantiation, Mark E. Harvey’s in absentia presentation, “Thank God for Michigan,” acknowledged a number of complications in the process, from worker equipment to protect against possible environmental hazards to vendors complaining about set contracts when projects go over budget. Luckily, the Civil War project has benefited from an active user community, which included the Ann Arbor Civil War Roundtable, and expressed a willingness to solicit constructive criticism from the archivists present, who pointed out that “civil war” didn’t specify the country and that other metadata samples didn’t specify the state. As part of the “Seeking Michigan” website redesign, which Harvey had jocularly renamed “Desperately Seeking Michigan,” the project is hoping to eventually expand to include private records, such as diaries and letters from individuals who were engaged in combat. The Archives of Michigan also maintains a Flickr page, although it has fewer than a thousand documents.

Blogs have become a tool for recording progress and publicizing lessons learned in many of these cases. Michigan has its blog at "Thank God for Michigan." However, the subsequent presenter, Kaye L. Minchew, who was also author of specialized state electronic encyclopedia pages, such as "Franklin D. Roosevelt in Georgia," complained that her own contributions to "Troup County Court Record Scanning Project" too often felt like a failed diary entry.

Final presenter David Null explained how ownership of the physical papers of an early environmentalist at the Aldo Leopold Archives and ownership of the intellectual property rights by the separate Aldo Leopold Foundation could create possible conflicts of interest. In addition, since visitors could enter the archive from either portal, those who land in the middle of it after a Google search might not have a clear way out to a definitive home page.


Saturday, June 23, 2007

High-Tech Beatrice

At the Calit2 event, I also met super-cool computer scientist Cristina Lopes, who demonstrated her wearable search engine for the 3D online multi-user world Second Life. Most obviously, Lopes defied what are now trite, outdated stereotypes about women in science. She was there with her young daughter and spoke about her teaching commitments, but she also was eager to represent herself as a programmer and showcase the code.

With her souped-up system, I must say I was very pleased with the zippy performance of the SLBrowser. Much like many Web 1.0 and 2.0 search engines, it uses the metadata associated with the objects and the built environments created by Second Life inhabitants to rank results. Although advertised largely as an online shopping tool, the SLBrowser proved remarkably effective at finding sites for abstract concepts as well as concrete products. I'll admit that I haven't tried the offerings of competing brands, such as Electric Sheep, but with SLBrowser I easily found sites for social interaction for everything from gathering places for those with HIV/AIDS to staging grounds for academic public diplomacy events.

Later, when I actually went into Second Life myself, as Malaise Etoile, my SL avatar, it was easy to get the free browser from their storefront on a commercial street on Neptune. (In the immediate digital vicinity, there was also a support group for agoraphobics, an escort service, and a high-end virtual clothing boutique.)

A word to the wise, however: there are places in Second Life that don't allow the SLBrowser to work. Cristina tells me there's a little icon on top of the SL client, towards the left, that indicates "no scripts," if this is the case. You'll also need to change the size of the frame by clicking on the yellow triangles on the top left of the browser or you may not have a legible display. Because it's free and powerful when it is working, I'd still recommend trying the SLBrowser for yourself. For finding things in Second Life, it is much better than their own proprietary search box and much more efficient than some of the guide books on the shelf.


Thursday, June 14, 2007

Best Metadata Ever

I've been looking at how people tag their own blog postings a lot of late, and I have to say that BlogHer contributing editor Trillwing takes the prize over at The Clutter Museum:

Of course, except for "Horsey goodness," these all seem like perfectly acceptable labels for my own blogging enterprise and many of the other female bloggers that I know.


Saturday, June 09, 2007

If the Label Fits, Wear It



This video is an interesting case of a particular rhetorical move in which images are divorced from their explanatory labels, as is the case with many artifacts of digital news. In the era of metadata, when large numbers of people are tagging images and thinking about issues that once only librarians and archivists thought about, it may seem somewhat counterintuitive to suggest that labels are only labels. This film also obviously plays on Samuel Huntington's trope of the "Clash of Civilizations," which has been important in Internet discourse about the Middle East since September 11th.

Fans of street art also probably recognized how the graphic identity of the two female protagonists of the film is shaped by the wheat paste aesthetic of sites like the Wooster Collective. Note that the filmmaker includes a shot of stencil artist Banksy's graffiti on the Israel-Palestine wall. It's an interesting case of how the digital often meets the situated and tangible in contemporary political art.

Perhaps more remarkable is this video from the young YouTube critic who chose to feature the "Stop the Clash of Civilizations" video . . . even if, as an old-timer, I can say with some authority that this kind of independent youth journalism has been peddled ever since the nineteen-seventies, when I was one of the subjects of a book called Listen to Us! from The Children's Express.


Saturday, June 02, 2007

Blogroll Mash-Up

To the right of this post, you will see a "blogroll" with a list of links for the websites that I regularly peruse on the topics of rhetoric, technology, education, politics, marketing, design, and cyberculture. There are many professional associations represented there, but there are also a lot of personal blogs from people without any official sponsorship who have related interests and write well enough to make it worth the visit. This morning I felt particularly aware that worlds collide when I read more about the Big Donor Show on Dutch television, the subject of a public outcry because the creators of the reality show Big Brother were apparently featuring three desperately ill contestants vying for the kidney of a terminally ill woman. Of course, today the media was reporting it as a hoax, designed to draw attention to the shortage of donors. This was a story covered by both The Museum of Hoaxes that draws attention to Internet gullibility and my European cyber-pal over at Houtlust who covers the international social marketing beat. It's actually not the first time that social marketing has exploited new media credulity, and I don't think it will be the last.

Readers can also see that I've added a list of topics to please the big fans of metadata among us.


Saturday, March 24, 2007

Dispatches from the Front Lines

I've just returned from the annual national Conference on College Composition and Communication, which was held in New York City this year. During the past century, university writing program administrators have worked hard to garner respect within the academy as fellow teachers, intellectuals, and professionals. Unfortunately, as one of my colleagues warned me years ago, researchers in the field are sometimes treated as "untouchables," perhaps because they have contact with the most marginalized discourse from the most marginalized members of the campus community, which makes them impure if not outright polluted by their admittedly necessary work . . . which like the labor of garbage collectors or morticians must still be done by someone, of course. Even within the low-caste family of her academic discipline, Composition Studies can seem the tattered stepsister to her two more glamorous siblings who benefit from the riches of their own departments and PhD programs: Communication and Rhetoric.

This is unfortunate, because the central conflict in the academy between the Culture of Information and the Culture of Knowledge -- which makes C.P. Snow's old divide between the humanities and the sciences look relatively minor by comparison -- is being played out most publicly and most violently in the classrooms of first-year writing. Here are some statistics that old-fashioned bibliophiles who are firmly attached to the institutional status quo might find terrifying:
  • 83% of adult respondents thought that a twelve-year-old knew more about the Internet than their elected representative in Congress (Zogby 2006)
  • 48% of all children six and under have used a computer, and 30% have played video games (Rideout, Vandewater, and Wartella 2003)
  • 55% of youth 12-17 use social networking sites (Pew 2007)
  • 57% of teens who use the Internet could be considered media creators (Pew 2005), a statistic that may be an undercount, because it does not factor in newer digital forms of expression or those that produce artifacts other than written texts (Jenkins/MacArthur 2006)
  • While engaged in an average of 2.7 simultaneous Instant Message conversations, 39% of surveyed college students were also writing academic essays while multitasking online (Baron 2006)
  • 71% of students at the University of Minnesota use Wikipedia; 28% cite it (Adams 2006)
  • 36% of students in a U.S./Canada study admit to "cut and paste" plagiarism of sources from the Internet (McCabe 2004)
  • 81% of faculty in the Humanities and Social Sciences get digital resources from Google-type searches (Harley 2006)
Two years ago, the "4Cs" in San Francisco showed how seriously composition studies was taking the impact of digital culture on the academy. Lawrence Lessig's talk there was well-attended, and Andrea Lunsford's influential Intellectual Property Caucus was taking the discussion far beyond the mere policing of online plagiarism. This year the caucus passed an important resolution about open source, but in panels there was more concern about writing and new social media platforms like Facebook and MMOs than about writing and copyright.

Furthermore, it looks like there is still little consensus on practice when it comes to using social media to teach. The two most noteworthy panels on teaching writing through blogging put forward two very different models of success. Dennis Jerz of Seton Hill has students blog under their own names -- where they "take responsibility" for their public statements, explore group blogging, and build discursive networks based on informal sociality using a university server. In contrast, Jeffrey Middlebrook of USC showed some amazingly sophisticated student blogs that epitomized good metadata and linking practices which emphasized individual presentations on commercial servers with pre-professional subject matter, although strangely students published their public sphere work under pseudonyms to protect their privacy.

I presented at the panel about videogames and rhetorical identities, which included the work of Mathew S. S. Johnson, who is a guest co-editor of the upcoming issue of Computers and Composition on the subject.

Despite these exciting developments, I was sorry to see little work being done about the metadata issues that may be more fundamental to research on the development and production of academic writing than chit chat about fun with wikis and blogs and MMOs. This is particularly true, given the high profile of large corporate players from the search engine business in reconfiguring the information literacy practices of our students and perhaps ultimately their access to the digital archives that are essential for well-informed authorship. In arguing on behalf of forming a new discipline around "Critical Information Studies," Siva Vaidhyanathan has identified several scholarly communities of association that may be threatening the ostrich-headed status quo: "Economists, sociologists, linguists, anthropologists, ethnomusicologists, communication scholars, lawyers, computer scientists, philosophers, and librarians." Librarians are an important cohort in the teaching of writing, and they still weren't adequately represented at 4Cs.


Tuesday, March 06, 2007

Read My Lips

Video file-sharing services now offer options to those who want to make informed decisions about candidates or follow public policy issues. YouTube You Choose links voters directly to the official YouTube channels of all the current Presidential hopefuls. On Google, Carl Malamud on behalf of Congress transforms the meandering live feed of Congressional hearings into video files that can actually be archived, replayed, and fast-forwarded through. Providing useful metadata for that video is the tricky part, unfortunately. Now, if only we had voice recognition technology that could produce transcripts from the Malamud files, so we could avoid relying on the Grace News Network.

Update: The news from C-Span that they will make more video easily available to the public online is encouraging to advocates for public disclosure who worry about proprietary attitudes toward the public record that digital video archives may often foster.


Monday, February 12, 2007

Playing Tag


A recent Pew Report claims that 28% of users tag content and that 7% do so on a daily basis. There has been a lot of debate about how well this collective intelligence works on a global basis. As an experiment in the process, I've started my own world map of hotel minibars, and I would encourage readers to contribute snaps and geodata to this worthwhile example of information aesthetics.

Why hotel minibars? They are so analog, it would seem! Of course, I'm interested in everyday design issues and user interfaces in the context of globalization. But there is also a certain nostalgia involved in documenting this rapidly disappearing species, which has already vanished from the Marriott hotel chain and many Hyatts in the United States.


Badlands

Yesterday The Los Angeles Times carried two stories about how video file-sharing services were archiving images of real rather than fictional violence that depicted seemingly unresolvable political conflicts. "Mexican Drug War's Brutality Celebrated on YouTube" argues that videos "intended to cheer on or denigrate the opposing sides in Mexico's drug wars" are proliferating on popular online sites. Last week, murderous assaults on officers in police stations in Acapulco were allegedly filmed by gunmen, although the videos have yet to surface on the Internet. The LA Times makes an analogy to a genre from the medium of music, the narcocorridos or ballads dedicated to drug lords. Most of these gory videos are apparently produced by those with no ties to the actual criminals, and sometimes they are captured and disseminated by the police themselves.

A less violent altercation was discussed in "Soft or hard bop? Either way, whack is a hit with viewers" that chronicles the controversy surrounding the digital recording of a particularly chaotic City Council meeting in the dysfunctional City of Carson. Although the city itself regularly archives video of its public proceedings, a clip of a Carson official shrieking and falling after being struck with a sheaf of papers by a woman seeking the recall of the current mayor has found more fame by being broadcast on YouTube. Constituents and spectators outside of Carson are now debating how much of an assault actually took place, since the incident appears to have devolved into political theater in its most slapstick form.

Now that local municipalities and even federal agencies are relying on video webcasts for access to the public record of the official business of governance -- rather than traditional print documents -- there are also issues about cataloging material into identifiable chunks and adding meaningful metadata. With text-based records, it is much easier to search for key words and automate systems to generate the most relevant information to a given member of the electorate. Video and other images require that the selection processes of groups of users be activated, often on a voluntary basis, to locate exactly where a site of potential misunderstanding might be.
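The contrast can be sketched in a few lines. Once a video has been cut into timestamped segments with text attached (by volunteers, in the scenario above), finding the contested moment becomes an ordinary keyword search. The segment data below is invented for illustration, and `find_segments` is a hypothetical helper, not part of any city's webcast system.

```python
def find_segments(segments, keyword):
    """Return (start_seconds, text) pairs whose text mentions the keyword."""
    needle = keyword.lower()
    return [(start, text) for start, text in segments if needle in text.lower()]


# Invented annotations for an imaginary council meeting video.
segments = [
    (0, "Call to order and roll call"),
    (754, "Public comment on the mayoral recall petition"),
    (1312, "Adjournment"),
]

for start, text in find_segments(segments, "recall"):
    print(f"{start // 60}:{start % 60:02d}  {text}")
```

Without the annotations there is nothing for the search to match, which is why the labor of describing video falls to groups of users in the first place.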


Tuesday, February 06, 2007

Eyes and Ears

Although the online version of The Los Angeles Times doesn't yet contain a "technology" section, stories about digital rhetoric dominated today's front page. "Intrepid Armchair Explorers" describes the travel narratives of those who use the popular mapping service Google Earth to search for oddities recorded at the exact occasion of photographing a particular segment of topography.

"These are life's moments that are unexpectedly caught from above," said Jason Lee, 30, a Bellingham, Wash., marketer. He and computer programmer Jon Coogan run Bird's Eye Tourist, a website that compiles things of interest submitted by users of a Live Search Maps feature known as bird's eye view.

What may appear as a blemish to digital mapmakers is becoming sport for virtual discoverers. The hunt is on to find and share those moments.

The Google Earth Community and independent enthusiast sites such as Google Earth Blog, Google Sightseeing and Bird's Eye Tourist serve as repositories for these finds, where people can discuss, for example, a submarine captured in a permanent state of departure from Tokyo Bay (the bow-wake characteristics and sail-to-rudder measurement suggest it is a Yushio class sub, a Google Earth Community veteran concluded).

John Hanke, director of Google Earth and Maps, said the hunt for interesting things reminded him of the Web's early days, before search engines and directories.

"There's a huge amount of undiscovered territory out there for these geo-explorers to go and explore," he said.

The other digital rhetoric story continues the saga of embarrassing audio files accidentally posted on the web by the California governor's office. In "Subjects of Schwarzenegger's recorded jabs resist counterpunching," lawmakers -- from both parties -- who were the subjects of nasty private comments are described as discounting the impact of the governor's opinions, given that they weren't intended to be taken as public speech. Legislators have largely responded with "no comment" or their best witty replies.

In other news, the Virtualpolitik home office celebrated passing the 500 posting mark last night.


Monday, February 05, 2007

The Web is Us/ing Us



Unfortunately, this video essay doesn't really explore the implications of its title and the ways that "users" can be "used" by corporate and political interests. I'm also not sure about the way it presents XML as a way to separate "form" and "content" rather than seeing it as metadata, which it is. Moreover, this video raises a lot of issues about fundamental social change at the very end -- ironically some of the same issues deferred in the comforting pap of this corporate ad -- that don't really get airtime.
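A tiny example may show what I mean by XML as metadata. The element names below (`post`, `title`, `date`, `label`, `body`) are made up for this sketch and do not come from the video or from any real feed format; the parsing uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# A made-up document: the tags say nothing about how the text should look,
# only what each piece of text is.
FEED = """<post>
  <title>The Web is Us/ing Us</title>
  <date>2007-02-05</date>
  <label>metadata</label>
  <body>A video essay about Web 2.0.</body>
</post>"""

root = ET.fromstring(FEED)

# Everything except the body is description of the content, i.e. metadata.
metadata = {child.tag: child.text for child in root if child.tag != "body"}
print(metadata)
```

The tags carry no formatting instructions at all, so the "form" side of the form-and-content split lives somewhere else entirely; what the markup itself contributes is description.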

But as someone who has done a YouTube video essay and knows the rhetorical challenges of the medium, I think it is an impressive effort. Despite the virtuoso performance, however, the creator of this video describes it as a "draft." It is interesting also to note that this YouTube video essay is actually presented as a response to this one on Web 2.0. (Of course, this video has already generated its own response, which compares Web 2.0 to reality TV.)


Friday, September 22, 2006

Paris in Washington

If you follow web statistics at all -- posted in places like Technorati or Google -- you may already be aware of this fun fact: for the past three years, "" has continued to be one of the most common search terms in the various top-ten lists of words and phrases that Internet users type into computer interfaces day after day and year after year.

If you follow Virtualpolitik, you may also be aware that I have an occasional feature about where particular, possibly embarrassing, terms appear on .gov sites. For example, in the past I have recorded the results from official government web pages for "masturbation" and "videogames." So I thought that it was time to try my luck with "Paris Hilton," and I have to admit I was surprised by the number of government agencies that had materials on the drunk-driving heiress, mediocre synth-pop songstress, and reality show TV star.

For example, Barack Obama, Diana DeGette, Louise Slaughter, and Harry Reid all talked on Capitol Hill about the "Paris Hilton tax break," invoking the Internet's favorite leggy blonde. It was smart on their part to ensure that there was either labeling or metadata so "Paris Hilton" would be picked up by search engines, given the number of accidental visitors that the term could bring to their sites.

Of course, fans of Intellectual Property Law can read about Paris Hilton's trademark application, to be associated with "fragrances, namely, perfumes, eau de parfum, cologne, eau de toilet, body lotion, bath gel, hand soap, perfumed soap and cosmetics."

A NASA website on women's issues even cites a now-defunct Hilton web page, "parishiltonlife.com"! The National Center for Education Statistics, one of my favorite organizations since it addresses issues about writing proficiency, has a Hilton hit as well, which leads to an essay by a youngster.

Surprisingly many fields are represented in the search. The IRS commissioner expresses some implied envy over Hilton's web presence, while also taking pride in the web traffic for e-filing. A newsletter from the National Institutes of Health takes a swipe at Hilton's fad pet, a "kinkajou, whose natural habitat isn’t the nightclub." A science and engineering website uses a Yahoo article on Hilton as an example of security problems with "mobile gizmos."

There's also a lot of not-so-veiled sexism. For example, the Arizona Agriculture Department bestializes the celebrity: "If Paris Hilton were a horse, she would probably be an Arabian. If she passed the I.Q. test. Arabians are trim, delicate, beautiful and pampered as if they were celebrity jet-setters. But they're also said to be the smartest horses."


Saturday, August 19, 2006

Snap Judgments

Clive Thompson posted some interesting reflections about Amazon.com's Mechanical Turk program, in which ordinary people -- often at their day jobs -- can earn money from piecework doing digital tasks that even sophisticated computers can't manage. In "Why Humans Have the Best Artificial Intelligence," Thompson explains how our capacity as humans to make simple snap judgments makes us better categorizers than high-tech indexing machines.

What I love about the Mechanical Turk is that it capitalizes on an interesting limitation in artificial intelligence: Computers suck at many tasks that are super-easy for humans. Any idiot can look at a picture and instantly recognize that it's a picture of a pink shoe. Any idiot can listen to a .wav file and realize it's the sound of a dog barking. But computer scientists have spent billions trying to train software to do this, and they've utterly failed.

More generally, this metadata industry is booming in many parts of the country in which traditional industries have failed or natural resources have become exhausted. For example, in Chester, Vermont, I watched digital archivists patiently categorizing pictures and choosing between misspellings, work that a human still does more efficiently than a machine. Unlike the concentrated work practices at the scrupulous Chester plant, however, Mechanical Turk encourages multitasking.

Thompson also notes the negative implications for collective bargaining created by programs like Mechanical Turk: "Mind you, while the cognitive-science aspects of the Mechanical Turk are incredibly cool, the labor dimensions freak the hell out of high-tech labor unions."

Of course, there are several online programs that celebrate our subjective tastes rather than objective observations. For example, you can vote for the cutest kittens at Kitten War (I love the "Losingest Kittens" category) or indicate the kind of sky you prefer at Cloud Shape Classifier. All this data can be aggregated to reveal the aesthetic of the collective or to generate an ideal object of contemplation for a specific individual.

Our online behaviors tell us a lot about our human vulnerabilities as well, as the recent AOL scandal involving the release of highly personal search terms shows us. The New York Times story, "A Face Is Exposed for AOL Searcher No. 4417749," makes manifest how simple it can be to work backwards to the individual from search terms, even without elaborate data mining tools. To see why this might be a problem for privacy, you can see some of the more embarrassing online searches immortalized at Something Awful. (Watch out, since some of the words and phrases aren't "work-safe.")
