Showing posts with label digital libraries. Show all posts

Monday, March 28, 2011

Siva Vaidhyanathan, The Googlization of Everything

I wrote this event summary for the DCLA "Capital Librarian" newsletter and thought I'd share it here as well. It was a very interesting talk, especially in light of the rejection of the Google Books settlement last week. The event was recorded, so hopefully the Library of Congress will post the video online soon.

Siva Vaidhyanathan spoke at the Library of Congress on March 25 about his new book, The Googlization of Everything, and his proposed “Human Knowledge Project.” He opened by discussing the recent rejection of the Google Books settlement by Judge Denny Chin. Judge Chin rejected the settlement not so much because of its content, but because a class action settlement was not the right venue to make sweeping policy decisions about copyright. Vaidhyanathan agreed that Congress should be making policy changes, not private parties negotiating through the court system.
Vaidhyanathan sees the rejection of the settlement as an opportunity for librarians. Now is the time to take stock of what libraries and users want and see if we can find a better way to achieve these goals. Vaidhyanathan’s new book, The Googlization of Everything, was initially inspired by the Google Books Project. Early on, he was uncomfortable with Google’s approach to copyright and worried that they were putting too much weight on the principle of fair use. It was inevitable that Google would be sued, and if their defense failed it could create a dangerous precedent that could threaten the very concept of fair use.
Throughout his presentation Vaidhyanathan stressed that companies aren’t stable, long-term organizations. They are usually very short lived and those that do survive undergo huge transformations. What will happen to the Google Books Project years from now as Google’s own priorities and values change? Why were stable, centuries old institutions like universities and libraries turning to a transitory company instead of scanning books themselves?
Google’s lofty mission statement is to “organize the world’s information and make it universally accessible and useful.” But unlike libraries, archives, and museums, the Google Books Project was not undertaken for the public good. It is a project to improve Google’s profit margin and in reality it is a bookstore, not a library.
On balance, Vaidhyanathan thinks Google has a positive effect on the world and on our day-to-day lives. But we still need to think critically about our relationship with Google and its activities. Google is a good company, but it’s still a company. The Google Books Project is also generally a good idea, but it has many problems, such as low-quality scanning and bad metadata. Google also routinely violates core values of librarianship like user confidentiality.
Vaidhyanathan sees “Public Failure” as the root cause of the Google Books Project. State institutions fail when they are not given enough resources to carry out their roles in society. When the state fails, private companies step in, as has happened with privatized prisons and private charter schools. Our national system of libraries and universities didn’t have the resources to create a universal digital library, so Google stepped in to do it for us.
There is an opportunity now for libraries to step up and do it better through projects like the Open Book Alliance, HathiTrust, and the new Digital Public Library of America. Such projects are based on the core values of librarianship and the accumulated knowledge of the profession.
Vaidhyanathan hopes we’ll follow the model of the Human Genome Project. Initially, attempts to map the human genome were publicly financed, but underfunded and fragmented. Then the Celera Corporation announced that it planned to privately map the genome at a speed the public efforts could not match. Celera planned to patent gene sequences and use the genome for private profit. In response, the scientific community mobilized politically and launched a massive global project to produce an open access genome that would be freely available to all researchers. The public sequence was published in the same week as the Celera sequence. Librarians need to mobilize and work together in the same way to create our own Human Knowledge Project. It will be a long-term effort, but Vaidhyanathan is optimistic that it will be possible and a true universal library can be created: one that is publicly financed, based on the core values of librarianship, and freely available worldwide.

Thursday, December 2, 2010

Improving Web Statistics


I've been looking into ways to increase search ranking and web stats for my digital archive. One really helpful presentation I found:

"Search Engine Optimization for Digital Collections" by Kenning Arlitsch, Patrick OBrien, and Sandra McIntyre.


The authors discuss the unique problems of digital libraries and ways to solve them. They explain how to use Google Webmaster Tools to check for webcrawler errors (definitely worth checking out if you run your own website!) and various technical ways to improve indexing.
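One of the standard technical fixes in this vein is making sure your collection has an XML sitemap that search engines can discover (you can submit one directly through Google Webmaster Tools). Here's a minimal Python sketch of generating one; the item URLs are made-up placeholders, so substitute your own collection's URLs:

```python
# Minimal XML sitemap generator for a digital collection.
# The item URLs below are invented placeholders.
from xml.etree import ElementTree as ET

def build_sitemap(urls):
    """Return a sitemap document (as a string) listing the given URLs."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for u in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = u
    return ET.tostring(urlset, encoding="unicode")

items = [
    "https://example.org/archive/item/1",
    "https://example.org/archive/item/2",
]
sitemap_xml = build_sitemap(items)
print(sitemap_xml)
```

The output gets saved as sitemap.xml at the site root (and referenced from robots.txt), so crawlers can find every item page even if your collection's browse interface is hard to crawl.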


We've also been brainstorming ways to get more use out of our institution's Twitter and Facebook accounts. They're both pretty active, which is great, but we're mainly just posting "This Day in Cold War History" links. I'd like to get us interacting more with our users, in the hope that this will increase our followers and visibility, and thereby the number of people clicking through to our actual website.


Wednesday, November 24, 2010

'Cause Tonight is the Night


My digital archive uses Dublin Core and I’ve been looking into best practices. This led to the realization that we currently violate one of the central tenets of Dublin Core, the One-to-One Principle:

In general Dublin Core metadata describes one manifestation or version of a resource, rather than assuming that manifestations stand in for one another. For instance, a jpeg image of the Mona Lisa has much in common with the original painting, but it is not the same as the painting. As such the digital image should be described as itself, most likely with the creator of the digital image included as a Creator or Contributor, rather than just the painter of the original Mona Lisa.

Like many cultural heritage projects, my digital archive has cheerfully ignored the One-to-One Principle for years, combining metadata about both the digital file and the physical original in a single record. I’m not planning to change this because, abstract principles aside, mixed records make more sense for both our users and our local situation.

In an article on current practice and the One-to-One Principle, Steven Miller of the University of Wisconsin gets to the heart of the problem for me:

…many practitioners, including those who are well aware of the One-to-One principle, come to their digital collection projects with the intent to create records only for their digital resources. They are creating metadata for an online collection of digital resources, not a database or catalog of both their analog holdings and their digitized files.

My archive doesn't even hold real physical material (all of our documents are photocopies or scans from other archives), so why go to the trouble of creating two separate records for each item? Not to mention, keeping two records per item would be a headache if we ever exposed our metadata for harvesting by aggregators.

In the same article, Miller recommends a compromise solution:

  1. Follow the One-to-One Principle as much as possible, with the bulk of each record focusing on either the digital version or the original, and
  2. Use the Source field to explain the relationship between the digital and original versions (e.g., “Digital reproduction of photographic print in the So-and-so Collection, located in the Such-and-such Archive.”)
He goes into more detail in the article, but that's the basic idea. This is similar to what we are doing now and I think I'll follow his suggestions, keeping in mind what our metadata records will look like when stripped down to simple Dublin Core.*

*One caveat: I’m not crazy about some of Miller’s DC mappings in his examples. For instance, in one he uses the "Contributor" field for the name of the institution holding the original physical document, which I don’t think is right. It makes much more sense in the Publisher or Relation field. See Arwen Hutt and Jenn Riley, “Semantics and Syntax of Dublin Core Usage in Open Archives Initiative Data Providers of Cultural Heritage Materials,” p. 6.
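To make the compromise concrete, here's a sketch (in Python, with invented field values and a hypothetical helper) of what one of these mixed records might look like: the record describes the digital file, and the Source field carries the relationship to the physical original:

```python
# A sketch of one "mixed" Dublin Core record following the compromise
# approach: most fields describe the digital file, while dc:source
# explains its relationship to the original. All values are invented.
record = {
    "title": "Letter from A to B, 1962",
    "date": "1962-10-16",
    "format": "application/pdf",
    "type": "Text",
    "source": ("Digital reproduction of a typescript in the So-and-so "
               "Collection, located in the Such-and-such Archive."),
}

def to_simple_dc(rec):
    """Render the record as simple (unqualified) Dublin Core elements."""
    return ["<dc:%s>%s</dc:%s>" % (k, v, k) for k, v in rec.items()]

for line in to_simple_dc(record):
    print(line)
```

Rendering the record stripped down to simple Dublin Core like this is a quick way to preview what an aggregator harvesting via OAI-PMH would actually see.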

Monday, November 22, 2010

PDF/A Link Dump

I’m considering using PDF/A at my digital archive and thought I’d drop some useful links here for anyone else interested.

PDF/A is a new(ish) file format: a long-term archival version of the classic PDF we all know and love. Basically, it’s the same as regular old PDF, but it requires fonts to be embedded and forbids features like encryption and external dependencies, so it’s guaranteed to look exactly the same years from now when you open it on your holographic iPhone. It should be super easy to implement, since the scanning software we currently use, Adobe Acrobat Pro, already has settings for scanning/converting to PDF/A.

White paper from the PDF/A Competence Center which explains the standard in easy-to-understand language.

Report from Ohio State University Library which discusses different options for converting documents using Microsoft Word and Adobe Acrobat Pro.

Great Adobe Acrobat Pro tutorial which explains exactly which features are and aren't PDF/A compliant. (Note: The narrator has a very soothing accent.)

A service that verifies PDF/A documents sent as email attachments.