Friday, April 10, 2015

Unit 11: OAI Metadata Harvesting Services

This week, I looked at a few different service providers: databases and archives built on records harvested through the OAI protocol. Some were good, some were bad, and it got me thinking about what makes a good and useful federated collection.

First, I took a look at Heritage West, an archive of digital objects related to the Western United States. I was interested in this federated collection because it combines objects from university and museum libraries with objects from small local historical societies and museums. I found Heritage West using the University of Illinois OAI-PMH Service Provider Registry, but the link on that site now connects to a blog about the merits of steel shooting targets. With a quick Google search I found the real Heritage West (http://heritagewest.coalliance.org/), which is a very simple-looking Omeka-based archive. Some of the links within this archive don’t appear to work: I couldn’t find anything using “Browse Categories”, but I could view individual items using “Browse Items” and “Browse Collections”. Some of the individual items had extensive metadata but no way to view the item itself within its home collection. The Advanced Search function left a lot to be desired: I could only search by keyword, by collection, and through one “Narrow by Specific Fields” dropdown menu, where I could select a DC or item type metadata field and then type in what I wanted to search for. This was clunky and unintuitive (http://heritagewest.coalliance.org/items/advanced-search). Overall, Heritage West did not fulfill its mission of giving me access to a range of Western US resources in one place: I couldn’t see the items themselves (only metadata), the search was clunky, metadata was not consistent across collections, and some of the site’s ways of organizing the data no longer worked. It would be simpler to search the federated institutions’ own websites than to use Heritage West’s interface.

Next, I looked at UCLA’s Sheet Music Consortium (http://digital2.library.ucla.edu/sheetmusic/), which I found on the OpenArchives.org list. I can definitely see a need for a project like this, since digital sheet music can be hard to find, and it can be hard to verify that a given copy is correct and legal to use. The records within this provider did not have a lot of metadata (mainly title, creator, identifier, and name of the library holding the resource). Interestingly, many of the records housed in this repository, which purports to “promote access to and use of online sheet music collections”, pointed to resources that are not available online. However, you can choose to browse the Virtual Collection (http://digital2.library.ucla.edu/sheetmusic/virtualcollection.html), which contains all of the digital music. There is a social element to the Virtual Collection: you can view other users’ collections of music. You can also check a box when searching to return only digitized sheet music that fits your search parameters. In general, the digitized music had much more thorough metadata than the record-only pieces, which makes sense because more metadata was likely added when the objects were digitized. Overall, I found the Sheet Music Consortium good and useful because it allows users to search records effectively, the actual digital objects are accessible through the site, and the metadata for the digitized music was consistent and thorough enough to allow for productive searches.

Finally, I looked at the NASA Technical Reports Server (NTRS) (http://ntrs.nasa.gov/). This site was generally good. The search and advanced search options were very customizable and well developed, and there was a “Search Tips” section that explained how to search and refine searches, which helps with accessibility (http://www.sti.nasa.gov/ntrs-search-tips/#.VSiMwJTF8ww). NASA has added its own metadata terms to records, so it’s good that there is a document that helps explain them. The “Browse By” page was also good, with lots of categories to pick from, including date-based categories, document type, the center that houses the information, and availability. In terms of metadata for records, I looked at some PhD dissertations that had very extensive metadata, a few computer programs that also had good metadata, and some datasets that had decent metadata. The problem I had with the dataset records was that they contained a lot of good information, but it wasn’t well categorized, which could create problems when searching. Overall, NTRS was a very thorough and well-organized provider of NASA technical reports. I found it to be a useful tool because the metadata for records was extensive, the search interface was easy to use (especially the search tips!), and the browse section was well organized.

Through this exercise, I found that I really like being able to search by availability. I don’t usually use an online repository or database just to find out that something exists; I want to see the thing!


Huge federated collections are good in that they provide “one stop shopping” for a lot of different resources at once. They are less good when they become less cohesive in terms of subject matter, or when metadata is not consistent between the different collections, which makes searching less effective.
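
To make the point about inconsistent metadata a bit more concrete, here is a minimal sketch of how one might measure it. It assumes the harvested records are in the simple Dublin Core (oai_dc) format that OAI-PMH repositories expose, and it counts how often each DC field is actually filled in for each collection. The sample records, the collection names, and the helper functions dc_fields and field_coverage are all invented for illustration; none of them come from the sites reviewed above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of simple Dublin Core elements, in ElementTree's "{uri}" form.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def dc_fields(record_xml):
    """Return the names of the DC elements in one record that have non-empty text."""
    root = ET.fromstring(record_xml)
    return {
        el.tag[len(DC_NS):]
        for el in root.iter()
        if el.tag.startswith(DC_NS) and (el.text or "").strip()
    }


def field_coverage(records):
    """Map each DC field to the fraction of records that fill it in."""
    counts = Counter()
    for record_xml in records:
        counts.update(dc_fields(record_xml))
    return {field: n / len(records) for field, n in counts.items()}


# Hypothetical oai_dc records, made up for this example.
WRAPPER = (
    '<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">{}</oai_dc:dc>'
)
collection_a = [
    WRAPPER.format("<dc:title>Town map</dc:title><dc:creator>Unknown</dc:creator><dc:date>1890</dc:date>"),
    WRAPPER.format("<dc:title>Mining camp</dc:title><dc:date>1902</dc:date>"),
]
collection_b = [
    WRAPPER.format("<dc:title>Ranch photo</dc:title><dc:identifier>b-001</dc:identifier>"),
    WRAPPER.format("<dc:title>Letter</dc:title><dc:identifier>b-002</dc:identifier><dc:date></dc:date>"),
]

coverage_a = field_coverage(collection_a)
coverage_b = field_coverage(collection_b)
for field in sorted(set(coverage_a) | set(coverage_b)):
    print(field, coverage_a.get(field, 0.0), coverage_b.get(field, 0.0))
```

In this made-up data, every record in the first collection has a date and none in the second does, so a date search across the federated collection would silently miss the second collection entirely. That is the kind of gap I suspect was behind the uneven search results described above, though I did not check the actual harvested records.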
