Friday, April 17, 2015

Unit 12—One Content Management System to Rule Them All

I have been treating my experience with each new website we have tried as an audition for the final project. This means that while it is important to consider which site is the best overall (most user-friendly, most attractive, easiest to customize, etc.), it is even more important to consider which fits my project the best. The site that I like the most might not be the best site for hosting my collection. Clearly, all of the sites that we’ve looked at have specific uses that suit them better than others. I can see the value in each of the resources we’ve examined this semester.

For example, I liked DSpace better than Drupal. I thought DSpace made sense, it was relatively easy to navigate, and it was great for hosting documents. However, my collection is made up of more than just documents—I also have images and audio files. After using Omeka this week, I feel like it is a better fit for my collection than DSpace. But then, thinking back to the beginning of the semester, Drupal is so customizable that it can really host any type of collection, so saying that Omeka is the best site to host my collection isn’t necessarily true. I think I just like Omeka because a lot of the work (such as adding Dublin Core metadata elements) has already been done for me, which makes importing items easy. Plus, I like the look and feel of Omeka more than any other hosting site we’ve used—it feels much cleaner, more organized, and more modern.
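To make "Dublin Core metadata elements" concrete: simple Dublin Core is a fixed set of fifteen descriptive fields, and Omeka presents them as a ready-made form on every item. A minimal Python sketch of one such record (the item and all of the values are invented for illustration, not drawn from my actual collection):

```python
# One item's metadata as Omeka's built-in Dublin Core form captures it.
# The field names are the standard 15 DC elements; values are made up.
dc_record = {
    "Title": "Interview with a local historian",
    "Creator": "Jane Doe",
    "Subject": "Oral history",
    "Description": "Audio recording of a 2014 interview.",
    "Publisher": "Example University Library",
    "Contributor": "John Smith (interviewer)",
    "Date": "2014-06-01",
    "Type": "Sound",
    "Format": "audio/mpeg",
    "Identifier": "oral-history-001",
    "Source": "Example Oral History Project",
    "Language": "en",
    "Relation": "oral-history-series",
    "Coverage": "Western United States",
    "Rights": "In copyright; educational use only",
}

# All 15 elements of simple Dublin Core are accounted for.
assert len(dc_record) == 15
```

Having this structure pre-built is exactly the work I would otherwise have to do myself in a more open-ended system like Drupal.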

I’ve already talked a bit about Omeka, so I think I’ll go in reverse chronological order to discuss the other sites. The EPrints harvester was an interesting resource. I have seen federated collections before, but I hadn’t ever given much thought as to how the collections were brought together. As many of my classmates said on the forum, it seems like making your archive harvestable was trendy a few years ago but has largely tapered off. I didn’t have any trouble with the harvester, and I thought the resulting collection was decent—very browsable, relatively searchable, and not too ugly or outdated looking.

EPrints itself was kind of a mixed bag—I didn’t hate it, but I didn’t love it. EPrints was great for the few academic journal articles housed in my collection. The broken subject hierarchy soured the EPrints experience for me—it was such a chore to enter even one item, and the resulting collection was missing an important piece of metadata. I can imagine that EPrints is pretty decent when everything is working, but the resulting site is not as aesthetically pleasing or as intuitive as Omeka’s.

DSpace was pretty middle-of-the-road for me. I can see why a lot of universities use DSpace—it’s good with metadata and preservation, both of which are essential to institutional repositories. I thought DSpace was a bit less user-friendly than Drupal, but it did fit the needs of my academically focused collection with less effort on my part than Drupal did. Before working with EPrints and Omeka, I thought DSpace was a contender for my final project.

I didn’t like JHOVE—at the time, I didn’t really understand what it was or why we were using it. One of the PDFs I put into JHOVE generated pages and pages of nonsense… Writing this post has made me realize I need to look at JHOVE again so I can speak about it more knowledgeably for the final project.
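From what I understand so far, JHOVE is a format identification and validation tool: you point it at a file and it reports whether the file actually conforms to its claimed format, so those "pages of nonsense" were probably a detailed validity report. A sketch of pulling the verdict out of such a report—note that the XML below is a simplified stand-in I wrote for illustration, not JHOVE's exact output schema:

```python
import xml.etree.ElementTree as ET

# A simplified stand-in for the kind of XML report a validator like
# JHOVE produces: one repInfo entry per file, with format and status.
sample_report = """
<jhove>
  <repInfo uri="example.pdf">
    <format>PDF</format>
    <status>Well-Formed and valid</status>
  </repInfo>
</jhove>
"""

root = ET.fromstring(sample_report)
for rep in root.iter("repInfo"):
    fmt = rep.findtext("format")
    status = rep.findtext("status")
    print(f"{rep.get('uri')}: {fmt} is {status}")
# prints: example.pdf: PDF is Well-Formed and valid
```

Seen this way, the long report makes more sense: a broken PDF would generate an error for every malformed object inside it.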

Drupal is like a sandbox—everything is customizable, and there are tons of cool add-ons and things to play with, but you have to figure out the menu system before you can really play. While using Drupal I often got the feeling of “I know there is a way to do this, but I don’t remember where it is in the menu,” which led to lots of searching and guide reading. I imagine that people who are comfortable in Drupal like it, and I have no doubt that Drupal can create good digital collections, but I think there are better options to host my collection.


So at this point in time, it’s looking like I’m going to choose Omeka. We’ll see if working more with Omeka next week changes my perception at all.

Friday, April 10, 2015

Unit 11: OAI Metadata Harvesting Services

This week, I looked at a few different service providers—databases/archives built on the OAI-PMH harvesting protocol. Some were good, some were bad, and it got me thinking about what makes a good and useful federated collection.
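Before getting into the individual sites, it helps to see how small the protocol underneath them actually is: an OAI-PMH harvester just issues HTTP GET requests with a "verb" parameter and parses the XML that comes back. A Python sketch (the base URL is a made-up placeholder, and the response is a tiny hand-written sample rather than a real repository's output):

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# An OAI-PMH request is just a GET with a verb; ListRecords with the
# oai_dc prefix asks for every record as simple Dublin Core.
base_url = "http://example.org/oai"  # placeholder, not a real endpoint
params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
request_url = base_url + "?" + urlencode(params)
# -> http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc

# A tiny sample of the XML a repository returns; real responses wrap
# each record in the OAI-PMH and Dublin Core namespaces shown here.
sample_response = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example item</dc:title>
          <dc:creator>Jane Doe</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

root = ET.fromstring(sample_response)
titles = [t.text
          for t in root.iter("{http://purl.org/dc/elements/1.1/}title")]
print(titles)  # ['Example item']
```

The simplicity cuts both ways: harvesting is easy to set up, but the service provider only ever gets whatever metadata each repository chose to expose, which is why consistency varies so much across the sites below.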

First, I took a look at Heritage West, an archive of digital objects related to the Western United States. I was interested in this federated collection because it combines objects from university and museum libraries with objects from local small historical societies and museums. I found Heritage West using the University of Illinois OAI-PMH Service Provider Registry, but the link on that site now connects to a blog about the merits of steel shooting targets. With a quick Google search I found the real Heritage West (http://heritagewest.coalliance.org/), which is a very simple-looking Omeka-based archive. Some of the links within this archive don’t appear to work—I couldn’t find anything using “Browse Categories,” but I could view individual items using “Browse Items” and “Browse Collections”. Some of the individual items had extensive metadata but no way to view the item itself within its home collection. The Advanced Search function left a lot to be desired—I could only search by keyword, collection, and one “Narrow by Specific Fields” dropdown menu where I could select DC or item type metadata and then write in what I wanted to search, which was very clunky and not intuitive (http://heritagewest.coalliance.org/items/advanced-search). Overall, Heritage West did not fulfill its mission of giving me access to a bunch of different Western US resources in one place—I couldn’t see the items themselves (only metadata), the search was clunky, metadata was not consistent across collections, and some of the site’s ways of organizing the data no longer worked. It would be simpler to search the federated institutions’ websites directly than to use Heritage West’s interface.

Next, I looked at UCLA’s Sheet Music Consortium (http://digital2.library.ucla.edu/sheetmusic/), which I found on the OpenArchives.org list. I can definitely see a need for a project like this, since digital sheet music can be hard to find and hard to verify as legal to use, correct, etc. The records within this provider did not have a lot of metadata (mainly title, creator, identifier, and name of the library holding the resource). Interestingly, many of the records housed in this repository that purports to “promote access to and use of online sheet music collections” pointed to resources that are not available online. However, you can choose to browse the Virtual Collection (http://digital2.library.ucla.edu/sheetmusic/virtualcollection.html) that contains all digital music. There is a social element to the Virtual Collection—you can view other users’ collections of music. You can also check a box when searching to only return digitized sheet music fitting your search parameters. In general, the digitized music had much more thorough metadata than the record-only pieces, which makes sense because more metadata was likely added when the objects were digitized. Overall, I found the Sheet Music Consortium good and useful because it allows users to search records effectively, the actual digital objects are accessible through the site, and the metadata for the digitized music was consistent and thorough enough to allow for productive searches.

Finally, I looked at the NASA Technical Reports Server (NTRS) (http://ntrs.nasa.gov/). This site was generally good—the search and advanced search options were very customizable and well-developed, plus there was a “Search Tips” section that provided information on how to search and refine searches, which helps with accessibility (http://www.sti.nasa.gov/ntrs-search-tips/#.VSiMwJTF8ww). NASA has their own metadata terms that they have added to records, so it’s good there is a document that helps to explain them. Their “Browse By” page was also good—lots of categories to pick from, including date-based categories, document type, the center that houses the information, and availability. In terms of metadata for records, I looked at some PhD dissertations that had very extensive metadata, a few computer programs that also had good metadata, and some datasets that had decent metadata. The problem I had with the dataset records was that they contained a lot of good information, but it wasn’t well categorized, which could create problems when searching. Overall, NTRS was a very thorough and well-organized provider of NASA technical reports. I found it to be a useful tool because the metadata for records was extensive, the search interface was easy to use (especially the search tips!), and the browse section was well organized.

Through this exercise, I found that I really like it when you can search by availability, because I don’t usually use an online repository or database just to find out that something exists; I want to see the thing!


Huge federated collections are good in that they provide “one stop shopping” for a lot of different resources at once. They are less good when they become less cohesive in terms of subject matter, or when metadata between the different collections is not consistent, which makes searching less effective.