It has been a while since I have written about the library OPAC. I think that as 21st Century information professionals, we need to reevaluate what we have and where we are going.
I envision the entry page to a library OPAC looking much the same as Google: an empty text box just waiting for the patron’s request. The OPAC I envision has a link underneath providing “advanced” searching – the more familiar keyword or Boolean searches. The default, however, is that the patron enters a free-form question in the box. The software takes that question, translates it into terms it can use to search all available library resources, and then displays a list of materials within the library collection (including books, serials, and any digital material) that intelligently provide the information the patron requests.
This is what patrons expect. As things stand, they find the information they want elsewhere and come to the library OPAC only once they have defined the topic themselves, or even identified the exact author or book! This is not how it should be. We are information professionals. We can do better.
Most library OPACs are light-years behind Google, Bing, or even Amazon’s search. Is this because MARC is antiquated? Is this because cataloging is a waste of time? Or is there another issue?
Google works by crawling, over and over, the pages on websites all over the world. It stores information about these pages and ranks them using key terms and, through its PageRank algorithm, incoming links. The math is quite involved – I recommend that anyone who is interested in how Google works read Google’s PageRank and Beyond by Langville & Meyer. The first several chapters lay a good foundation for understanding it. If you love math, read the rest. But in short, Google creates its own advanced metadata set, from which it produces a set of pages for every search that is entered. If I search for chihuahuas, Google doesn’t go and search the web right that minute. It searches the matrices stored in Google’s databases. It is fast and efficient. And fairly accurate…
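To make the idea of a precomputed ranking concrete, here is a minimal sketch of the textbook power-iteration form of PageRank. It is not Google's production system: the toy link graph, the function name pagerank, the damping factor of 0.85 and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by incoming links using simple power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal rank everywhere

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page "web", invented for this example.
toy_web = {
    "breeders": ["chihuahuas"],
    "vets": ["chihuahuas", "breeders"],
    "chihuahuas": ["vets"],
}

# Computed ahead of time and stored; a search consults these scores
# instead of visiting the pages again.
scores = pagerank(toy_web)
best_page = max(scores, key=scores.get)
```

The point of the sketch is the timing: the expensive work happens before anyone searches, and the search itself is only a lookup against stored scores.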
Webpages are not the same as a bibliographic collection. There are no incoming links to indicate a popularity or relevancy score. An OPAC relies on a different sort of metadata. Bibliographic records – good ones with accurately chosen subject headings (and improvements such as FRBR) – are the metadata that an OPAC has to work with.
But Susan, you say, that is what we have, and, well, the OPAC doesn’t function as you envision. Yes, that is exactly my point.
The problem with the OPAC is not the bibliographic metadata. In this digital information environment, good cataloging is more important than ever, authority work in particular. It is the software that we are using to search it that is the problem.
OPAC software basically operates as a retrieval mechanism. There is no intelligence built into it. It searches by the access points, or by keyword. Some software packages allow patron-provided tags. But if the word you are using is not in the search index, you are out of luck. If the word you are searching for has multiple meanings, you also might not find what you were expecting.
Also, if the subject you want is not covered by a physical book, but by an article in an online database, you are likely out of luck as well. You would have to go to a different catalog to search those. The library system in which I work has, for example, one OPAC to search the physical bibliographic records and another for the digital books (ePubs, etc.). Links to the digital publications are manually entered into the main OPAC so that they also appear in the search results. Wouldn’t it be better if the software could do both searches simultaneously? What about the online journals and encyclopedias to which our library system subscribes? The limitation is not that software cannot be written to do this efficiently; it is that the specific software we use was not written to do it. Most commercially available library OPAC software is vendor-specific. The software I propose would search all of the library’s resources and create search matrices. It would be those search matrices that would be consulted at the time of a specific search. Nightly, the software could scan changed records and add the data to the search matrices. There could also be some method for making changed records instantly available for search – which would be an improvement on Google.
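The proposal above can be sketched as a single index built over every source and updated one record at a time. This is only an illustration under stated assumptions: the class name UnifiedIndex, its methods, the source labels and the sample records are all hypothetical, and a plain inverted index stands in for the "search matrices".

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class UnifiedIndex:
    """One inverted index over records from every library source."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> {(source, record_id), ...}
        self.records = {}                 # (source, record_id) -> indexed text

    def remove(self, source, record_id):
        """Drop a record and all of its index entries, if present."""
        key = (source, record_id)
        old_text = self.records.pop(key, None)
        if old_text is not None:
            for term in tokenize(old_text):
                self.postings[term].discard(key)

    def upsert(self, source, record_id, text):
        """Add or replace one record; it is searchable immediately."""
        self.remove(source, record_id)
        key = (source, record_id)
        self.records[key] = text
        for term in tokenize(text):
            self.postings[term].add(key)

    def search(self, query):
        """Return records from any source that contain every query term."""
        terms = list(tokenize(query))
        if not terms:
            return set()
        hits = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            hits &= self.postings.get(term, set())
        return hits


# Hypothetical records from three separate sources.
index = UnifiedIndex()
index.upsert("print", "b1", "Chihuahua care and training")
index.upsert("ebook", "e7", "The chihuahua handbook: care, feeding, health")
index.upsert("journal", "a42", "Veterinary care of small breeds")

before_update = index.search("chihuahua care")

# A changed record is re-indexed on its own, with no full rebuild.
index.upsert("journal", "a42", "Chihuahua dental care in veterinary practice")
after_update = index.search("chihuahua care")
```

A nightly job could call upsert for every changed record, while the same method, called at the moment a cataloger saves a record, would make the change searchable right away.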
Marshall Breeding discusses the issue of relevancy ranking in the 21st Century OPAC:
It seems that most online catalogs and discovery systems attempt to present their search results according to relevancy. That’s what library users expect since most of the search environments they use likewise follow this approach. Yet, it’s really difficult to make relevancy work well for library collections, especially when intermingling results that include books, articles, and other kinds of materials.
While I am not a metadata expert, I do not believe MARC is the real issue. As long as the MARC records are standard, a software engineer can design software to search them. It is metadata. As I understand it, the issue with MARC is more one of sharing data with entities other than libraries. Current commercial library software creates indexes of the access points, so, in my view, MARC in and of itself is not an obstacle to information findability in the library OPAC.
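As a rough illustration of how software can index access points from standard records, the sketch below reads simplified records keyed by MARC field tags (100 for the personal-name main entry, 245 for the title statement, 650 for a topical subject). Real MARC fields also carry indicators and subfields, which are omitted here; the records, the function name and the tag-to-label mapping are hypothetical.

```python
from collections import defaultdict

# Assumed mapping from a few MARC tags to the kind of access point they hold.
ACCESS_POINT_TAGS = {"100": "author", "245": "title", "650": "subject"}


def build_access_point_index(records):
    """Map (kind, normalized value) -> set of record ids."""
    index = defaultdict(set)
    for record_id, fields in records.items():
        for tag, value in fields:
            kind = ACCESS_POINT_TAGS.get(tag)
            if kind is not None:  # skip fields that are not access points
                index[(kind, value.strip().lower())].add(record_id)
    return index


# Invented sample records: each is a list of (tag, value) pairs.
sample_records = {
    "rec1": [("100", "Doe, Jane"), ("245", "Caring for Small Dogs"), ("650", "Dogs")],
    "rec2": [("100", "Roe, Richard"), ("245", "Dog Breeds of the World"), ("650", "Dogs")],
    "rec3": [("100", "Doe, Jane"), ("245", "Backyard Birds"), ("650", "Birds")],
}

access_points = build_access_point_index(sample_records)
dog_records = access_points[("subject", "dogs")]
doe_records = access_points[("author", "doe, jane")]
```

Because the tags are standard, the same few lines work for any record that follows them, which is the sense in which the record format itself is not the obstacle.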
In Finding the Concept, Not Just the Word: A Librarian’s Guide to Ontologies and Semantics, King & Reinold use the language of a librarian to discuss how ontologies and concept maps can be constructed in developing software that can do just what I envision. I recommend that all librarians interested in search principles read this.
It, however, is NOT a software manual (nor do they claim that it is). But it will help librarians to CATCH THE VISION – a vision of an OPAC that will function as the librarian does at a reference interview, and deliver accurate results that answer the patron’s question.
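The idea of finding the concept rather than the word can be sketched in a few lines. This is not taken from King & Reinold’s book: the miniature ontology, the catalog records and the function names below are all invented for illustration, and a real system would need far richer structures.

```python
import re

# Hypothetical miniature ontology: each heading lists the words patrons
# might use for it and any narrower headings beneath it.
ONTOLOGY = {
    "Dogs": {
        "variants": {"dog", "dogs", "canine", "canines", "puppy", "puppies"},
        "narrower": {"Chihuahua (Dog breed)"},
    },
    "Chihuahua (Dog breed)": {
        "variants": {"chihuahua", "chihuahuas"},
        "narrower": set(),
    },
}

# Hypothetical catalog records with cataloger-assigned subject headings.
CATALOG = {
    "b1": {"title": "The Complete Canine Companion", "subjects": {"Dogs"}},
    "b2": {"title": "Tiny but Mighty", "subjects": {"Chihuahua (Dog breed)"}},
    "b3": {"title": "Backyard Birds", "subjects": {"Birds"}},
}


def concepts_for(question):
    """Translate a free-form question into headings, including narrower ones."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    matched = {h for h, entry in ONTOLOGY.items() if words & entry["variants"]}
    expanded = set(matched)
    pending = list(matched)
    while pending:
        heading = pending.pop()
        for narrower in ONTOLOGY.get(heading, {}).get("narrower", set()):
            if narrower not in expanded:
                expanded.add(narrower)
                pending.append(narrower)
    return expanded


def search(question):
    """Return ids of records whose subject headings match the question's concepts."""
    headings = concepts_for(question)
    return sorted(rid for rid, rec in CATALOG.items() if rec["subjects"] & headings)


puppy_results = search("How do I train my puppy?")
chihuahua_results = search("chihuahuas")
```

Note that the word “puppy” appears in none of the sample records. The match comes entirely from the subject headings the cataloger assigned, which is why good cataloging and authority work matter more, not less, in a system like this.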
The software solution will involve quite advanced data structures and algorithms. But Google and others demonstrate that it is possible. An impressive attempt is the eXtensible Catalog. The eXtensible Catalog is an open source software project that has the potential of meeting the needs of the 21st Century OPAC.
The eXtensible Catalog (XC) is open source, user-centered, next generation software for libraries. XC provides an alternative discovery interface for users and a set of tools for libraries to manage metadata and build applications. XC comprises four software toolkits that can be used independently to address a particular need or combined to provide an end-to-end discovery system to connect library users with resources.
This is the direction the 21st Century library MUST go to effectively serve the 21st Century patron.