NISO RP-2006-02

NISO Metasearch Initiative

Metasearch XML Gateway Implementers Guide
Version 1.0

A Recommended Practice of the National Information Standards Organization
Standards Committee BC / Task Group 3
August 7, 2006

Published by the National Information Standards Organization, Bethesda, MD

Metasearch XML Gateway Implementers Guide 2006 NISO

Contents

Foreword
1 Purpose and Audience
2 Overview
3 Levels of Implementation
4 Prerequisites
5 Decision Points
6 The MXG Protocol
6.1 MXG URL Request
6.1.1 Syntax
6.1.2 MXG Request Parameters
6.1.3 Parsed Examples
6.1.4 Result Set IDs
6.2 MXG XML Response
6.2.1 MXG Response Parameters
7 Compliance
7.1 URL Request Compliance
7.2 MXG Response Compliance
8 Advanced Interoperability (Levels 2 and 3)
8.1 Explain (Level 2 Compliance)
8.2 CQL (Level 3 Compliance)
Appendix A: Implementation Help
Appendix B: Resources
Appendix C: Glossary

Tables

Table 1: MXG request parameters
Table 2: MXG response header subelements
Table 3: MXG response result set elements
Table 4: MXG response record elements
Table 5: MXG response browser elements
Table 6: MXG response diagnostic elements

Figures

Figure 1: High level model of metasearch environment
Figure 2: MXG protocol model

Foreword

Metasearch (also called parallel search, federated search, broadcast search, and cross-database search) has become commonplace
in the information community's vocabulary. All speak to a common theme of allowing search and retrieval to span multiple databases, sources, platforms, protocols, and vendors at once and integrate the results. Metasearch services rely on a variety of approaches to search and retrieval including open
standards (such as NISO's Z39.50), proprietary APIs, and screen scraping (extracting information from HTML responses). However, the absence of widely supported standards, best practices, and tools makes the metasearch environment less efficient for the metasearch service provider, the content provider, and ultimately the end-user.

NISO Metasearch Initiative

To move toward industry solutions, NISO sponsored a Metasearch Initiative to enable:
- metasearch service providers to offer more effective and responsive services
- content providers to deliver enhanced content and protect their intellectual property
- libraries to deliver services that distinguish them from Google and other free web services.

The groundwork for NISO's Metasearch Initiative was laid in two important events:
- A two-day strategy meeting in May 2003 defined the metasearch state of the art and built consensus on ways to move forward.
- A Metasearch workshop in October 2003 informed librarians, content providers, and aggregators about metasearch.

Following these meetings, NISO established three Task Groups / Standards Committees to address the different Metasearch needs areas: Access Management (Standards Committee
BA / Task Group 1); Collection and Service Descriptions (Standards Committee BB / Task Group 2); and Search and Retrieval (Standards Committee BC / Task Group 3).

Search and Retrieval Task Group

The Search and Retrieval Task Group was charged with identifying and/or developing standards and best practices
to improve interoperability between metasearch services and content providers. In particular, they were asked to investigate and report on:
- Current practices in metasearching search and retrieval
- Common metasearch vocabulary and terms
- Result set data elements
- XML interfaces
- Best practices for metasearch search and retrieval

This document represents one of the deliverables of the Search and Retrieve Task Group. It describes how to implement a Metasearch XML Gateway (MXG) that will allow a content provider's resource to be accessed by a Metasearch Service and included in the results of a metasearch.

Background on SRU

In the late 1990s, there was significant interest in making the Z39.50 standard more relevant to the Web environment. The result was the development of a new specification: SRU (Search / Retrieve via URL). The specification
retains such Z39.50 concepts as result sets, abstract record schemas, application level diagnostics, and "Explain", but differs from Z39.50 in the use of Web interfaces, XML, and the Common Query Language (CQL). The Metasearch Initiative Search / Retrieve Task Group selected SRU as the foundation on which to build the NISO Metasearch XML Gateway (MXG). For further information about SRU and CQL, see the Resources Appendix.

Metasearch Search and Retrieve Task Group

Members of the Task Group included:
Katherine Kott, co-chair, Digital Library Federation
Sara Randall, co-chair, Endeavor Information
Systems
Amira Aaron, Harvard University Library
Paul Cope, Auto-Graphics, Inc.
Ray Denenberg, Library of Congress
Dana Dietz, OCLC, Inc.
Matthew Dovey, University of Oxford
Susan Fariss, National Library of Medicine
Riccardo Ferrante, Smithsonian Institution Archives
Matthew Goldner, OCLC, Inc.
Cary Gordon,
The Cherry Hill Co.
Renny Guida, Thomson Scientific
Sebastian Hammer, Index Data
Mary Jackson, Association of Research Libraries
Marc Krellenstein, Elsevier
Ralph LeVan, OCLC, Inc.
John Little, Duke University
Mike McKenna, California Digital Library
Ron Miller, The H.W. Wilson Company
William Mischo, University of Illinois, Urbana-Champaign
Peter Murray, OhioLink
Peter Noerr, MuseGlobal, Inc.
Audrey Novak, Yale University
Andrew Pace, North Carolina University
Oliver Pesch, EBSCO Information Services
Chris Roberts, Ex Libris, Inc.
Simona Rollinson, Follett Software Co.
Robert Sanderson, University of Liverpool
Ezra Schwartz, ArtandT
Jeff Steinman, Lexis Nexis Academic

only the third level is fully compliant SRU. Full (Level 3) compliance is strongly encouraged for all implementers as interoperability and functionality increase with each level of implementation.

Level 1 defines a standard URL which will accommodate any query syntax or language. With Level 1 compliance, the Metasearch Service will have to convert its users' queries to the Content Provider's native search language. The amount of customization required depends on how proprietary the search language is.

Level 2 extends Level 1 by adding the requirement that servers provide an SRU Explain record to define the capabilities of the server. With Level 2 compliance, the Content Provider's server would provide an XML-formatted record that includes information about the resource, such as its host name and port and the database name, which is
used as the context part of a URL.

Example: The base URL for an MXG search against this resource:

host=oclc.org
port=80
database=search/WorldCat

would be:

http://oclc.org/search/WorldCat

Optionally, a Level 2 compliant server also includes human-readable information about the database(s) on the server, such as its name, a database description, information about the indexes available for searching the database, and the schemas that can be used to display returned records. This server / database description utilizes the SRU Explain operation.
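
The Level 2 example above, in which a host, port, and database name are combined into a base URL, can be sketched in Python as follows. This is only an illustration and not part of the MXG specification: the function name mxg_base_url, the scheme parameter, and the rule of leaving out the port when it is the default for the scheme are assumptions made for this example.

```python
def mxg_base_url(host: str, port: int, database: str, scheme: str = "http") -> str:
    """Combine host, port, and database name into a base URL.

    Hypothetical helper: the database name is used as the context
    part of the URL, as in the example in the text.
    """
    # Assumption: the port is omitted when it is the default for the scheme.
    default_ports = {"http": 80, "https": 443}
    netloc = host if default_ports.get(scheme) == port else f"{host}:{port}"
    # Strip stray slashes so the context part is joined exactly once.
    return f"{scheme}://{netloc}/{database.strip('/')}"


if __name__ == "__main__":
    # Values taken from the example in the text.
    print(mxg_base_url("oclc.org", 80, "search/WorldCat"))
    # prints: http://oclc.org/search/WorldCat
```

A Metasearch Service that reads these three values from an Explain record could build the base URL this way before appending the question mark and the search part.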
See the Resources Appendix for background information on SRU Explain. Section 8.1 in Advanced Interoperability further defines the use of the Explain function in Level 2 of MXG.

Level 3 extends Level 2 by adding the requirement that servers support a standard query grammar: CQL (Common Query Language). "CQL tries to combine simplicity and intuitiveness of expression for simple, every day queries, with the richness of more expressive languages to accommodate complex concepts when necessary."[2] Support for CQL eliminates the need for any customized search interface, which could make the content more widely available to services that don't have the resources to write custom interfaces. See the Resources Appendix for background information on CQL. Section 8.2 in Advanced Interoperability further defines the use of CQL in Level 3 of the MXG.

4 Prerequisites

There are very few prerequisites for implementing MXG. You will need an electronic content resource that is Internet accessible. This resource must have: a Web-addressable Uniform Resource Identifier (URI), a server that can receive and parse a URL request into its component parts: the base URL and parameter names and
values, an existing search interface, and the ability to output search results in XML. If you do not currently output search results in XML format but do have some type of API to a web interface for displaying records, then implementing the XML MXG response should be fairly easy. For further information on XML, see the Resources Appendix.

5 Decision Points

Prior to implementing the MXG, you will need to make two decisions:

1. What level of compliance will you implement?

See the Levels of Implementation section above for the description of each compliance level. While Level 1 requires the least amount of standardized protocol use and is the easiest for the Content Provider to implement, it requires the Metasearch Provider to have or create a custom interface to your search language. Any changes to your search language could necessitate coordinated changes by the Metasearch Provider to ensure that your content continues to be retrieved accurately. The need for this customization may limit the number of metasearch providers that access your content. If implementing the gateway at Level 1, a good practice would be to provide technical information on your website about your search language and API or, at a minimum, identify a contact person for Metasearch Providers who are interested in accessing your content.

[2] Common Query Language, CQL Version 1.1, 13th February 2004.

Levels 2 and 3 assume that you already have in place additional features of the SRU specification. Level 2 requires you to have implemented the SRU Explain operation and Level 3 requires support for the Common Query Language (CQL). You will need to add this functionality, if you aren't currently using Explain or CQL, to support Levels 2 and 3 of MXG. By implementing these higher levels, you will make your resources more easily accessible to a greater number and variety of Metasearch Providers. Additionally, you can make changes to your own search language and update its "translation" to CQL without involving any Metasearch Providers. You won't have to
coordinate your software change schedules with the Metasearch Providers and will have greater control over the accurate translation of search queries.

2. What XML schema will you utilize for records that are returned?

A minimum of one XML Schema is required, although multiple schemas may be supported for different Metasearch Providers or different communities of users. Any schema is allowable, even a custom-created one, as long as it uses standardized XML mark-up and is used consistently. A standard schema, such as Dublin Core, is one possible choice, although some content may require a more sophisticated schema. Your choice of schema should be based on the content's attributes and the user community. For example, providers of highly bibliographic content may find MODS (Metadata Object Description Schema) useful, while an e-learning community may prefer the LOM (Learning Object Metadata) schema. For further information on XML schemas, see the Resources Appendix.

6 The MXG Protocol

The MXG protocol consists of a Request made by the Metasearch Service (MS) to the Content Provider's (CP) resource and a Response from the CP to the MS. The Request is in the form of a simple SRU URL and the Response is an instance of an SRU/SRW searchRetrieveResponse. For background information on SRU/SRW, see the Resources Appendix.

6.1 MXG URL Request

6.1.1 Syntax

The MXG URL consists of an SRU base URL and a search part, separated by a question mark, in the form:

http://host/context?

it is mandatory.
The second line is the actual searchRetrieveResponse. It includes a namespace[3] attribute that specifies the default namespace for this element and all subelements. The first subelement indicates the version of SRU. It is followed by the numberOfRecords subelement. Table 2 describes these subelement parameters.

Table 2: MXG response header subelements

Element: <version>
Value: 1.1
Requirement: mandatory
Note: Specifies the version of SRU being utilized, which is currently version 1.1.

Element: <numberOfRecords>
Value: a non-negative integer
Requirement: mandatory
Note: Specifies the number of records that satisfy the query. If the query fails this will be 0.

6.2.1.2 Result Set Elements

The next group of elements in the MXG XML Response describes the result set and takes the form:

<resultSetId>717zar</resultSetId>
<resultSetIdleTime>30</resultSetIdleTime>

If the CP server generates result sets that can be referenced after the query is complete, then this is where it will specify the identifier for the result set and indicate how long the result set will remain available. Table 3 describes the parameters of these elements.

[3] A namespace provides context for identifiers. The same identifier can have different meanings in different namespaces. XML namespaces are defined in the W3C Recommendation, Namespaces in XML 1.1, available from: http://www.w3.org/TR/xml-names11/.

Table 3: MXG response result set elements

Element: <resultSetId>
Value: a string
Requirement: optional
Note: An identifier for the result set. It is created at the time of execution of the query. It
can contain anything that is valid in XML content. (Avoid angle brackets, quotes, apostrophes, and ampersands.)

Element: <resultSetIdleTime>
Value: a positive integer
Requirement: optional
Note: The number of seconds from last use after which the created result set will be deleted. If omitted, it means that the server is not making any promises about how long the result set will be available.

Section 6.1.4 describes how the resultSetId is used in the MXG URL Request.

The <resultSetIdleTime> is essentially a countdown timer that is started each time the result set is used. When it reaches zero, the result set can be thrown away by the CP server. If the MS client wants to prevent the resultSetId from expiring, it can send a request with maximumRecords=0, which will restart the timer. A result set idle time is not a guarantee; it is a promise of best effort. The server is always permitted, as necessary, to throw result sets away arbitrarily. If a result set that
no longer exists is later referenced, then the CP server should issue a diagnostic. (See section 6.2.1.5 for more information on diagnostics.)

6.2.1.3 Record Elements

The next group of elements in the MXG XML Response describes the records and takes the form:

info:srw/schema/1/dc-v1.1 xml rrl1234 Dog and Cat 1

Table 4 describes the parameters of these elements.

Table 4: MXG response record elements

Value: N/A
Requirement: mandatory
Note: A wrapper element fo