OAI Data Providers
http://gita.grainger.uiuc.edu/registry/Stanford-2006-08-24
By Thomas G. Habing (thabing@uiuc.edu)
Grainger Engineering Library Information Center
University of Illinois at Urbana-Champaign

Outline
- Brief Overview of OAI-PMH
- Anatomy of an OAI Data Provider
- OAI Static Repositories
- UIUC's OAI FileMakerPro Gateway
- Other Tools
- Validating

Overview: OAI-PMH
http://www.openarchives.org/
- Technologies (RESTful Web Service): HTTP, URIs, XML
- Mostly stateless

Overview: Definitions and Concepts
- Harvester: a client that issues OAI-PMH requests
- Repository: a server that responds to OAI-PMH requests
- Items (identified by an OAI Identifier) contain metadata about a resource
- Records (identified by an OAI Identifier + Metadata Prefix) contain metadata in a specific format about a resource
- Selective Harvesting
  - Sets
  - Datestamps (From and Until Dates)

Overview: Metadata
- Dublin Core is required (oai_dc)
- Many others exist (MODS, MARC, Qualified DC, etc.)
- Adoption of richer metadata formats is highly encouraged, especially within communities
- Can be used for complete digital resources, not just metadata

Overview: Verbs
- Find out about the repository
  ?verb=Identify
  ?verb=ListSets
  ?verb=ListMetadataFormats&identifier=iii
- Harvest records
  ?verb=ListIdentifiers&metadataPrefix=mmm&from=yyyy-mm-dd&until=yyyy-mm-dd&set=sss
  ?verb=ListRecords&metadataPrefix=mmm&from=yyyy-mm-dd&until=yyyy-mm-dd&set=sss
  ?verb=GetRecord&metadataPrefix=mmm&identifier=iii
(iii = OAI identifier, mmm = metadata prefix, sss = set spec; from, until, set, and the identifier on ListMetadataFormats are optional)

Examples from the Library of Congress OAI Data Provider

Overview: Flow Control
- Resumption Tokens
  ?verb=ListSets&resumptionToken=rrr
  ?verb=ListIdentifiers&resumptionToken=rrr
  ?verb=ListRecords&resumptionToken=rrr
- HTTP 503 Service Unavailable (Retry-After)

Overview: HTTP
- 302 Found (Location)
- Compression
- Authentication

Anatomy of an OAI Data Provider
How are OAI responses generated?
- Static: OAI responses are fed from a static copy of your records; the static copy is periodically updated from your live data (daily, weekly, monthly, irregularly, etc.)
  - Possible staleness, minimal impact on your production system, may be amenable to certain turnkey solutions, easier to implement
- Dynamic: OAI responses are generated directly from your live data
  - Up to date, may impact your production system, must be tightly integrated with your production system, may be difficult to implement depending on your current systems and workflows

Anatomy of an OAI Data Provider
Where do the various components reside?
- Locally: the OAI data provider is on the same server as the data; it may be part of a larger monolithic system like DSpace or contentDM
- Distributed: the OAI data provider is on a different server than the data or data management system; it may even be administered by a different organization

Anatomy of an OAI Data Provider: Options
- Turnkey system that already has OAI-PMH capabilities built in, such as DSpace or contentDM, plus many others; can be limiting
- Start with an OAI-PMH toolkit and customize it to fit your needs: OCLC's OAICat (Java), various toolkits from UIUC (ASP) or Virginia Tech (Perl), and many others
- Build a data provider from scratch; not too difficult for a proficient web software developer
- Use a gateway service, such as an OAI Static Repository Gateway, Emory's Metadata Migrator, or UIUC's FileMakerPro and Z39.50 gateways

OAI Static Repositories: The Problem
OAI-PMH is simple, but not simple enough for organizations that:
- Are technically challenged
- Have limited resources
- Have no control over their web server
- Have small collections: 1-5000 records (a 10-20 MB XML file)
- Have collections that do not change often (a pretty loose requirement; weekly?)

OAI Static Repositories: The Solution
- Static Repository
  - A single XML file containing all metadata, identifiers, and datestamps
  - Accessible from a web server via an HTTP URL, such as http://host:port/path/file.xml
  - May be created manually with an XML or simple text editor, or programmatically
- Static Repository Gateway
  - Provides intermediation for one or more Static Repositories

OAI Static Repositories: Official Specification
http://www.openarchives.org/OAI/2.0/guidelines-static-repository.htm

OAI Static Repositories: Illustration
[Diagram: the Static Repositories http://this.edu/col1/oai.xml and http://that.org/mycol/col.xml are intermediated by the Static Repository Gateway at http://myoai.org/oai. OAI harvesters such as OAIster and REAP send requests to http://myoai.org/oai/this.edu/col1/oai.xml?verb=... and http://myoai.org/oai/that.org/mycol/col.xml?verb=...]

OAI Static Repositories: Static Repository Limitations
- Must be a single XML file (MIME type: text/xml)
- No resumptionTokens
- Must be UTF-8 encoded Unicode (http://www.cs.cornell.edu/people/simeon/software/utf8conditioner/)
- Must validate against the Static Repository XML Schema
- The baseURL element must be the concatenation of the Static Gateway URL and the Static Repository URL without its leading http:// (e.g., http://myoai.org/oai/this.edu/col1/oai.xml)
- ListRecords elements must conform to the OAI-PMH record format

OAI Static Repositories: Additional Limitations
- The URL of the Static Repository XML file cannot include a fragment or query string
- Sets are not supported
- Deleted records are not supported
- Response compression is not supported
- Only YYYY-MM-DD datestamp granularity is supported
- The guidelines for OAI identifiers should be followed: http://www.openarchives.org/OAI/2.0/guidelines-oai-identifier.htm

OAI Static Repositories: Static Repository XML Sections
Demo values from the sample Static Repository:
- Identify: repositoryName Demo; baseURL http://myoai.org/oai/this.edu/col1/oai.xml; protocolVersion 2.0; adminEmail jondoe@oai.org; earliestDatestamp 2002-09-19; deletedRecord no; granularity YYYY-MM-DD
- ListMetadataFormats: metadataPrefix oai_dc; schema http://www.openarchives.org/OAI/2.0/oai_dc.xsd; metadataNamespace http://www.openarchives.org/OAI/2.0/oai_dc/
- ListRecords: a record with identifier oai:this.edu:123456, datestamp 2001-12-14, and title "Some Title"

UIUC's OAI FileMakerPro Gateway
[Diagram: the FileMakerPro databases http://some.edu:591/FMPro?-db=artifacts&... and http://this.org:591/FMPro?-db=collection&... are intermediated by the OAI FileMaker Gateway at http://myoai.org/oai.aspx. OAI harvesters such as OAIster and REAP send requests to http://myoai.org/oai.aspx/artifacts?verb=... and http://myoai.org/oai.aspx/collection?verb=...]

OAI FileMakerPro Gateway: The Problem
- FMP is widely used in the museum community and is often used for special collections in libraries
- Until recently there were no easy or convenient tools for making FMP databases OAI accessible
- Emory's Metadata Migrator (or similar tools) could be used, but there could be latency problems if the database is active

OAI FileMakerPro Gateway: Solution
- Out of the box, FMP has a built-in web server and can export XML: http://
- This facilitates a solution similar to OAI Static Repositories, except that it is not static: data is fed directly from the database, not from a static copy
- This is a slight fib: because of how datestamps are derived, they only have a granularity of one day, so an incremental harvest might be up to 24 hours out of date

OAI FileMakerPro Gateway: Some Technical Details
How to Get XML From FMP
http://base_url:591/FMPro?-db=database&-lay=layout&-format=format&-max=max_records&-skip=skip_records&-recid=record_id&-command
- -lay: a short layout for ListIdentifiers or a full layout for ListRecords
- -format: -fmp_xml or -dso_xml (easier to transform)
- -command: -find, -dbnames, -layoutnames, etc.

OAI FileMakerPro Gateway: More Technical Details
FMP XML Formats
- The -dso_xml format:
  - Easier to transform with XSLT
  - But may be malformed in some cases (the gateway can accommodate this)
  - The XML Schema varies by database
  - Same as the XML export format used by MS SQL Server
- The -fmp_xml format:
  - Always the same XML Schema regardless of the database
  - Difficult to transform

OAI FileMakerPro Gateway: More Technical Details
Datestamps
- All FMP records have a RECORDID and a MODID
- The MODID increments each time the record is changed, so it can be used as a surrogate for the datestamp
- When a new FMP database is added to the Gateway, all RECORDIDs and MODIDs are recorded locally, and each record is assigned the current date as its datestamp
- Once a day, the MODID of each record is compared against the locally stored value, and the record's datestamp is set to the current date if the MODID has changed

OAI FileMakerPro Gateway: Configuring the Gateway

OAI FileMakerPro Gateway: Covert Implementations
- It is relatively easy to identify and intermediate FMP databases using the Gateway:
  - Use Google to find them: http://
  - Gather configuration details like layouts, etc.
  - Write an XSLT to transform dso_xml into oai_dc
- Most FMP database owners probably don't even realize how easy it is for someone to perform a wholesale download of their entire database
- Good for OAI implementers, but FMP database owners, be careful of sensitive data!
- Make sure the web-based edit features are secured!

OAI FileMakerPro Gateway: An Invitation
http://cicharvest.grainger.uiuc.edu/fmpgateway/
- We are looking for FMP collections we can test with the Gateway
- We do plan to maintain the Gateway, similar to our OAI Static Gateway

Other OAI Gateways
- Z39.50 and OAI-PMH: http://frasier.library.uiuc.edu/research.htm
- ZMARCO: http://
- SRU/W and OAI-PMH: http://www.dlib.org/dlib/february05/sanderson/02sanderson.html

Open Source OAI Toolkits
- OCLC: http://www.oclc.org/research/projects/oai/default.htm
- UIUC Grainger Engineering Library: http://uilib-
- Virginia Tech DLRL Projects: http://www.dlib.vt.edu/projects/OAI/
- Lots of other Open Source tools: http://, http://www.openarchives.org/tools/tools.html

OAI Turnkey Solutions
Adlib, CWIS, ContentDM, Digitool, DLESE, DLXS, DSpace, EPrints, Encompass, Fedora, Greenstone, Ockham, Others
http://comm.nsdl.org/download.php/482/handout3.doc

How to Test Your OAI Provider
- Repository Explorer: http://re.cs.uct.ac.za/
  - A good start, but it does not do a complete harvest, nor does it check non-oai_dc metadata formats, so it can't find all problems
- W3C Validator for XML Schema: http://www.w3.org/2001/03/webdata/xsv
  - Great for pinpointing obscure XML Schema validation errors or character encoding problems
  - Only one request at a time, though
- Character Encoding Problems: http://www.cs.cornell.edu/people/simeon/software/utf8conditioner/
- Try to harvest your OAI provider yourself
  - Use REAP, the Windows command-line OAI harvester from UIUC: http://gita.grainger.uiuc.edu/registry/dlffall2005/reap_readme.htm
  - Use the U. Michigan Harvester (Kat can provide more detail)
  - Ask one of us to do it
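Harvesting your own provider also exercises the flow-control behavior from the overview. A harvester has to follow resumptionTokens until an empty one comes back, and it has to back off on HTTP 503 with Retry-After. The following is a minimal sketch of such a self-test harvester in Python, using only the standard library. It is not REAP or the Michigan harvester. The function names, the 10-second fallback delay, and the example base URL are assumptions made for illustration. The sketch does not handle response compression or authentication.

```python
import time
import urllib.error
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# OAI-PMH 2.0 response namespace, in ElementTree's {uri}tag notation.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def build_url(base_url, **args):
    """Append OAI-PMH arguments (verb, metadataPrefix, from, until, set,
    resumptionToken, ...) to a base URL as a URL-encoded query string."""
    return base_url + "?" + urllib.parse.urlencode(args)


def http_fetch(url, max_retries=5, default_delay=10):
    """GET a URL. urlopen follows 302 redirects on its own; an HTTP 503 is
    treated as flow control and retried after its Retry-After delay."""
    for _ in range(max_retries):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 503:
                raise
            # Assumption: fall back to default_delay seconds when Retry-After
            # is missing or is an HTTP-date rather than a number of seconds.
            delay = err.headers.get("Retry-After", "")
            time.sleep(int(delay) if delay.isdigit() else default_delay)
    raise RuntimeError("gave up after repeated 503 responses: " + url)


def parse_page(xml_bytes):
    """Return (identifiers, resumptionToken or None) for one response page."""
    root = ET.fromstring(xml_bytes)
    error = root.find(OAI_NS + "error")
    if error is not None:
        # noRecordsMatch is a normal, empty result for a selective harvest.
        if error.get("code") == "noRecordsMatch":
            return [], None
        raise RuntimeError(f"OAI error {error.get('code')}: {error.text}")
    ids = [e.text for e in root.iter(OAI_NS + "identifier")]
    token = root.find(".//" + OAI_NS + "resumptionToken")
    # A missing or empty resumptionToken element marks the last page.
    if token is None or not (token.text or "").strip():
        return ids, None
    return ids, token.text.strip()


def list_identifiers(base_url, metadata_prefix="oai_dc", fetcher=http_fetch,
                     **selective):
    """Yield every OAI identifier, following resumptionTokens to the end.
    `selective` may carry from/until/set arguments."""
    url = build_url(base_url, verb="ListIdentifiers",
                    metadataPrefix=metadata_prefix, **selective)
    while True:
        ids, token = parse_page(fetcher(url))
        yield from ids
        if token is None:
            return
        # resumptionToken is an exclusive argument: send it with the verb only.
        url = build_url(base_url, verb="ListIdentifiers", resumptionToken=token)
```

As an example, `list_identifiers("http://example.org/oai", **{"from": "2006-01-01"})` would run a selective harvest against a placeholder URL. The `from` argument is passed through a dict because `from` is a Python keyword. Checking ListRecords the same way would mean walking the `record` elements instead of only collecting identifiers.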