Digital Library Architecture: A Service-Based Approach
  Sandra Payette
  Department of Computer Science, Cornell University
  payette@cs.cornell.edu
  Mo i Rana, Norway, November 10, 1998
  http://www2.cs.cornell.edu/payette/presentations/DL-architecture.ppt

Overview
  Why talk about DL architecture?
  Digital Libraries - the architectural perspective
  Review of service-based architecture
  NCSTRL - a working example
  Dienst - existing service-oriented architecture
  Cornell next generation (component-oriented)
  Conclusion

Why Talk about Digital Library Architecture?
  Web alone is not a digital library
  Commercial packages limited
    limited flexibility
    standards issues
    network-enabled applications, not DL architecture
  Must position for broader DL opportunities

Web by itself not a DL Architecture
  Documents - Files, CGI, MIME-Types
  Naming - URLs
  Document Servers - HTTP servers
  Resource Discovery - web crawlers
  Collections - web pages, ad-hoc
  IP - Access Control List, passwords, ad-hoc

WWW Infrastructure Evolving
  Resource Description Framework (RDF) will allow rich metadata semantics for documents
    http://www.w3.org/RDF/
  Extensible Markup Language (XML) will allow highly structured documents and rich linking (relationship) capabilities
    http://www.w3.org/XML/
  Uniform Resource Names (URNs) will allow for persistent, globally unique identifiers

But still need Digital Library Architecture
  Richer document model - digital objects
  Persistent, unique naming - URNs
  Well-defined digital library services
  Better facilities for resource discovery
  Flexible definition of collections
  Management of distributed content & services
  Rights management for intellectual property

Digital Library Interoperability

Digital Library Architecture: Key Principles
  Open Architecture
    functionality partitioned into set of well-defined services
    services accessible via well-defined protocol
  Modularization
    promotes interoperability
    scalable to different clientele (research library, informal web)
  Federation
    enable aggregations into logical collections
  Distribution
    of content (collections) and services
    of administration and management of DL

Component-Ware Digital Libraries

Digital Objects

NCSTRL: A Working Example
  120+ Institutions in US, Europe, and Asia

A Globally Distributed Digital Library

NCSTRL Participants: collections federated
  120+ institutions
  Universities/labs - research reports
  European Research Consortium for Informatics and Mathematics (ERCIM)
  Los Alamos (Physics pre-prints, ACM)
  D-Lib Magazine
  40+ independent servers

Federation of Collections

Documents in Distributed Repositories

Multi-Format Document Model

NCSTRL: Real-world testbed for ...
  modular system based on a standard open architecture
  study of hard, real-world problems: policy issues, quality of service, federation of publishers
  creation of a self-sustaining international federated digital collection

Dienst: NCSTRL technical base
  Implements a service-based architecture for distributed digital libraries
  Protocol and reference implementation
  Network of services
  WWW browser access
  Uniform search over distributed indexes
  Access to documents in distributed repositories
  Access to multi-formatted documents

Dienst: Service-Based Architecture
  Document model
  Naming service (CNRI's Handle System)
  Repository service
  Indexer service
  Collection service
  User Interface service

Dienst Document Model

Dienst: Document Protocol
  Documents addressable through their URNs
  Document service requests
    get document metadata
    get document formats
    get document in format
    get document partition (page) in format

Dienst 5.0: Document Protocol
  More complex document model
    versions
    hierarchical part specification
    binders (multi-part documents)
  "Structure" service request
    Reveal, in XML, full or collapsed structure of a document
      e.g., chapters, sections, figures, etc.
    Describe multiple views of a document
      e.g., bibliography, content, thumbnails

Dienst: Core Services
  WWW browser
  Dienst User Interface

Dienst Protocol: Building Gateways to non-Conforming Sites

Dienst: Collection Service

Naming Service
  Documents identified by globally unique names
  Names are persistent, permanent
  Registered names resolve to specific location (URL)
  Persistent Identifier (e.g., URN): cnri.dlib/april97-payette
    Naming Authority: cnri.dlib
    Item Name: april97-payette
  Location (URL): http://www.somewebserver.org/somedirectory/somefile

Identifiers: Current Initiatives
  IETF Uniform Resource Names (URN)
    specification of URN framework
    requirements for resolution systems
    syntax definition
  Existing Systems
    CNRI's Handle System (*NCSTRL uses)
    OCLC PURLs
    DOI Initiative

Looking Ahead: Current Research at Cornell
  Digital Objects and Repository
    FEDORA
    Joint work in Interoperability with CNRI
  Access Management
  Resource Discovery
    STARTS (Cornell/Stanford collaboration)
    Intelligent Distributed Searching
  Collection Definition

Digital Object is ...
  recognizable by what it can do
    getChapter, getPage
    getTrack, getLabel
    getSection, getArticle
    getFrame, getLength

What the client sees vs. What the object is
  Structure
  Mechanism
  Content-Type Interfaces
  Book
  MARC

FEDORA DigitalObject

FEDORA: Extensibility for Content Types
  Simple, familiar content types
  Complex, compound, dynamic content types

Resource Discovery

Meta-Searching for Resource Discovery
  query multiple document sources
    choose best sources to evaluate a query
    evaluate the query at these sources
    merge the query results from these sources
  Stanford Protocol Proposal for Internet Retrieval and Search (STARTS)
    www-db.stanford.edu/gravano/starts.html
    www.cs.cornell.edu/NCSTRL/STARTS/STARTShome.htm

Distributed Collection Service Definition and Access
  Central Collection Server
  User Interface
  Intelligent routing based on regional conditions

Conclusions: Design with an Eye Toward the Future
  Know limitations of ad-hoc web development and commercial packages
  Embrace a service-based approach
    modular designs increase flexibility, extensibility, plug-in/plug-out
    well-defined services with protocols to enable federation and interoperability
    can utilize various technologies or commercial software underneath the service layers
  Watch Web developments in XML and RDF

Further reading
  Lagoze and Payette: An Infrastructure for Open-Architecture Digital Libraries
    http://ncstrl.cs.cornell.edu/Dienst/UI/1.0/Display/ncstrl.cornell/TR98-1690
  Davis and Lagoze: NCSTRL: Design and Deployment of a Globally Distributed Digital Library, Draft of submission to IEEE Computer Special Issue on Digital Libraries, February 1999.
    http://www2.cs.cornell.edu/lagoze/papers/NCSTRL-IEEE3.doc
  Payette: Persistent Identifiers, RLG DigiNews
    http://www.rlg.org/preserv/diginews/diginews22.html
  Payette and Lagoze: Flexible and Extensible Digital Object and Repository Architecture (FEDORA)
    http://www2.cs.cornell.edu/NCSTRL/CDLRG/FEDORA.html
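
Illustration: name resolution
  The Naming Service slide says that a persistent name, made of a naming authority and an item name, resolves to a specific location (URL). The sketch below is one minimal way to picture that, assuming a hypothetical in-memory registry. The classes PersistentIdentifier and NameRegistry are illustrative names, not part of Dienst or CNRI's Handle System, and the second URL is an invented example of a relocated file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersistentIdentifier:
    """A name of the form 'naming-authority/item-name' (illustrative only)."""
    naming_authority: str  # e.g. "cnri.dlib"
    item_name: str         # e.g. "april97-payette"

    @classmethod
    def parse(cls, text):
        # Split on the first "/" only: authority before it, item name after it.
        authority, sep, item = text.partition("/")
        if not sep or not authority or not item:
            raise ValueError(f"expected 'authority/item', got {text!r}")
        return cls(authority, item)

    def __str__(self):
        return f"{self.naming_authority}/{self.item_name}"


class NameRegistry:
    """Hypothetical in-memory stand-in for a resolution system."""

    def __init__(self):
        self._locations = {}  # PersistentIdentifier -> current URL

    def register(self, name, url):
        # Registering an existing name again simply updates its location.
        self._locations[PersistentIdentifier.parse(name)] = url

    def resolve(self, name):
        pid = PersistentIdentifier.parse(name)
        try:
            return self._locations[pid]
        except KeyError:
            raise LookupError(f"unregistered name: {pid}") from None


registry = NameRegistry()
registry.register("cnri.dlib/april97-payette",
                  "http://www.somewebserver.org/somedirectory/somefile")
# Hypothetical move: the file changes location, the name stays the same.
registry.register("cnri.dlib/april97-payette",
                  "http://www.somewebserver.org/newdirectory/somefile")
print(registry.resolve("cnri.dlib/april97-payette"))
```

  The point of the design is that clients keep citing the name, while only the registry entry changes when a document moves.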
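
Illustration: meta-searching
  The Meta-Searching for Resource Discovery slide lists three steps: choose the best sources, evaluate the query at those sources, and merge the results. The sketch below walks through those steps on toy data. It is not an implementation of STARTS: the Source class, the term-counting scores, and the sample documents are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Toy document source; names and scoring are illustrative assumptions."""
    name: str
    documents: dict  # document id -> text

    def summary_score(self, terms):
        # Crude source-selection signal: number of documents mentioning any term.
        return sum(
            any(t in text.lower() for t in terms)
            for text in self.documents.values()
        )

    def search(self, terms):
        # Score each document by how many query terms it contains.
        hits = []
        for doc_id, text in self.documents.items():
            score = sum(t in text.lower() for t in terms)
            if score:
                hits.append((score, self.name, doc_id))
        return hits


def meta_search(query, sources, max_sources=2):
    terms = query.lower().split()
    # 1. Choose the best sources to evaluate the query.
    ranked = sorted(sources, key=lambda s: s.summary_score(terms), reverse=True)
    chosen = [s for s in ranked[:max_sources] if s.summary_score(terms) > 0]
    # 2. Evaluate the query at these sources.
    results = []
    for source in chosen:
        results.extend(source.search(terms))
    # 3. Merge the results: highest score first, ties broken by source and id.
    results.sort(key=lambda hit: (-hit[0], hit[1], hit[2]))
    return results


sources = [
    Source("source-a", {"a1": "digital library architecture",
                        "a2": "distributed search protocol"}),
    Source("source-b", {"b1": "digital search over distributed indexes"}),
    Source("source-c", {"c1": "particle physics preprints"}),
]
for hit in meta_search("digital search", sources):
    print(hit)
```

  Merging here assumes scores from different sources are directly comparable, which holds only because every toy source uses the same scoring function. With independent real sources that assumption generally fails, which is part of what makes the merge step hard.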