Approaches to the Integration of Distributed and Heterogeneous Data Resources
Ahmet Sayar
Indiana University, Computer Science Department

Motivation
- Integrating data from multiple data sources
- Distributed queries and transactions over data
- Definitions and adoptions of data, metadata and their storage
- Accessing the data seamlessly
- Transparency, support for heterogeneity, extensibility and scalability

Outline
- Data Integration Approaches
  - Application Specific Solutions
  - Application-Integration Framework
    - ASIS (Application Specific Information System)
  - Database Federation
    - Ogsa-DAI (Ogsa-Data Access and Integration)
    - Compare ASIS with Ogsa-DAI
  - Digital Libraries
    - SRB (Storage Resource Broker)
    - Sompel's Digital Library Approach
    - Compare ASIS with SRB and Sompel's DL

Application Specific Solutions
- The most common means of data integration
- Expensive in terms of time and skills
  - Developing and using them requires deep system knowledge
- Better results for special-purpose applications
- Fragile
  - Changes to the underlying sources may easily break the application
- Hard to extend
  - A new data source requires new code to be written

Application-Integration Framework
- Can also be called a component-based framework
  - Such as CORBA, or Filters with common interfaces
- Does not necessarily address data integration issues
- Based on a common data model (such as CML or GML)
- With adaptors, if a source changes its adaptor may have to change, but the application may never see it
- Adding a new source is easy
  - A new adaptor may need to be written
  - The adaptor may already exist online
- No need for detailed system knowledge
- Ex. ASIS - OGC GIS Application Integration Framework

ASIS (1)
- Enables inter-service communication through well-defined service interfaces, message formats and capabilities metadata
- Data model is ASL (Application Specific Language)
- Metadata model is the capability document
- Data and metadata have common predefined schemas
- Components are Filter Services
  - Web Services with common service interfaces defined in WSDL
  - Information/data services enabling distributed access, querying and transformation through their predictable input/output interfaces
  - Chainable, locatable, and capable of updating their metadata manually or dynamically

ASIS (2)
- Data and data storage model
  - Any data can be integrated into the system after being transformed to ASL
  - Heterogeneity is handled at the end-Filters with adaptors
- ASL is a community-accepted application-specific language
  - GML (Geography Markup Language) in GIS applications
  - CML (Chemical Markup Language) in chemistry applications
- Filters' common service interfaces
  - getCapabilities, getData, getFeatureInfo
- Requests to the Filters' interfaces
  - getCapabilitiesReq, getDataReq, getFeatureInfoReq
  - Expected return types are defined in the Filter's capability metadata

ASIS (3)
- Metadata and metadata storage model
  - Data integration is done through the Filters' capability metadata
  - Metadata is stored in the Filter's local file system as a flat file
- Capability
  - Inspired by the OGC WMS capability specification
  - Looks like the Dublin Core format
  - A capability-like structure is also used in Gannon's approach (XPOLA) for Grid services security
  - Describes dynamic Web/Grid resources
  - Updated manually or dynamically
  - Consists of descriptor, service and provider metadata
- Inter-service communication is achieved without a third party
- Enables chains of Filters

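The Filter interfaces and capability metadata described on the ASIS slides can be sketched roughly as follows. This is a minimal illustration, not the ASIS implementation: the class names, the snake_case method names, the dictionary layout of the three capability parts, and the sample record are all assumptions made for the example. Only the three interface names and the descriptor/service/provider split come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Capability:
    """Capability metadata with the three parts named on the slide.

    The contents of each part are hypothetical placeholders.
    """
    descriptor: dict
    service: dict
    provider: dict


class Filter:
    """Hypothetical Filter exposing getCapabilities, getData and getFeatureInfo."""

    def __init__(self, name, layers):
        # layers maps a layer name to a list of feature records (plain dicts
        # standing in for ASL documents such as GML).
        self.name = name
        self._layers = layers

    def get_capabilities(self):
        # Advertise which interfaces and layers this Filter offers.
        return Capability(
            descriptor={"name": self.name},
            service={
                "interfaces": ["getCapabilities", "getData", "getFeatureInfo"],
                "layers": sorted(self._layers),
            },
            provider={"contact": "unknown"},
        )

    def get_data(self, layer):
        # Only layers advertised in the capability metadata can be requested.
        if layer not in self._layers:
            raise KeyError(f"layer not advertised: {layer}")
        return list(self._layers[layer])

    def get_feature_info(self, layer, feature_id):
        # Return the first feature with a matching id, or None.
        for feature in self._layers.get(layer, []):
            if feature["id"] == feature_id:
                return feature
        return None


# Made-up sample data for illustration only.
faults = Filter("F1", {"California:Faults": [{"id": "f-1", "name": "example fault"}]})
caps = faults.get_capabilities()
info = faults.get_feature_info("California:Faults", "f-1")
```

A client would normally call get_capabilities first and build its later getData and getFeatureInfo requests from what the Filter advertises.
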
ASIS (4) Data Access and Filter Chaining
[Figure: Filters F1, F2, F3 and F4; labels: Fault, State Boundary, Earth]
- Each Filter is capable of acting as both a server and a client
- Capability integration is done through the “getCapabilities” service interface
- Requests for common service interfaces are created in accordance with a predefined XML schema

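Filter chaining, in which a Filter is a server to its callers and a client to other Filters, might look like the sketch below. The class names, the dictionary-based capability format and the chain topology (F1 calling F2, which calls F3 and F4) are assumptions; the original figure did not survive extraction, so the wiring here is illustrative only.

```python
class EndFilter:
    """Leaf Filter wrapping one data source (here an in-memory dict as a stand-in adaptor)."""

    def __init__(self, name, layers):
        self.name = name
        self._layers = layers

    def get_capabilities(self):
        return {"name": self.name, "layers": sorted(self._layers)}

    def get_data(self, layer):
        return list(self._layers[layer])


class ChainFilter:
    """Filter that serves requests by acting as a client of its upstream Filters."""

    def __init__(self, name, upstreams):
        self.name = name
        self._upstreams = upstreams

    def get_capabilities(self):
        # Capability integration: ask every upstream Filter and merge the answers.
        layers = set()
        for upstream in self._upstreams:
            layers.update(upstream.get_capabilities()["layers"])
        return {"name": self.name, "layers": sorted(layers)}

    def get_data(self, layer):
        # Forward the request to the first upstream Filter advertising the layer.
        for upstream in self._upstreams:
            if layer in upstream.get_capabilities()["layers"]:
                return upstream.get_data(layer)
        raise KeyError(f"no upstream Filter advertises {layer}")


# Hypothetical topology and placeholder records.
f3 = EndFilter("F3", {"Fault": ["fault-record"]})
f4 = EndFilter("F4", {"State Boundary": ["boundary-record"]})
f2 = ChainFilter("F2", [f3, f4])
f1 = ChainFilter("F1", [f2])

merged = f1.get_capabilities()
fault_data = f1.get_data("Fault")
```

No central catalog is involved: F1 learns what it can serve only by calling getCapabilities on the Filters it is chained to.
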
Database Federation
- Middleware consisting of a database management system
- Uniform access to a number of heterogeneous data sources
- Provides a query language used to combine, contrast, analyze and manipulate the data
- Data integration is done through database integration
- Combines data from multiple sources in a single SQL statement (query re-creation)
- Ex. Ogsa-DAI (Open Grid Services Architecture Data Access and Integration)

Ogsa-DAI (1)
- Provides a common Java API for accessing and integrating data resources, such as relational and XML databases and files, in a Grid environment
- Specifically designed for the OGSA architecture
- SQL queries on relational resources and XPath statements on XML collections
- Provides data pipelining (similar to Filter chaining) via an XML document called the “perform” document
- Allows developers to easily add or extend functionality within Ogsa-DAI through the “activity” document

Ogsa-DAI (2)
- Data and storage model
  - Any data stored in XML or relational databases, or in files
  - No common data model
  - Data is provided through GDS (Grid Data Services)
  - Uses Ogsa-DQP (Distributed Query Processor) to coordinate access to multiple data services
- The enactment engine is the core of Ogsa-DAI
  - Orchestrates the running of the perform document
- Information in the perform document includes
  - The list of activities with their XML schemas and implementation classes
  - The list of role mappers and their details
  - Information about the data resource

Ogsa-DAI (3)
- Metadata storage model
  - Metadata is kept in the Metadata Catalog Service (MCS)
  - MCS enables attribute-based querying
  - Metadata describes the datasets; the data can be anything (binary, text)
  - Data integration is done through an XML-based activity file mixing activities (in SQL queries) and metadata
- Simple data access scenario
  - A client first contacts a DAISGR to locate the GDSFs
  - It accesses suitable GDSFs directly to find out more about their properties and the data resources they represent
  - It asks a GDSF to instantiate a GDS
  - It accesses the resource by sending the GDS a GDS-Perform document

Ogsa-DAI (4)
- Metadata model
  - No common schema for metadata, unlike capability
- Defines metadata for the datasets
  - No schema in XML
  - Stored in database tables as attributes
- Defines metadata for the database system to enable querying and defining activities
  - Schema in XML (mcsActivity.xsd schema file)
  - Kept as an XML file in the file system (mcsActivity.xml)

ASIS vs. Ogsa-DAI
- Ogsa-DAI does not define metadata and data in an XML schema; metadata is mixed with the database schema. ASIS has predefined data and metadata models.
- Ogsa-DAI uses any data and has a predefined database schema to enable querying and accessing data.
- ASIS's data integration is on demand and based on capability federation. Ogsa-DAI's data integration is instead coded in XML-structured perform and activity documents.
- Ogsa-DAI has a central metadata approach (MCS); ASIS has a distributed one.
- Both systems are based on Web Services.
- Ogsa-DAI uses GridFTP, and ASIS uses NaradaBrokering, to address performance issues in data transfers.

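For comparison with the ASIS flow, the four steps of the simple Ogsa-DAI data access scenario (registry, factory, data service, perform document) can be walked through with in-memory stand-ins. None of the classes below belong to the Ogsa-DAI API. They are hypothetical placeholders, and the perform document is modelled as a Python dictionary rather than the XML document Ogsa-DAI actually uses.

```python
class GridDataService:
    """Stand-in for a GDS: runs the activities listed in a perform document."""

    def __init__(self, rows):
        self._rows = rows

    def perform(self, perform_doc):
        results = []
        for activity in perform_doc["activities"]:
            if activity["type"] == "select":
                # A filtering predicate stands in for an SQL query activity.
                results.append([row for row in self._rows if activity["where"](row)])
            else:
                raise ValueError(f"unsupported activity: {activity['type']}")
        return results


class GridDataServiceFactory:
    """Stand-in for a GDSF: describes one data resource and creates GDS instances."""

    def __init__(self, properties, rows):
        self.properties = properties
        self._rows = rows

    def create_gds(self):
        return GridDataService(self._rows)


class Registry:
    """Stand-in for the DAISGR: lets clients locate registered factories."""

    def __init__(self):
        self._factories = []

    def register(self, factory):
        self._factories.append(factory)

    def find(self, **wanted):
        return [f for f in self._factories
                if all(f.properties.get(k) == v for k, v in wanted.items())]


registry = Registry()
registry.register(GridDataServiceFactory(
    {"kind": "relational", "name": "faults"},
    [{"id": 1, "state": "CA"}, {"id": 2, "state": "NV"}],
))

# Steps 1-2: contact the registry, then inspect a suitable factory.
factory = registry.find(kind="relational")[0]
# Step 3: ask the factory to instantiate a data service.
gds = factory.create_gds()
# Step 4: send the perform document to the data service.
perform_doc = {"activities": [{"type": "select", "where": lambda row: row["state"] == "CA"}]}
result = gds.perform(perform_doc)
```

The contrast with ASIS is that the client here has to know the resource well enough to write the activities itself, instead of composing a request from advertised capabilities.
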
Digital Libraries
- Main focus is the publishing and discovery of digital objects
  - Digital objects: a file, a URL, an SQL command string or any string of bits
- Collects data from multiple different data sources
- Slightly different from the other data integration approaches
  - Data curation services such as publishing data to and removing data from the data sources
- Ex. SRB (Storage Resource Broker) and Sompel's Digital Library Approach

SRB (1)
- A federated client-server system
  - Each server manages/brokers a set of resources
- An implementation architecture for data grids and digital libraries
- Storage resources include digital libraries, MSS, UniTree and file systems
- SRB consists of three components: MCAT services, SRB servers to access storage repositories, and SRB clients
- Mediates access to distributed heterogeneous resources
- Uses MCAT (Metadata Catalog Service) to facilitate brokering and attribute-based querying
- Integrates data and metadata

SRB (2)
- Data and storage model
  - Uniform storage interface
  - Resource-specific drivers map each storage resource to the defined interface
  - Storage resources are registered within SRB as physical resources
  - Logical resources (LSR) enable replication; an LSR is one or more physical resources
  - The client API refers to LSRs
  - Collections are created by LSR
- Metadata storage model (MCAT)
  - Serves both core metadata and domain-dependent metadata
  - Core metadata follows a standardized schema like Dublin Core
  - Stores metadata about data, collections, users, resources and methods
  - Attribute-based access and querying; updating of the metadata catalog
  - Implemented as a relational database (Oracle, DB2 or Sybase)
  - Abstraction and replica information for data
  - “Global user” name space and authentication
  - Authorization through ACLs and tickets

SRB (3)
- Metadata and metadata exchange model: MAPS (Metadata Attribute Presentation Structure)
  - Independent of the internal representation of the attributes inside the catalog
  - Provides a uniform interface specification that can be used between user applications and the MCAT catalog, and vice versa
  - Structures which form MAPS: MAPS_Query_Struct, MAPS_Result_Struct, MAPS_Update_Struct and MAPS_Definition_Struct
  - Mapping from MAPS to other models and exchange formats; a Dublin Core mapping is under implementation

SRB (4)
- Simple data access scenario
  - The SRB server spawns an SRB agent to authenticate the user/application by comparing its credentials with information stored in MCAT
  - Find the data location in MCAT
  - Check the user request against the permissions stored in MCAT
  - The SRB agent contacts the user with the result of the request
  - The SRB agent communicates with the user through a port specific to this client session
- SRB server chaining scenario (integrated SRBs)
  - The first 3 steps are the same as in the simple data access case
  - The SRB agent contacts a remote SRB agent via the remote SRB server
  - The second SRB agent returns the pointer to the data item to the first SRB agent, which passes it on to the user
  - The SRB client interacts with the data item directly
- The federated SRB scheme: an SRB server acts as a client to another

ASIS vs. SRB
- SRB doesn't define metadata in an XML structure (as ASIS does)
- SRB uses any data, but ASIS uses ASL
- SRB keeps the metadata in catalogue services (MCAT); ASIS uses XML-structured capability metadata
- SRB has a central metadata handling approach; ASIS has a distributed one
- ASIS's data integration is based on metadata federation; SRB's is based on SRB server federation
- Instead of
  Filters, SRB uses SRB servers and agents for accessing data resources

Sompel's DL (1)
- Scholarly communication as a network-based workflow
- Instead of Filters and ASL in ASIS, Sompel defines “repositories” and “digital objects”, respectively
- A repository is a networked system that provides services pertaining to a collection of digital objects
- Repositories have common service interfaces: “Obtain”, “Harvest” and “Put”
- Two classes of participants: data providers (DP) and service providers (SP)
  - SPs collect metadata from DPs (via the 3 service interfaces), then normalize and cluster it to deal with duplicates
  - DPs offer some type of search mechanism for their own repositories

Sompel's DL (2)
- Data and storage model
  - Data is the abstraction of the digital objects
  - Digital object = digital data + key metadata
  - Serialization of a digital object = surrogate
- Surrogates
  - Carry information for the value chains and service information used at the repository service interfaces
  - In XML/RDF format
  - Composed of “dataStream” and/or “Entity” tag elements
  - A chained object is identified by keymetadataID or “providerInfo”
- Different storage types: book repositories, teaching object repositories, dataset repositories, etc.
- Repositories are active nodes
- Repositories enable the use and re-use of materials in many contexts

Sompel's DL (3)
- Metadata model
  - Surrogates are essentially metadata records for objects
  - Based on the Dublin Core format with domain-specific extensions
  - Dublin Core has 15 standard elements to define resources; for more details see http://dublincore.org
- Chaining for integrating data
  - The application/user doesn't need a workflow engine or script to create or run the chain (as in ASIS)
  - The chain (which they call the “value chain”) is hidden in the surrogates
  - Surrogates are updated through the common interfaces (“put”, “obtain” and “harvest”) of the resources
  - The chain is defined in the “Entity” element of the surrogate document, in the “Lineage” sub-element
- Sample chaining scenario
  - A paper might reference other papers, which in turn might reference further papers; the value chain does not stop
  - Papers gain different (value-added) metadata through the value chain

ASIS vs. Sompel's Approach
- Instead of Filters and ASL in ASIS, Sompel defines “repositories” and “digital objects”, respectively
- DPs correspond to End-Filters, and SPs correspond to Filters in ASIS
- ASIS does not have publishing or putting service interfaces
- “Obtain” corresponds to “getData” in ASIS
- “Harvest” corresponds to “getCapabilities” in ASIS
- Both have distributed metadata approaches for data integration
  - ASIS: direct communication between Filters using the “getCapabilities” interface
  - Sompel's DL: direct communication between repositories and services using the “Harvest” interface
- Sompel's DL uses Dublin Core for the representation of the resources; ASIS uses its own schema
- ASIS uses ASL for the representation of the data; Sompel's approach doesn't have a common data model

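The correspondence drawn above (“Harvest” playing the role of getCapabilities, “Obtain” the role of getData) can be illustrated with a toy repository and a service-provider step that harvests, normalizes and clusters surrogates. Everything here is a hypothetical sketch: the class and function names, the use of a lower-cased title as the normalization key, and the sample records are assumptions. Real surrogates are XML/RDF documents, not dictionaries.

```python
class Repository:
    """Toy data provider with the three common interfaces: put, obtain, harvest."""

    def __init__(self):
        self._objects = {}  # object id -> (data, surrogate metadata)

    def put(self, object_id, data, metadata):
        self._objects[object_id] = (data, dict(metadata))

    def obtain(self, object_id):
        # Counterpart of getData: return the digital data itself.
        return self._objects[object_id][0]

    def harvest(self):
        # Counterpart of getCapabilities: expose metadata records only.
        return [dict(meta, id=object_id)
                for object_id, (_, meta) in self._objects.items()]


def harvest_all(repositories):
    """Service-provider step: collect surrogates, normalize titles, cluster duplicates."""
    clusters = {}
    for repository in repositories:
        for surrogate in repository.harvest():
            key = surrogate["title"].strip().lower()  # assumed normalization rule
            clusters.setdefault(key, []).append(surrogate["id"])
    return clusters


# Made-up sample objects for illustration only.
dp1 = Repository()
dp1.put("a1", b"paper bytes", {"title": "Value Chains"})
dp2 = Repository()
dp2.put("b7", b"same paper, other copy", {"title": " value chains "})

clusters = harvest_all([dp1, dp2])
```

The “put” interface has no ASIS counterpart, which matches the slide's point that ASIS offers no publishing interface.
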
Summary
- Application-Integration Framework (ASIS)
  - Easy to add new sources
    - Using online Filters providing required adaptors
  - Peer-to-peer chain of Filters
    - No central metadata catalog server
    - Distributed capability exchange and aggregation
  - SOA
    - Re-usable components (Filters) for different applications in a predefined domain
  - Implications of Filter services
    - Scalable and fault-tolerant
    - Load balancing and caching
    - Dynamically updating capability metadata

THANKS!

APPENDIX

Capability in Grid Services Security
- XPOLA
  - The infrastructure is built on a peer-to-peer chain-of-trust model; no central admins
  - WS-Security compliant
  - Extensible
  - PKI and SAML based
  - Dynamic and reusable (manually or automatically generated)
  - Composed of two sectors
    - Policy document (SAML, lifetime info, binding info, etc.)
    - Provider's signature
- Existing grid security solutions for fine-grained authorization did not address general Web/Grid services in compliance with the Web Services security specs
- Relying on central admins, other approaches don't address dynamic services

Sample Capabilities File (too simplified)
[XML markup lost in extraction; surviving values: GIS Domain, CGL_Mapping, CGL_Mapping, WMS, WMS_XML, image/GIF, image/PNG, California:Faults, California:Faults, EPSG:4326]

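To show how a client might read such a capabilities file, the sketch below rebuilds a small XML document around the surviving values and parses it with Python's standard library. The element names and nesting are hypothetical, since the original markup was lost; only the text values come from the slide, and their assignment to elements is a guess.

```python
import xml.etree.ElementTree as ET

# Element names and structure are assumptions; only the text values are from the slide.
CAPABILITIES_XML = """
<Capability>
  <Descriptor>
    <Domain>GIS Domain</Domain>
  </Descriptor>
  <Provider>
    <Name>CGL_Mapping</Name>
  </Provider>
  <Service>
    <Name>CGL_Mapping</Name>
    <Type>WMS</Type>
    <MetadataFormat>WMS_XML</MetadataFormat>
    <Format>image/GIF</Format>
    <Format>image/PNG</Format>
    <Layer>
      <Name>California:Faults</Name>
      <Title>California:Faults</Title>
      <SRS>EPSG:4326</SRS>
    </Layer>
  </Service>
</Capability>
""".strip()

root = ET.fromstring(CAPABILITIES_XML)

# Output formats the service claims to support.
formats = [element.text for element in root.findall("./Service/Format")]

# Map each advertised layer name to its spatial reference system.
layers = {layer.findtext("Name"): layer.findtext("SRS") for layer in root.iter("Layer")}

service_type = root.findtext("./Service/Type")
```

A Filter acting as a client could use values like these to decide which layers and formats it may request from the service.
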