ANHAI DOAN, ALON HALEVY, ZACHARY IVES
Chapter 12: Ontologies and Knowledge Representation
PRINCIPLES OF DATA INTEGRATION

Outline
- Introduction to Knowledge Representation and its relevance to data integration
- Description Logics: a family of KR languages
- The Semantic Web and its languages

Knowledge Representation
- Knowledge representation (KR) focuses on languages that are more expressive than database schemata and integrity constraints:
  - Designed for artificial intelligence applications (e.g., natural language understanding, planning) where complex relationships exist between objects.
- KR uses ontologies to
  represent relationships between elements in a knowledge base.
- KR is relevant to data integration because relationships between data sources can be complex.
- The use of KR in data integration has been investigated since the early days of the field.

KR in Data Integration: Example
- Mediated schema: an ontology with classes and relationships
- Data sources:
  - S1 has comedies and S2 has documentaries
  - S3: movies with at least two awards
  - S4: comedies with at least one Oscar

Example: Part 1
- S1 is relevant to Q1 because Comedy is a subclass of Movie (by subsumption).

Example: Part 2
- S2 is irrelevant to Q2 because Comedy
  and Documentary are declared disjoint.

Example: Part 3(a)
- S3 is relevant to Q3 because movies with two awards will definitely satisfy the second subgoal.

Example: Part 3(b)
- S4 is relevant to Q3 because oscar is a sub-property of award.

Description Logics: Introduction
- Description Logics are a subset of first-order logic:
  - Only unary predicates (called concepts) and binary predicates (called roles or properties).
- Knowledge bases are
  composed of:
  - T-box: defines the concepts and the roles
  - A-box: contains ground facts about individuals
- Complex concepts are defined by concept descriptions:
  - The expressive power of the language is determined by the set of constructors in the grammar of concept descriptions.
- Complex roles can also be defined via constructors.

T-Boxes
- Can include statements of the form A ⊑ C (inclusion) or A ≡ C (definition), written with the square inclusion symbol ⊑ rather than ⊆.
- A is a base concept and C can be a concept description.
- For an example grammar for concept descriptions, see the next slide.

An Example Grammar for Concept Descriptions
- C and D are complex concepts. A is
  a base concept.
- Many other constructors are possible: union, existential quantification, equality on role paths.

Example Terminology
- a1: Italians are people (really, don't laugh!)
- a2: Comedies are movies
- a3: Comedies are disjoint from documentaries
- a4: Movies have at most one director
- a5: Award movies are
  those that have at least one award
- a6: Italian hits are award movies whose director is Italian

A-box: the Ground Facts
- A set of assertions of the form C(a) or R(a,b).
  - b is called an R-filler of a.
- C and R can be concept and role descriptions:
  - Akin to asserting that a tuple is in a view rather than in base relations.
- In the example A-box, we state that LaStrada is an Italian hit, but we are not given the director or the award it won.

Semantics of Description Logics
- Semantics are based on interpretations.
- Given a knowledge base K, the models of K are the interpretations that are consistent with K's T-box and A-box.
- Any fact that is true
  in all models of K is said to be entailed by K.

Interpretations: Formally
- An interpretation I contains a non-empty domain of objects, O^I.
- I assigns an object a^I in O^I to every constant a in the A-box.
  - We make the unique names assumption: a ≠ b implies a^I ≠ b^I.
- I assigns C^I, a subset of O^I, to every concept C.
- I assigns a binary relation R^I, a subset of O^I x O^I, to every role R.

Extensions of Complex Expressions
- The extensions of concept and role descriptions are given by the following equations (#S denotes the cardinality of the set S).

Conditions on Models
- An interpretation of K is a model if the following conditions hold:

Example Interpretation
- Assume an interpretation with the identity mapping on individuals in the knowledge base and a few extra elements (Director1, Award1, Actor2, ...).
- The following interpretation is a model:

Example Interpretation: Notes
- We do not know the director of LaStrada or its award.
- Removing LifeIsBeautiful from Comedy would make it a non-model.
- Adding another director would also make it a non-model.

Inference in Description Logics
- This is where all the action is: coming up with efficient algorithms for inference and understanding the complexity of the problems.
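
The model conditions from the preceding slides can be sketched in Python. This is only an illustration under stated assumptions, not the chapter's own formulation: the tuple encoding of concept descriptions, the dictionary layout of an interpretation, the function names (`fillers`, `extension`, `is_model`), the reading of a1 to a6 as inclusions ("sub") or definitions ("def"), and the small interpretation itself are all hypothetical. The constructors covered are the ones the slides name (complement for disjointness, intersection, universal quantification, number restrictions).

```python
import copy

# Concept descriptions as nested tuples (a hypothetical encoding):
#   "Movie"              base concept name
#   ("not", C)           complement
#   ("and", C, D)        intersection
#   ("all", R, C)        universal quantification over role R
#   ("atleast", n, R)    number restriction: at least n R-fillers
#   ("atmost", n, R)     number restriction: at most n R-fillers

def fillers(interp, role, d):
    """Return the set of R-fillers of d under the interpretation."""
    return {e for (x, e) in interp["roles"].get(role, set()) if x == d}

def extension(concept, interp):
    """Compute the extension C^I of a concept description."""
    domain = interp["domain"]
    if isinstance(concept, str):
        return set(interp["concepts"].get(concept, set()))
    op = concept[0]
    if op == "not":
        return domain - extension(concept[1], interp)
    if op == "and":
        return extension(concept[1], interp) & extension(concept[2], interp)
    if op == "all":
        _, role, filler_concept = concept
        allowed = extension(filler_concept, interp)
        return {d for d in domain if fillers(interp, role, d) <= allowed}
    if op in ("atleast", "atmost"):
        _, n, role = concept
        if op == "atleast":
            return {d for d in domain if len(fillers(interp, role, d)) >= n}
        return {d for d in domain if len(fillers(interp, role, d)) <= n}
    raise ValueError(f"unknown constructor: {op!r}")

def is_model(interp, tbox, abox):
    """Check every T-box statement and A-box assertion against interp."""
    for kind, name, description in tbox:
        left, right = extension(name, interp), extension(description, interp)
        if kind == "sub" and not left <= right:
            return False
        if kind == "def" and left != right:
            return False
    for assertion in abox:
        if len(assertion) == 2:                      # C(a)
            concept, a = assertion
            if a not in extension(concept, interp):
                return False
        else:                                        # R(a, b)
            role, a, b = assertion
            if (a, b) not in interp["roles"].get(role, set()):
                return False
    return True

# Assumed formal reading of the English glosses a1 to a6.
TBOX = [
    ("sub", "Italian", "Person"),                                      # a1
    ("sub", "Comedy", "Movie"),                                        # a2
    ("sub", "Comedy", ("not", "Documentary")),                         # a3
    ("sub", "Movie", ("atmost", 1, "director")),                       # a4
    ("def", "AwardMovie", ("and", "Movie", ("atleast", 1, "award"))),  # a5
    ("def", "ItalianHits",
     ("and", "AwardMovie", ("all", "director", "Italian"))),           # a6
]
# Assumed A-box; Comedy(LifeIsBeautiful) is inferred from the slide notes.
ABOX = [("ItalianHits", "LaStrada"), ("Comedy", "LifeIsBeautiful")]

# Hypothetical interpretation with the identity mapping on individuals.
INTERP = {
    "domain": {"LaStrada", "LifeIsBeautiful", "Director1", "Award1"},
    "concepts": {
        "Person": {"Director1"}, "Italian": {"Director1"},
        "Movie": {"LaStrada", "LifeIsBeautiful"},
        "Comedy": {"LifeIsBeautiful"}, "Documentary": set(),
        "AwardMovie": {"LaStrada"}, "ItalianHits": {"LaStrada"},
    },
    "roles": {
        "director": {("LaStrada", "Director1")},
        "award": {("LaStrada", "Award1")},
    },
}

NO_COMEDY = copy.deepcopy(INTERP)
NO_COMEDY["concepts"]["Comedy"] = set()         # drop LifeIsBeautiful from Comedy

TWO_DIRECTORS = copy.deepcopy(INTERP)
TWO_DIRECTORS["domain"].add("Director2")
TWO_DIRECTORS["roles"]["director"].add(("LaStrada", "Director2"))

print(is_model(INTERP, TBOX, ABOX))             # True
print(is_model(NO_COMEDY, TBOX, ABOX))          # False
print(is_model(TWO_DIRECTORS, TBOX, ABOX))      # False
```

The two variations mirror the notes on the example interpretation: removing LifeIsBeautiful from Comedy violates an A-box assertion, and a second director violates the "at most one director" statement. Checking one interpretation is much easier than inference, which has to consider every model.
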
- Subsumption (only for the T-box): a concept C is said to be subsumed by concept D w.r.t. a T-box T if, in every model I of T, C^I ⊆ D^I.
- Examples:
  - ... is subsumed by ...
  - ... is subsumed by ...

Query Answering with DLs
- The simple case is instance checking: does K entail C(a) or R(a,b)?
  - That is, does C(a) or R(a,b) hold in every model of K?
- The more general problem is query answering:
  - Find the answers to a conjunctive query, where g1, ..., gn can be concept and/or role names.

Semantics of Conjunctive Queries
- Compute the answer to Q in every model of K.
- Any tuple that is in the intersection of the answers is entailed by K.
- This should remind
  you of the semantics of certain answers!
- Let's look at a few examples.

Query Answering: Example 1
- Consider Q1 over the following A-box:
- Applying Q1 directly to the A-box would yield no answers (award would not be matched).
- However, ItalianHits(LifeIsBeautiful) implies that LifeIsBeautiful won at least one award.
- Hence, LifeIsBeautiful should be in the answer!

Query Answering: Example 2
- Consider Q2:
- With the following A-box: Comedy(LaFunivia), director(LaFunivia, Antonioni), Italian(Antonioni)
- Neither conjunctive query will yield an answer because we know nothing about awards.
- However, we can reason by cases that the following is entailed by Q2.

End of Example 2
- OK, we have:
- But that's not enough to infer that LaFunivia should be in the answer.
- However, we also know that movies have at most one director, so:
- Hence, LaFunivia is an answer to Q2.

Comparing DLs to OODB
- Object-oriented databases:
  - Also focus on unary and binary relations.
  - OODBs are more focused on modeling the physical aspects of objects and their properties.
  - An object can only belong to a single (most specific) class.
- Description logics are about knowledge and complex relationships:
  - Class membership can be inferred.
  - An individual can belong to multiple classes.

Comparing DLs to Relational Views
- In principle, concept descriptions are view definitions.
- Relational views employ selection, projection, join, and union, and apply to more than unary and binary relations.
- DLs: universal quantification, number restrictions, intersection, ...
- Subsumption = query containment:
  - Universal quantification and number restrictions would require negation in conjunctive queries. Hence, containment would be undecidable.
- In DLs you can put facts directly in views (i.e., complex concepts).

The Semantic Web
- Basic idea: annotate content on Web pages with semantics.
  - Specify that a web page is about a restaurant, where the address appears on the page, and what the menu items are.
  - On a page with restaurant reviews, mark the restaurants with a global identifier so the review and restaurant data can be fused.
- Without these annotations, systems need to infer these correlations and are often wrong.

Languages, Languages
- RDF: Resource Description Framework
  - Language for marking up data
  - Triples with a few cool features
- RDF Schema (RDFS): basic schema for RDF documents
- OWL: Web Ontology Language
  - Comes in multiple flavors: OWL-Lite, OWL-DL, OWL-Full
- All these languages are influenced by KR formalisms (some more and some less).

RDF Basics
- RDF triples are statements about “resources”.
- They are of the form (subject, predicate, value).
- Names can get long (because they can be URLs), so we often use qnames (qualified names): ex: instead of http://www.example.org/

RDF as a Graph
- Uniform Resource Identifiers: available beyond a single data set.

Uniform Resource Identifiers
- In a typical database, identifiers are used only internally. They have no meaning outside the database.
- RDF uses URIs for subjects, predicates, and optionally for values.
  - Hence, multiple data sets can refer to the same identifier. Key benefit for data integration!
- Note: this does not entail standardization!
  - You're free to
    invent your own, but you're encouraged to reuse existing URIs so your data meshes well with others'.

Blank Nodes
- You can assign IDs to blank nodes, but they are internal to a document.

Reification
- Reification is a way of making statements about statements:
  - Provenance, uncertainty, date asserted, ...
- To reify, make the statement itself into a resource.
- Once reified, you can state its properties:

RDF Schema
- Enables declaring classes, subclass hierarchies, membership in a class, and restrictions on the domains and ranges of properties.
- Important: a class can be an instance of another class!

RDFS: Declaring Properties
- RDFS: you can declare sub-properties, and the domains and ranges of properties.

OWL: Web Ontology Language
- Languages based on description logics but without the unique-names assumption:
  - sameAs and differentFrom specify whether two individuals are the same or different.
- OWL-Lite: intersection,
  number restrictions (but only with 0 or 1), universal and existential quantification on properties.
- OWL-DL: + union, complement, disjointness, number restrictions, enumeration (Sunday, Monday), hasValue (filler for property value), and more.
- OWL-Full: OWL-DL + reification.

SPARQL: Querying RDF
- Language based on matching of triples
- Borrows ideas from conjunctive queries and XQuery

SPARQL: The Construct Clause

Summary of Chapter 12
- Knowledge Representation enables modeling complex relationships between classes and objects.
- The languages of the Semantic Web apply these ideas to the Web context, with URIs.
- The constant challenge: the tradeoff between expressive power and the computational complexity of reasoning.
- Question to ponder: can we live with a fast reasoning algorithm that occasionally misses some derivations?
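
As a supplement to the SPARQL slides, the idea of a query language "based on matching of triples" can be sketched in plain Python. This is a simplified illustration, not SPARQL itself and not any library's API: the function names (`match_pattern`, `select`, `construct`), the convention that strings starting with `?` are variables, and all `ex:` identifiers in the sample data are hypothetical. Triple patterns are joined like the subgoals of a conjunctive query, and the construct step builds new triples from the resulting bindings.

```python
def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                 # variable position
            if extended.setdefault(term, value) != value:
                return None                      # conflicts with earlier binding
        elif term != value:                      # constant position must be equal
            return None
    return extended

def select(patterns, triples):
    """Join the patterns left to right, like subgoals of a conjunctive query."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

def construct(template, bindings):
    """Instantiate the template triples once per binding, yielding new triples."""
    return {tuple(b.get(term, term) for term in tpl)
            for b in bindings for tpl in template}

# Hypothetical data set using qnames with the ex: prefix.
TRIPLES = [
    ("ex:movie1", "rdf:type", "ex:Comedy"),
    ("ex:movie1", "ex:director", "ex:person1"),
    ("ex:person1", "rdf:type", "ex:Italian"),
    ("ex:movie2", "rdf:type", "ex:Documentary"),
    ("ex:movie2", "ex:director", "ex:person1"),
]

# Comedies together with their Italian directors.
PATTERNS = [
    ("?m", "rdf:type", "ex:Comedy"),
    ("?m", "ex:director", "?d"),
    ("?d", "rdf:type", "ex:Italian"),
]

RESULT = select(PATTERNS, TRIPLES)
NEW_TRIPLES = construct([("?d", "ex:directed", "?m")], RESULT)
print(RESULT)
print(NEW_TRIPLES)
```

Note that this sketch only matches triples that are explicitly present. It performs no reasoning over subclass or sub-property declarations, which is exactly the expressive power versus complexity tradeoff raised in the summary.
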