The XML Standard

Overview of our XML Standards
  Motivation: HTML vs. XML
  XML 101: syntax, elements, attributes, DTDs
  XML 201: XML Schema, Namespaces
  XSLT: Transforming and Rendering XML
  XQuery: Search, Transform & Integrate

So what is XML (all about)?
  Executive summary:
    XML = HTML - idiosyncrasies (simplified syntax) + user-definable (“semantic”) tags
    Separation of data and its presentation
    => simple, very flexible data exchange format: semistructured data model
    => new applications:
      Information exchange (B2B), sharing (diglib), integration (“mediation”), archival, ...
      Web site management (XML + XSL stylesheets), ...

What's Wrong with HTML?
  Y. Papakonstantinou, S. Abiteboul, H. Garcia-Molina. “Object Fusion in Mediator Systems”. In VLDB 96.
  HTML confuses presentation with content

... What's Wrong with HTML ...
  Y. Papakonstantinou, S. Abiteboul, H. Garcia-Molina. “Object Fusion in Mediator Systems”. In VLDB 96.
  No explicit structure, semantics, or object-orientation (which part is the author, the conference, the title?)

... And Some Repercussions
  Lack of schema/semantics when querying the Web (HTML):
    “find documents (books, papers, ...) where author = Michael Jackson” (... and learn how software engineering meets the moon walker ...)
    “create a list of M. Jackson's books and (if available) their prices”
  => HTML is inappropriate for
    data exchange
    automation of information management (retrieval, manipulation, integration)

XML is Based on Markup
  [Example: the citation marked up as XML; content: Y. Papakonstantinou, S. Abiteboul, H. Garcia-Molina, Object Fusion in Mediator Systems, VLDB 96]
  Markup indicates structure and semantics
  Decoupled from presentation

Elements and their Content
  [Same example, annotated: element, element name, character content, element content, empty element]

Element Attributes
  [Same example, annotated: attribute name, attribute value]

XML = Labeled Ordered Trees
  [Figure: the example as a labeled tree; visible labels: Yannis, Serge, ..., Object Fusion ..., id, 23]
  semistructured data: labeled trees/graphs
  can also represent relational and object-oriented data

In Search of the Lost Structure & Semantics
  How do I share structure and metadata/semantics with my community?
  How to make all this automatable?
  How do I learn and use the element structure of a document?

Adding Structure and Semantics
  XML Document Type Definitions (DTDs):
    define the structure of “allowed” documents (i.e., valid w.r.t. a DTD)
    ~ database schema => improve query formulation, execution, ...
  XML Schema: defines structure and data types
  XML Namespaces: identify your vocabulary
  Resource Description Framework (RDF): simple metadata model

XML DTDs as Extended CFGs
  lhs = element (name)
  rhs = regular expression over elements + strings (PCDATA)
  [Figure: an XML DTD next to the corresponding grammar]

Document Type Definitions (DTDs)
  Define and constrain element names & structure
  [Example DTD, annotated: element type declaration, attribute list declaration]

Element Declarations
  [Example element content declarations, annotated:]
    character content
    authors followed by optional fullpaper, followed by title, followed by booktitle
    sequence of 1 or more author
    sequence of 0 or more paper

Attributes
  [Example: Y. Papakonstantinou, Object Fusion in Mediator Systems, annotated with attribute types:]
    object identity attribute
    CDATA (character data): “Yannis info”
    IDREF: intradocument reference
    reference to external ENTITY

More on Attribute Declarations
  Attributes
    may be REQUIRED
    may be IMPLIED (optional)
    can have default values
    default value may be FIXED

Uses of XML Entities
  Physical partition: size, reuse, “modularity” (both XML docs & DTDs)
  Non-XML data: unparsed entities (binary data)
  Non-standard characters: character entities
  Shorthand for phrases & markup

Types of Entities
  Internal (to a doc) vs. External (use URI)
  General (in XML doc) vs. Parameter (in DTD)
  Parsed (XML) vs. Unparsed (non-XML)

Internal Text Entities
  Internal text entity declaration, followed by an entity reference: “We all use the ... .”
  Logically equivalent to the replacement text actually appearing: “We all use the World Wide Web.”

Unparsed (& “Binary”) Entities
  [Example, annotated:]
    declare external ... and unparsed entity
    element with ENTITY attribute
    declare attribute type to be ENTITY
    NOTATION declaration (helper app)

From Docs to Data: XML Schema
  XML DTDs (part of the XML spec.)
    flexible, semistructured data model (nesting, ANY, ?, *, |, ...)
    but document-oriented (SGML heritage)
  XML Schema (W3C working draft)
    schema definition language in XML
    data-oriented: data types
    extends capabilities of DTD

Sample Data for Introduction to XML Schema
  book: title “Being a Dog Is a Full-Time Job”, author Charles M. Schulz
    character: name Snoopy, friend-of Peppermint Patty, since 1950-10-04, qualification “extroverted beagle”
    character: name Peppermint Patty, since 1966-08-22, qualification “bold, brash and tomboyish”

The Simple “Russian Doll” Approach to XML Schema
  [Example schema, annotated:]
    optional namespace definition
    sequence compositor
    simple type content for title and author
    complex type content for book
    character may appear any number of times
    basic type of XML Schema

The Catalog Approach to XML Schema: Stand-Alone Declarations & References
  [Example schema, annotated: simple type elements, attributes, complex type element character, reference]

Catalog Approach (contd.)

Named Types
  Write stand-alone named complex type or simple type declarations
  Primitive form of inheritance (called derivation) allows
    restriction
    extension
  nameType is derived from xsd:string; the xsd:maxLength facet restricts the string to a maximum of 32 characters
  nameType is used in the declaration of characterType

Groups: Named containers of sets of Elements or Attributes

Compositors: Sequence, Choice, All
  So far we have seen sequences
  The group nameTypes consists of one of: the element “name”, or the sequence containing firstName, middlename, lastName

Compositors (contd.)
  The characterType consists of name, a list of friend-of, since, and qualification particles in no particular order. (Compare with the sequence compositor.)

Derivation of Simple Types: Unions and Lists
  So far we have seen restrictions and facets
  The simple type isbnType will be either a 10-digit string (notice the pattern), the token “TBD”, or the token “NA”

Constraints: Uniqueness
  By inserting xsd:unique in the book element declaration we enforce that the character names in each book are unique

Namespaces

Including Unknown Elements

Presenting XML: XSLT
  Why stylesheets?
    separation of content (XML) from presentation (XSL)
  Why not just CSS for XML?
    XSL is far more powerful:
      selecting elements
      transforming the XML tree
      content-based display (result may depend on data)

XSLT Overview
  XSLT stylesheets are denoted in XML syntax
  XSL components:
    1. a language for transforming XML documents (XSLT: integral part of the XSL specification)
    2. an XML formatting vocabulary (Formatting Objects: 90% of the formatting properties inherited from CSS)

XSLT Processing Model
  XSL stylesheet: collection of template rules
  template rule: (pattern => template)
  main steps:
    match pattern against source tree
    instantiate template (replace current node “.” by the template in the result tree)
    select further nodes for processing
  control can be
    program-driven (“pull”: ...)
    data/event-driven (“push”: ...)

Template Rule: Example
  [Example rule with its pattern and template marked]
  (i) match pattern: process product elements
  (ii) instantiate template: replace each product with two HTML tables
  (iii) select the grandchildren (“sales/domestic”, “sales/foreign”) for further processing

Match/Select Patterns
  match patterns are a subset of select patterns, which are defined in http://w3.org/TR/xpath
  Examples:
    /mybook/chapter[2]/section/*
    chapter|appendix
    chapter/para
    div[@class="appendix" and position() mod 2 = 1]/para
    /lang

Creating the Result Tree ...
  Literal result elements: non-XSL elements (e.g., HTML) appear “literally” in the result tree
  Constructing elements: xsl:element with attribute & children definition (similar for xsl:attribute, xsl:text, xsl:comment, ...)
  Generating text: ...

Example of Turning XML into HTML
  [Source XML: a FitnessCenter with one Member: Name Jeff, Phone 555-1234, Phone 555-4321, FavoriteColor lightgrey]

HTML Document in an XSL Template
  [Example: a template that outputs an HTML page with title “Welcome” and body text “Welcome!”]

Extracting the Member Name
  [Example: the same template, with the member's Name inserted between “Welcome” and “!”]

Extracting a Value from an XML Document, Navigating the XML Document
  Extracting values: use the XSL element xsl:value-of
  Navigating:
    The slash (“/”) indicates parent/child relationship
    A slash at the beginning of the path indicates that it is an absolute path, starting from the top of the XML document
  /FitnessCenter/Member/Name
    “Start from the top of the XML document, go to the FitnessCenter element, from there go to the Member element, and from there go to the Name element.”
  Document /
    PI
    Element FitnessCenter
      Element Member
        Element Name: Text Jeff
        Element Phone: Text 555-1234
        Element Phone: Text 555-4321
        Element FavoriteColor: Text lightgrey

Extract the FavoriteColor and use it as the bgcolor
  [Example: the Welcome page with its bgcolor taken from FavoriteColor] (see html-example03)

Note
  Attribute values cannot contain “<”. Consequently, the following is NOT valid: [an xsl:value-of element placed inside the bgcolor attribute value]
  To extract the value of an XML element and use it as an attribute value you must use curly braces: bgcolor="{/FitnessCenter/Member/FavoriteColor}"
  Evaluate the expression within the curly braces. Assign the value to the attribute.

Extract the Home Phone Number
  [Example output text: “Welcome ...! Your home phone number is: ...”]

Creating the Result Tree ...
  Further XSL elements for ...
    Numbering
    Conditions
    Repetition
    ...

Creating the Result Tree: Repetition
  [Example: repetition over customers ...]

Creating the Result Tree: Sorting

More on XSL
  XSL(T):
    Conflict resolution for multiple applicable rules
    Modularization
  XSL Formatting Objects
    a la CSS
  XPath (navigation syntax + functions) => XSLT, XPointer, ...

XQuery: Querying XML Sources
  Functional query language
  Operates on the XPath/XQuery data model
    List of ordered trees
    A document is a list of size 1
  XQuery expressions are composed of
    Path expressions
    Element constructors
    FLWR expressions
    and more ...

Path Expressions
  doc("zoo.xml")/chapter[2]/figure[caption = "Tree Frogs"]
  In the second chapter of the document zoo.xml find the figures with caption “Tree Frogs”
  [Figure: document tree of zoo.xml with book, part, chapter, appendix, section, paragraph, figure, and caption nodes; captions “Tree Frogs” and “Just Frogs”]

More Path Expressions
  doc("zoo.xml")/part/chapter[1]/figure[caption = "Tree Frogs"]
  Find the first immediate chapter subelements of immediate part subelements of the document zoo.xml and retrieve figures that have caption “Tree Frogs”
  [Figure: the same document tree of zoo.xml]

Element Construction
  doc("zoo.xml")/chapter[2]/figure[caption = "Tree Frogs"], wrapped in a result element constructor
  In the second chapter of the document zoo.xml find the figures with caption “Tree Frogs” and place them into an element called result
  [Figure: a result node containing the figure with caption “Tree Frogs”]

Bibliography Example Data Set
  book: authors Aho, Hopcroft, Ullman; title Automata Theory; publisher Morgan Kaufmann; year 1998
  book: author Ullman; title Database Systems; publisher Morgan Kaufmann; year 1998
  book: authors Abiteboul, Buneman, Suciu; title Automata Theory; publisher Prentice Hall; year 1998

Reviews Example Data Set
  review: title Automata Theory; comments “It's the best in automata theory”, “A definitive textbook”

For-Let-Where-Return (FLWR)
  FOR $b in doc("bib.xml")/book
  WHERE $b/publisher = "Morgan Kaufmann"
  RETURN $b/title
  List the titles of books published by “Morgan Kaufmann”
  [Figure: the bib tree with three book nodes and their title, publisher, and year children]

Think (tuples of) variable bindings
  FOR/LET: ordered lists of tuples of variable bindings ($b bound to book, book, book)
  WHERE: tuples that satisfy the conditions ($b bound to book, book)
  RETURN: list of trees (title, title)

  FOR $b in doc("bib.xml")/book
  WHERE $b/year > 1990
  RETURN $b/author
  Return the list of authors who published after 1990

Tuples
  FOR $p in distinct(doc("bib.xml")/publisher)
  LET $b := document("bib.xml")/book[publisher = $p]
  WHERE count($b) > 1
  RETURN $p
  List publishers who have published more than 1 book
  Tuples ($p, $b) are formed

Boolean Expressions in WHERE
  FOR $b in doc("bib.xml")/book
  WHERE $b/publisher = "Morgan Kaufmann" AND $b/year = "1998"
  RETURN $b/title
  List the titles of books published by “Morgan Kaufmann” in 1998

Joins
  FOR $b in doc("bib.xml")/book, $r in doc("review.xml")/review
  WHERE $b/title = $r/title
  RETURN a book_with_review element containing $b/@*, $b/*, $r/comment
  For every book with a matching review, output a book_with_review that contains all the attributes and subelements of book and the comment subelements of review
  Result: book_with_review: authors Aho, Hopcroft, Ullman; title Automata Theory; publisher Morgan Kaufmann; year 1998; comments “It's the best in automata theory”, “A definitive textbook”
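
As a rough illustration of the FLWR and join queries above, the following Python sketch mimics them with the standard xml.etree.ElementTree module rather than an XQuery processor. The markup of bib.xml and review.xml is an assumption: the slides only attest the element names (book, author, title, publisher, year, review, comment), so the root elements bib and reviews and the exact nesting are guesses, and the helper names titles_by_publisher and books_with_reviews are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Assumed markup for the two example data sets; only the element names
# come from the slides, the layout and root elements are guesses.
BIB = """
<bib>
  <book><author>Aho</author><author>Hopcroft</author><author>Ullman</author>
    <title>Automata Theory</title><publisher>Morgan Kaufmann</publisher><year>1998</year></book>
  <book><author>Ullman</author>
    <title>Database Systems</title><publisher>Morgan Kaufmann</publisher><year>1998</year></book>
  <book><author>Abiteboul</author><author>Buneman</author><author>Suciu</author>
    <title>Automata Theory</title><publisher>Prentice Hall</publisher><year>1998</year></book>
</bib>
"""
REVIEWS = """
<reviews>
  <review><title>Automata Theory</title>
    <comment>It's the best in automata theory</comment>
    <comment>A definitive textbook</comment></review>
</reviews>
"""

bib = ET.fromstring(BIB)
reviews = ET.fromstring(REVIEWS)


def titles_by_publisher(bib_root, publisher):
    """FOR $b in book WHERE $b/publisher = publisher RETURN $b/title."""
    return [b.findtext("title")
            for b in bib_root.iter("book")           # FOR: bind $b to each book
            if b.findtext("publisher") == publisher]  # WHERE: filter the bindings


def books_with_reviews(bib_root, reviews_root):
    """Join books and reviews on title, building book_with_review elements."""
    results = []
    for b in bib_root.iter("book"):            # FOR $b ...
        for r in reviews_root.iter("review"):  # ... , $r ...
            if b.findtext("title") == r.findtext("title"):  # WHERE
                # RETURN: attributes and subelements of the book, plus the comments
                joined = ET.Element("book_with_review", dict(b.attrib))
                joined.extend([copy.deepcopy(child) for child in b])
                joined.extend([copy.deepcopy(c) for c in r.findall("comment")])
                results.append(joined)
    return results


mk_titles = titles_by_publisher(bib, "Morgan Kaufmann")
joined_books = books_with_reviews(bib, reviews)
```

With the data as listed, two books carry the title “Automata Theory”, so the join pairs the review with both of them. The nested loops correspond to the tuples of variable bindings ($b, $r) that the FOR clause produces before WHERE filters them.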
Relax Order Conditions
  FOR $b in unordered(doc("bib.xml")/book)
  WHERE $b/publisher = "Morgan Kaufmann" AND $b/year = "1998"
  RETURN $b/title
  List the titles of books published by “Morgan Kaufmann” in 1998
  Very important feature in dealing with relational sources and other set-oriented sources:
    SELECT title FROM bib WHERE publisher = "Morgan Kaufmann" AND year = 1998
  Depending on the indices and access methods used, the SQL query processor may deliver the tuples in a different order

Nested Queries
  FOR $a IN distinct(document("bib.xml")/author/text())
  RETURN an author element containing the name $a and, nested inside it,
    FOR $b IN document("bib.xml")/book[author = $a]
    RETURN $b/title
  Invert the structure of the input document so that there is a list of author elements containing the name of the author and the list of books he wrote

Conditionals
  FOR $b IN doc("bib.xml")/book
  RETURN $b/title, IF count($b/author) ... “and others”

Existential and Universal Quantification
  FOR $b in doc("bib.xml")/book
  WHERE $b/author = "Ullman"
  RETURN $b
  Return books where at least one of the authors is “Ullman”

  FOR $b in doc("bib.xml")/book
  WHERE EVERY $author IN $b/author SATISFIES $author = "Ullman"
  RETURN $b
  Return books where all authors are “Ullman”

Functions
  DEFINE FUNCTION depth($e) RETURNS xsd:integer
    IF (empty($e/*)) THEN 1 ELSE max(depth($e/*)) + 1
  FOR $b in doc("bib.xml")/book RETURN depth($b)

Applicability of XML Query Languages (XQuery)
  XQuery standard does NOT elaborate on the physical aspects of the XML sources
  Custom functions can provide access and reference to the source(s): document("test.xml"), source("view1")
  Question: as we go down the list of uses of XQuery, compare with XSL

XQuery on files, DOM objects, event streams, messages
  Usage scenarios
    Transformation and processing of messages
  Significant (but not “killer”) advantages over XSL
    Minor performance optimization superiority
    Better streaming, pipelining
    Cleaner extensible language
  Many academic and industrial prototypes of XQuery on files
  [Figure labels: XML File, DOM Object, SAX Stream, XQuery, XQuery Processor]

Typical Scenario: XML Messaging
  [Figure labels: Wrapper, RDBMS, Wrapper, SAP ERP, Application, SOAP service, Message Transformer]
  Summary of steps:
    Developer's program issues SQL query
    Wrapper returns SQL result wrapped as XML message
    Developer's XQuery transforms XML message to the XML format needed by the app
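
The three steps of the messaging scenario can be sketched in Python as follows. This is only an illustration under several assumptions: an in-memory SQLite database stands in for the RDBMS, the message and target formats (message/row and titles/title) are invented for the example, and the transformation step is written in Python rather than XQuery. The function names wrap_result and transform are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def wrap_result(cursor):
    """Wrapper step: wrap each row of a SQL result as a <row> in an XML message."""
    message = ET.Element("message")  # assumed message format
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        row_el = ET.SubElement(message, "row")
        for col, value in zip(columns, row):
            ET.SubElement(row_el, col).text = str(value)
    return message


def transform(message):
    """Transformer step: reshape the generic message into the (assumed) app format."""
    titles = ET.Element("titles")
    for row in message.iter("row"):
        title = ET.SubElement(titles, "title", year=row.findtext("year"))
        title.text = row.findtext("title")
    return titles


# SQLite stands in for the RDBMS behind the wrapper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bib (title TEXT, publisher TEXT, year INTEGER)")
conn.executemany("INSERT INTO bib VALUES (?, ?, ?)", [
    ("Automata Theory", "Morgan Kaufmann", 1998),
    ("Database Systems", "Morgan Kaufmann", 1998),
    ("Automata Theory", "Prentice Hall", 1998),
])

# Step 1: the developer's program issues a SQL query.
cursor = conn.execute(
    "SELECT title, year FROM bib WHERE publisher = ? AND year = ? ORDER BY title",
    ("Morgan Kaufmann", 1998),
)
# Step 2: the wrapper returns the SQL result wrapped as an XML message.
message = wrap_result(cursor)
# Step 3: the message is transformed to the XML format needed by the app.
app_xml = transform(message)
```

The ORDER BY clause is there because, as noted for unordered(), a SQL processor is otherwise free to deliver the tuples in any order.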