CSC343 -- Introduction to Databases, Weeks 7 and 9: XML

Upload by : Nadine Tse
Transcription

CSC343 -- Introduction to Databases
Weeks 7 and 9: XML Data and Query Processing
- Semistructured Data, HTML
- XML and DTDs
- XPath, XQuery

Hypertext
- Most human knowledge exists today in document format (books etc.).
- We need technologies that store and retrieve such unstructured data the same way as structured data.
- From text to hypertext: add annotations (tags, markup) to a document, to be used for indexing.
- Old idea (Vannevar Bush, Atlantic Monthly); markup languages have existed since 1970 (SGML).
- Great tutorial: http://www.brics.dk/~amoeller/XML/

HTML: HyperText Markup Language
- Motivation: exchange data on the Internet; documents are published by servers and are presented by clients (browsers).
- HTML was created by Tim Berners-Lee and Robert Cailliau at CERN in 1991; they wanted to keep track of experimental data.
- HTML describes only the logical structure of documents:
  - browsers are free to interpret markup tags as they please;
  - the document even makes sense if the tags are ignored.

HTML Data
- An HTML document to be displayed on the Web:

    <dt>Name: John Doe
      <dd>Id: s111111111
      <dd>Address:
        <ul>
          <li>Number: 123</li>
          <li>Street: Main</li>
        </ul>
    </dt>
    <dt>Name: Joe Public
      <dd>Id: s222222222
    </dt>

- HTML does not distinguish between attributes and values.

What's Great About HTML?
- Many document formats are bulky: the author controls precise layout, and formatting details are stored with the content.
- In comparison, HTML is lightweight: the author sacrifices control for compactness; only content and logical structure are represented.
- Sizes of documents containing just the text "Hello World!":

    Format      File        Size
    PostScript  hello.ps    11,274 bytes
    PDF         hello.pdf    4,915 bytes
    MS Word     hello.doc   19,456 bytes
    HTML        hello.html      44 bytes

From Logical to Physical Structure
- Originally, HTML described logical structure:
  - h2: "this is a header at level 2";
  - em: "this text should be emphasized";
  - ul: "this is a list of items".
- Quickly, users wanted more control:
  - "this header is centered and written in Times-Roman in size 28pt", "italicize text".
- The early hack for commercial pages was to make everything a huge image:

    Format  File        Size
    HTML    hello.html      44 bytes
    GIF     hello.gif   32,700 bytes

- HTML developers kept adding layout tags.

Cascading Style Sheets (CSS)
- Specify physical properties (layout) of HTML tags; are (usually) written in separate files; can be shared by many HTML documents.
- There are many advantages:
  - logical and physical properties may be separated;
  - document groups can have consistent looks;
  - the look can easily be changed.
- A CSS stylesheet works by:
  - allowing 50 properties to be defined for each tag;
  - letting definitions for a tag depend on its context;
  - inheriting undefined properties;
  - treating normal HTML as the default properties.
- Using stylesheets, all tags become logical.

Why XML?
- XML is a standard for data exchange that is taking over the World.
- All major database products have been retrofitted with facilities to store and construct XML documents.
- There are already database products that are specifically designed to work with XML documents rather than relational or object-oriented data.
- XML is closely related to object-oriented and so-called semistructured data.

Semistructured Data
- To make the HTML student list (earlier example) suitable for machine consumption on the Web, it should have these characteristics:
  - Be object-like;
  - Be schema-less: no guarantee it conforms exactly to any schema, but different objects share some commonalities;
  - Be self-describing: some schema-like information, e.g., attribute names, is part of the data itself.
- Data with these characteristics are referred to as semistructured.

What is Self-Describing Data?
Non-self-describing first (relational, object-oriented):
- Data part:

    (#123, ["Students",
            { ["John", s111111111, [123, "Main St"]],
              ["Joe", s222222222, [321, "Pine St"]] } ] )

- Schema part:

    PersonList[ ListName: String,
                Contents: [ Name: String,
                            Id: String,
                            Address: [Number: Integer, Street: String] ] ]

Self-Describing Data
- Attribute names are embedded in the data itself, but are distinguished from values.
- No schema is needed to figure out what is what (but a schema might be useful nonetheless).

    (#12345, [ListName: "Students",
              Contents: { [Name: "John Doe",
                           Id: "s111111111",
                           Address: [Num: 123, Str: "Main St."] ],
                          [Name: "Joe Public",
                           Id: "s222222222",
                           Address: [Num: 321, Str: "Pine St."] ] } ] )

XML: The De Facto Standard for Semistructured Data
- XML, the eXtensible Markup Language, is suitable for semistructured data and has become a standard:
  - Easy to describe object-like data;
  - Self-describing;
  - Doesn't require a schema, but can use one.
- We will study:
  - DTDs: an early technique for specifying XML schemas;
  - Query and transformation languages: XPath and XQuery.
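As a rough illustration in Python (the variable and function names are made up for this sketch and do not come from the slides), the same student list can be held positionally, where a separate schema is needed to interpret each field, or as nested dictionaries whose keys play the role of the embedded attribute names:

```python
# Non-self-describing: values only; a separate schema says what each position means.
positional = ("Students", [
    ("John Doe", "s111111111", (123, "Main St.")),
    ("Joe Public", "s222222222", (321, "Pine St.")),
])
schema = ("ListName", ("Name", "Id", ("Num", "Str")))

# Self-describing: attribute names travel with the values.
self_describing = {
    "ListName": "Students",
    "Contents": [
        {"Name": "John Doe", "Id": "s111111111",
         "Address": {"Num": 123, "Str": "Main St."}},
        {"Name": "Joe Public", "Id": "s222222222",
         "Address": {"Num": 321, "Str": "Pine St."}},
    ],
}


def street_of(person):
    """Look a value up by name; no external schema is consulted."""
    return person["Address"]["Str"]


streets = [street_of(p) for p in self_describing["Contents"]]
```

With the positional form, the same lookup has to hard-code indices (`positional[1][0][2][1]`), which is exactly the knowledge a schema would otherwise supply.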

Overview of XML
- Like HTML, but any number of different tags can be used (up to the document author), hence extensible.
- Unlike HTML, no semantics behind the tags:
  - For instance, HTML's <table> ... </table> means: render content as a table; in XML it doesn't mean anything special;
  - Some semantics can be specified using XML Schema (types); some using stylesheets (browser rendering).
- Unlike HTML, XML is intolerant of bugs:
  - Browsers will render buggy HTML pages;
  - XML processors are not supposed to process buggy XML documents.

A Brief Segue
- Stylus Studio is a development tool which can be used to create and query XML.
- Data Direct Technologies has allowed us to use it for the duration of this course.
- Take the time to download Stylus Studio to your PC.
- It is a valuable learning tool and can be used to verify your assignments.
- It is fairly easy to do; the following slides will guide you through the process.

Downloading Stylus Studio to your PC
1. Go to the Stylus Studio Web page at http://www.stylusstudio.com/
2. From there, navigate to the download page. Click on: Download
3. Download the software. Select: Download Now
4. Install it on your PC.

Running Stylus Studio for the first time
1. Click on the Stylus Studio icon.
2. The registration screen will appear. Fill in the necessary information.
3. You will be required to enter the following registration code: D6EEY-EB29Y-C3BEP-B4JEN
4. You are encouraged to download it early and start using XML. Have fun!!

Conceptual View of XML
- An XML document is (isomorphic to) an ordered, labeled tree.
- Character data leaf nodes contain the actual data (text strings); usually, character data nodes must be non-empty and non-adjacent to other character data nodes.
- Element nodes are each labeled with
  - a name (often called the element type), and
  - a set of attributes, each consisting of a name and a value,
  and these nodes can have child nodes.

Domain-Specific Markups

    <h1>Rhubarb Cobbler</h1>
    <h2>Maggie.Herrick@bbs.mhv.net</h2>
    <h3>Wed, 14 Jun 95</h3>
    Rhubarb Cobbler made with bananas as the main sweetener.
    It was delicious. Basicly it was
    <table>
      <tr><td>2 1/2 cups<td>diced rhubarb
      <tr><td>2 tablespoons<td>sugar
      <tr><td>2<td>fairly ripe bananas
      <tr><td>1/4 teaspoon<td>cinnamon
      <tr><td>dash of<td>nutmeg
    </table>
    Combine all and use as cobbler, pie, or crisp.
    Related recipes: <a href="#GardenQuiche">Garden Quiche</a>

An XML Document

    <?xml version="1.0"?>
    <PersonList Type="Student" Date="2002-02-02">    (root element; Type and Date are attributes)
      <Title Value="Student List"/>                  (an element)
      <Person> ... </Person>
      <Person> ... </Person>
    </PersonList>

- PersonList, Title, Person are element (or tag) names.
- Elements are nested.
- The root element contains all others.

More Terminology

    <Person Name="John" Id="s111111111">     (opening tag)
      John is a nice fellow                  ("standalone" text, not very useful as data)
      <Address>                              (nested element, child of Person)
        <Number>21</Number>                  (child of Address, descendant of Person)
        <Street>Main St.</Street>
      </Address>
    </Person>                                (closing tag: what is open must be closed)

- Everything between the opening and closing Person tags is the content of Person.
- Person is the parent of Address and an ancestor of Number.
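The tree vocabulary used here (child, parent, descendant, document order) can be explored with Python's standard `xml.etree.ElementTree` module. This is only a sketch: the helper `walk` and the `parent_of` map are names invented for the example, and ElementTree models text as a property of an element rather than as a separate leaf node.

```python
import xml.etree.ElementTree as ET

doc = """<Person Name="John" Id="s111111111">
  <Address>
    <Number>21</Number>
    <Street>Main St.</Street>
  </Address>
</Person>"""

root = ET.fromstring(doc)


def walk(elem, depth=0, out=None):
    """Collect (depth, element name, attributes, text) in document order."""
    if out is None:
        out = []
    text = (elem.text or "").strip()
    out.append((depth, elem.tag, dict(elem.attrib), text))
    for child in elem:              # children are kept in document order
        walk(child, depth + 1, out)
    return out


nodes = walk(root)

# Descendants of Person: every element below it, at any depth.
descendants = [e.tag for e in root.iter() if e is not root]

# ElementTree keeps no parent pointers, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}
number = root.find("Address/Number")
```

Here `parent_of[number]` is the Address element and `parent_of[parent_of[number]]` is Person, matching "Person is an ancestor of Number".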

Well-formed XML Documents
- Must have a root element.
- Every opening tag has a matching closing tag.
- Elements must be properly nested: <foo><bar></foo></bar> is a no-no.
- An attribute name can occur at most once in an opening tag. If it occurs,
  - it must have an explicitly specified value (Boolean attributes are not allowed);
  - the value must be quoted (with " or ').
- XML processors are not supposed to try and fix ill-formed documents (unlike HTML browsers).

Identifiers and References with Attributes
An attribute can be declared to have type:
- ID: unique identifier of an element; if attr1 and attr2 are both of type ID, then it is illegal to have <something attr1="abc"> ... <somethingelse attr2="abc"> within the same document.
- IDREF: references a unique element with a matching ID attribute; if attr1 has type ID and attr2 has type IDREF, then we can have <something attr1="abc"> ... <somethingelse attr2="abc">.
- IDREFS: a list of references; if attr1 is ID and attr2 is IDREFS, then we can have <something attr1="abc"> ... <something1 attr1="cde"> ... <something2 attr2="abc cde">.
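Both ideas can be tried out in Python. The sketch below uses the standard `xml.etree.ElementTree` parser, which rejects ill-formed input instead of repairing it. ElementTree does not read attribute types from a DTD, so the ID/IDREF/IDREFS check is hand-written: the function `check_references` and its `attr_types` table are assumptions of this example, not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if the parser accepts the text; no attempt is made to repair it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def check_references(root, attr_types):
    """Report duplicate IDs and dangling IDREF/IDREFS values.

    attr_types maps (element name, attribute name) to "ID", "IDREF" or
    "IDREFS"; it stands in for the declarations a DTD would provide.
    """
    ids, problems = set(), []
    # First pass: collect ID values and flag duplicates.
    for elem in root.iter():
        for (tag, attr), kind in attr_types.items():
            if kind == "ID" and elem.tag == tag and attr in elem.attrib:
                value = elem.attrib[attr]
                if value in ids:
                    problems.append("duplicate ID " + value)
                ids.add(value)
    # Second pass: every reference must match some collected ID.
    for elem in root.iter():
        for (tag, attr), kind in attr_types.items():
            if kind in ("IDREF", "IDREFS") and elem.tag == tag and attr in elem.attrib:
                value = elem.attrib[attr]
                refs = value.split() if kind == "IDREFS" else [value]
                for ref in refs:
                    if ref not in ids:
                        problems.append("dangling reference " + ref)
    return problems


types = {("something", "attr1"): "ID",
         ("something1", "attr1"): "ID",
         ("something2", "attr2"): "IDREFS"}
good = ET.fromstring('<r><something attr1="abc"/><something1 attr1="cde"/>'
                     '<something2 attr2="abc cde"/></r>')
bad = ET.fromstring('<r><something attr1="abc"/><something1 attr1="abc"/>'
                    '<something2 attr2="abc xyz"/></r>')
```

The table is keyed by element name as well as attribute name because the same attribute name may be an ID in one element and an IDREF in another.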

Report Document with Cross-Refs

    <?xml version="1.0"?>
    <Report Date="2002-12-12">
      <Students>
        <Student StudId="s111111111">                      (StudId is an ID)
          <Name><First>John</First><Last>Doe</Last></Name>
          <Status>U2</Status>
          <CrsTaken CrsCode="CS308" Semester="F1997"/>
          <CrsTaken CrsCode="MAT123" Semester="F1997"/>
        </Student>
        <Student StudId="s666666666">
          <Name><First>Joe</First><Last>Public</Last></Name>
          <Status>U3</Status>
          <CrsTaken CrsCode="CS308" Semester="F1994"/>
          <CrsTaken CrsCode="MAT123" Semester="F1997"/>
        </Student>
        <Student StudId="s987654321">
          <Name><First>Bart</First><Last>Simpson</Last></Name>
          <Status>U4</Status>
          <CrsTaken CrsCode="CS308" Semester="F1994"/>     (CrsCode is an IDREF)
        </Student>
      </Students>
      (continued)

Report Document (cont'd.)

      <Classes>
        <Class>
          <CrsCode>CS308</CrsCode> <Semester>F1994</Semester>
          <ClassRoster Members="s666666666 s987654321"/>   (Members is an IDREFS)
        </Class>
        <Class>
          <CrsCode>CS308</CrsCode> <Semester>F1997</Semester>
          <ClassRoster Members="s111111111"/>
        </Class>
        <Class>
          <CrsCode>MAT123</CrsCode> <Semester>F1997</Semester>
          <ClassRoster Members="s111111111 s666666666"/>
        </Class>
      </Classes>
      (continued)

Report Document (cont'd.)

      <Courses>
        <Course CrsCode="CS308">                           (CrsCode is an ID)
          <CrsName>Market Analysis</CrsName>
        </Course>
        <Course CrsCode="MAT123">
          <CrsName>Market Analysis</CrsName>
        </Course>
      </Courses>
    </Report>

XML Namespaces
- A mechanism to prevent name clashes, like scoping rules.
- Namespace declaration:
  - Namespace: a symbol, typically a URL;
  - Prefix: an abbreviation of the namespace;
  - Actual name (element or attribute): prefix:name;
  - Declarations/prefixes behave like a begin/end.

Example

    <item xmlns="http://www.acmeinc.com/jp#supplies"        (default namespace; xmlns is a reserved keyword)
          xmlns:toy="http://www.acmeinc.com/jp#toys">       (toy namespace)
      <name>backpack</name>
      <feature>
        <toy:item>
          <toy:name>cyberpet</toy:name>
        </toy:item>
      </feature>
    </item>

More Namespaces
- Scopes of declarations (color-coded on the slide):

    <item xmlns="http://www.foo.org/abc"
          xmlns:cde="http://www.bar.com/cde">
      <name> ... </name>
      <feature>
        <cde:item> <cde:name> ... </cde:name> </cde:item>
      </feature>
      <item xmlns="http://www.foobar.org/"                  (new default; overshadows old default)
            xmlns:cde="http://www.foobar.org/cde">          (redeclaration of cde; overshadows old declaration)
        <name> ... </name>
        <cde:name> ... </cde:name>
      </item>
    </item>
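One way to see that a namespace is only an identifier attached to names is to parse the backpack example with Python's standard `xml.etree.ElementTree`, which expands every element name to `{namespace}local-name`. In this sketch the query-side prefixes `s` and `t` are arbitrary choices that deliberately differ from the prefixes used in the document.

```python
import xml.etree.ElementTree as ET

doc = """<item xmlns="http://www.acmeinc.com/jp#supplies"
      xmlns:toy="http://www.acmeinc.com/jp#toys">
  <name>backpack</name>
  <feature>
    <toy:item><toy:name>cyberpet</toy:name></toy:item>
  </feature>
</item>"""

SUPPLIES = "http://www.acmeinc.com/jp#supplies"
TOYS = "http://www.acmeinc.com/jp#toys"

root = ET.fromstring(doc)

# Every element name is stored together with its namespace.
tags = [e.tag for e in root.iter()]

# Prefixes are local abbreviations: the query may use different ones
# as long as they map to the same namespace identifiers.
ns = {"s": SUPPLIES, "t": TOYS}
supply_name = root.find("s:name", ns).text
toy_name = root.find(".//t:name", ns).text

# An unqualified name does not match: <name> lives in the default namespace.
unqualified = root.find("name")
```

The two `name` elements never clash because their expanded names differ, even though both are written with the local name `name`.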

Namespaces (cont'd)
- xmlns="http://foo.com/bar" doesn't mean there is a document at this URL: using URLs is just a convention; a namespace is just an identifier.
- Namespaces aren't part of XML 1.0, but all XML processors understand this feature now.
- A number of prefixes have become "standard" and some XML processors might understand them without any declaration. E.g.,
  - xs for http://www.w3.org/2001/XMLSchema
  - xsl for http://www.w3.org/1999/XSL/Transform
  - etc.

Document Type Definition (DTD)
- A DTD is a grammar specification for an XML document; you can think of it as a schema.
- DTDs are optional and don't need to be specified; if specified, a DTD can be part of the document (at the top), or it can be given as a URL.
- A document that conforms (i.e., parses) w.r.t. its DTD is said to be valid.
- XML processors are not required to check validity, even if a DTD is specified; but they are required to test well-formedness.

Attaching a DTD to a Document
- DTD specified as part of a document:

    <?xml version="1.0"?>
    <!DOCTYPE Report [ ... DTD ... ]>       (Report: name of the DTD)
    <Report> ... </Report>

- DTD can also be specified as a standalone thing:

    <?xml version="1.0"?>
    <!DOCTYPE Report SYSTEM "http://foo.org/Report.dtd">
    <Report> ... </Report>

Example DTD Constructs
- <!ELEMENT elt-name (...contents...)/EMPTY/ANY>
  (the parenthesized part describes the element's contents)
- <!ATTLIST elt-name attr-name attr-type attr-default>
  (an attribute for elt-name, the type of the attribute, and whether it is optional or mandatory)
- Can define other things, like macros (called entities in XML jargon).

DTD Language
- <!DOCTYPE root-element [ doctype declarations ... ]>: determines the name of the root element and contains the document type declarations.
- <!ELEMENT element-name content-model>: associates a content model with every element.
- Content models:
  - EMPTY: no content is allowed;
  - ANY: any content is allowed;
  - (#PCDATA | element-name | ...)*: "mixed content", an arbitrary sequence of character data and listed elements;
  - Deterministic regular expression (cont'd).

DTD Language: Regular Expressions
- Deterministic regular expression over element names: a sequence of child elements must match the expression.
  - choice: (... | ... | ...)
  - sequence: (..., ..., ...)
  - optional: ...?
  - zero or more: ...*
  - one or more: ...+
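To make the regular-expression view of content models concrete, here is a small Python sketch that translates an element-content model into a Python regex and checks a list of child element names against it. The functions `model_to_regex` and `children_match` are invented for this illustration; Python's standard library has no DTD validator, and this sketch ignores EMPTY, ANY, mixed content, and the determinism requirement.

```python
import re


def model_to_regex(content_model):
    """Translate a DTD element-content model into a Python regex.

    Each child element is encoded as its name followed by one space, so the
    children of an element become a single string such as "Name Status ".
    Handles sequence (,), choice (|), ?, * and +.
    """
    compact = re.sub(r"\s+", "", content_model)
    # Wrap every element name in a group that also consumes its trailing space.
    pattern = re.sub(r"[A-Za-z_][\w.\-]*",
                     lambda m: "(?:" + re.escape(m.group(0)) + " )",
                     compact)
    # Sequence is plain concatenation in regex syntax; | ? * + carry over as-is.
    return pattern.replace(",", "")


def children_match(content_model, child_names):
    """True if the ordered list of child names conforms to the content model."""
    encoded = "".join(name + " " for name in child_names)
    return re.fullmatch(model_to_regex(content_model), encoded) is not None


student_model = "(Name, Status, CrsTaken*)"
ok = children_match(student_model, ["Name", "Status", "CrsTaken", "CrsTaken"])
wrong_order = children_match(student_model, ["Status", "Name"])
```

Because child order is part of the encoded string, swapping Name and Status fails the match, which is also why DTDs cannot conveniently express unordered content.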

DTD Language: Attributes
- <!ATTLIST element-name attr-name attr-type attr-default ...>: declares which attributes are allowed or required in which elements.
- Attribute types:
  - CDATA: any value is allowed (the default);
  - (value | ...): enumeration of allowed values;
  - ID, IDREF, IDREFS: ID attribute values must be unique (contain "element identity"); IDREF attribute values must match some ID (reference to an element);
  - ENTITY, ENTITIES, NMTOKEN, NMTOKENS, NOTATION: just forget these.

DTD Language: Attribute Defaults
- #REQUIRED: the attribute must be explicitly provided.
- #IMPLIED: the attribute is optional; no default is provided.
- "value": if not explicitly provided, this value is inserted by default.
- #FIXED "value": as above, but only this value is allowed.

DTD Example

    <!DOCTYPE Report [
      <!ELEMENT Report (Students, Classes, Courses)>
      <!ELEMENT Students (Student*)>                    (zero or more)
      <!ELEMENT Classes (Class*)>
      <!ELEMENT Courses (Course*)>
      <!ELEMENT Student (Name, Status, CrsTaken*)>
      <!ELEMENT Name (First,Last)>
      <!ELEMENT First (#PCDATA)>                        (has text content)
      <!ELEMENT CrsTaken EMPTY>                         (empty element, no content)
      <!ELEMENT Class (CrsCode,Semester,ClassRoster)>
      <!ELEMENT Course (CrsName)>
      <!ATTLIST Report Date CDATA #IMPLIED>
      <!ATTLIST Student StudId ID #REQUIRED>
      <!ATTLIST Course CrsCode ID #REQUIRED>            (same attribute in different elements)
      <!ATTLIST CrsTaken CrsCode IDREF #REQUIRED>
      <!ATTLIST ClassRoster Members IDREFS #IMPLIED>
    ]>

Another Example

    <!ELEMENT collection (description,recipe*)>
    <!ELEMENT description ANY>
    <!ELEMENT recipe (title, ingredient*, preparation, comment?, nutrition)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT ingredient (ingredient*,preparation)?>
    <!ATTLIST ingredient name CDATA #REQUIRED
                         amount CDATA #IMPLIED
                         unit CDATA #IMPLIED>
    <!ELEMENT preparation (step*)>
    <!ELEMENT step (#PCDATA)>
    <!ELEMENT comment (#PCDATA)>
    <!ELEMENT nutrition EMPTY>
    <!ATTLIST nutrition protein CDATA #REQUIRED
                        carbohydrates CDATA #REQUIRED
                        fat CDATA #REQUIRED
                        calories CDATA #REQUIRED
                        alcohol CDATA #IMPLIED>

Limitations of DTDs
- Don't understand namespaces.
- Very limited assortment of data types (just strings).
- Very weak w.r.t. consistency constraints (ID/IDREF/IDREFS only).
- Can't express unordered contents conveniently.
- All element names are global: can't have one Name type for people and another for companies, e.g.,

    <!ELEMENT Name (Last, First)>
    <!ELEMENT Name (#PCDATA)>

  can't be in the same DTD.

XML Schema
- Proposed in order to rectify the drawbacks of DTDs.
- Advantages:
  - Integrated with namespaces;
  - Many built-in types;
  - User-defined types;
  - Has local element names;
  - Powerful key and referential constraints.
- Disadvantages: unwieldy, much more complex than DTDs.

XML Query Languages
- XPath: core query language. Very limited, a glorified selection operator. Very useful, though: used in XML Schema, XSLT, XQuery, and many other XML standards.
- XSLT: a functional document transformation language. Very powerful, very complicated.
- XQuery: W3C standard. Very powerful, fairly intuitive, SQL-style.
- SQL/XML: an attempt to marry SQL and XML, part of SQL:2003.

Why Query XML?
- Need to extract parts of XML documents.
- Need to transform documents into different forms.
- Need to relate (join) parts of the same or different documents.

XPath
- Analogous to path expressions in object-oriented languages (e.g., OQL).
- Extends path expressions with a query facility.
- XPath views an XML document as a tree:
  - The root of the tree is a new node, which doesn't correspond to anything in the document;
  - Internal nodes are elements;
  - Leaves are either
    - attributes,
    - text nodes,
    - comments,
    - or other things that we won't discuss (e.g., processing instructions).

XPath Document Tree
[Figure: XPath document tree, distinguishing the root of the XML tree from the root of the XML document]

... and Corresponding Document
- A fragment of the report document used earlier:

    <?xml version="1.0"?>
    <!-- Some comment -->
    <Students>
      <Student StudId="111111111">
        <Name><First>John</First><Last>Doe</Last></Name>
        <Status>U2</Status>
        <CrsTaken CrsCode="CS308" Semester="F1997"/>
        <CrsTaken CrsCode="MAT123" Semester="F1997"/>
      </Student>
      <Student StudId="987654321">
        <Name><First>Bart</First><Last>Simpson</Last></Name>
        <Status>U4</Status>
        <CrsTaken CrsCode="CS308" Semester="F1994"/>
      </Student>
    </Students>
    <!-- Some other comment -->

Terminology
- Parent/child nodes, as usual.
- Child nodes (that are of interest to us) are of types text, element, attribute.
- Ancestor/descendant nodes, as usual in trees.

XPath Basics
- An XPath expression takes a document tree as input and returns a multi-set of nodes of the tree.
- Expressions that start with / are absolute path expressions:
  - / returns the root node of the XPath tree;
  - /Students/Student returns all Student elements that are children of Students elements, which in turn must be children of the root;
  - /Student returns the empty set (no such children at the root).
- The basic idea here is similar to that of directory paths.

More XPath Basics
- Current node (or context node): exists during the evaluation of XPath expressions (and in other XML query languages).
- . denotes the current node; .. denotes the parent:
  - foo/bar returns all bar elements that are children of foo nodes, which in turn are children of the current node;
  - ./foo/bar is the same;
  - ../abc/cde returns all cde e-children of abc e-children of the parent of the current node.
- Expressions that don't start with / are relative (to the current node).
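Relative path expressions can be tried with Python's standard `xml.etree.ElementTree`, which implements a limited subset of XPath. Two caveats for this sketch: paths are always evaluated relative to a context element (here the Students element, so `Student` plays the role of `/Students/Student`), and attribute nodes cannot be selected by a path, so attribute values are read with `get`.

```python
import xml.etree.ElementTree as ET

doc = """<Students>
  <Student StudId="111111111">
    <Name><First>John</First><Last>Doe</Last></Name>
    <Status>U2</Status>
    <CrsTaken CrsCode="CS308" Semester="F1997"/>
    <CrsTaken CrsCode="MAT123" Semester="F1997"/>
  </Student>
  <Student StudId="987654321">
    <Name><First>Bart</First><Last>Simpson</Last></Name>
    <Status>U4</Status>
    <CrsTaken CrsCode="CS308" Semester="F1994"/>
  </Student>
</Students>"""

students = ET.fromstring(doc)        # context node: the Students element

# Student children of the context node.
all_students = students.findall("Student")

# A child step only looks one level down, so Last is not found here.
no_match = students.findall("Last")

# foo/bar and ./foo/bar select the same nodes.
first = all_students[0]
plain = first.findall("Name/Last")
dotted = first.findall("./Name/Last")

# .. steps back up to the parent: Student -> Name -> (back to Student) -> Status.
statuses = [e.text for e in students.findall("Student/Name/../Status")]

# Attribute values are read from the selected elements.
ids = [s.get("StudId") for s in all_students]
```

ElementTree's `..` cannot climb above the element on which `findall` was called, which is why the last path starts from the Students element rather than from a Student.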

Attributes, Text, etc.
- /Students/Student/@StudId returns all StudId a-children (attribute children) of Student, which are e-children (element children) of Students, which are children of the root. The @ denotes an attribute.
- /Students/Student/Name/Last/text() returns all t-children (text children) of Last e-children of ...
- XPath provides means to select other document components as well.

Basic Idea and Semantics
- An XPath expression is: locationStep1/locationStep2/...
- Location step: Axis::nodeSelector[predicate]
- Navigation axis:
  - child, parent: have seen;
  - ancestor, descendant, ancestor-or-self, descendant-or-self: will see later;
  - some others: will see later.
- Node selector: node name or wildcard; e.g.,
  - ./child::Student (we used ./Student, which is an abbreviation);
  - ./child::* : any e-child (abbreviation: ./*).
  The form with explicit axes is called full (rather than abbreviated) syntax.
- Predicate: a selection condition; e.g., Students/Student[CrsTaken/@CrsCode = "CSC343"]

Complete Set of Axes
- child: the children of the context node
- descendant: all descendants (children, recursively)
- parent: the parent (empty if at the root)
- ancestor: all ancestors from the parent to the root
- following-sibling: siblings to the right
- preceding-sibling: siblings to the left
- following: all following nodes in the document, excluding descendants
- preceding: all preceding nodes in the document, excluding ancestors
- attribute: the attributes of the context node
- namespace: namespace declarations in the context node
- self: the context node itself
- descendant-or-self: the union of descendant and self
- ancestor-or-self: the union of ancestor and self

Axis Directions
[Figure: axis directions]

Node Tests
- Testing by node type:
  - text(): chardata node;
  - comment(): comment node;
  - processing-instruction(): processing instruction node;
  - node(): any node (not including attributes and namespace declarations).
- Testing by node name:
  - Name: nodes with that name;
  - *: any node.

Essential Predicates
- [attribute::name="flour"]: test equality of an attribute
- [attribute::name!="flour"]: test inequality of an attribute
- [attribute::amount='0.5' and attribute::unit='cup']: test two things at once (also or)
- [position()=2]: test position among siblings
- [attribute::amount<'0.5']: a syntax error (when the expression is embedded in XML, < must be escaped)
- [attribute::amount&lt;'0.5']: a useless test of lexicographical order
- [attribute::amount&lt;0.5]: what you meant to write instead!

An entire location path may be used as a predicate:
- [attribute::amount]: the node has an amount attribute
- [descendant::ingredient]: the node has a nested ingredient

XPath Semantics
The meaning of the expression locStep1/locStep2/... is the set of all document nodes obtained as follows:
- Find all nodes reachable by locStep1 from the current node;
- For each node N in the result, find all nodes reachable from N by locStep2; take the union of all these nodes;
- For each node in the result, find all nodes reachable by locStep3, etc.;
- The value of the path expression on a document is the set of all document nodes found after processing the last location step in the expression.

More Generally
- locationStep1/locationStep2/... means:
  - Find all nodes specified by locationStep1;
  - For each such node N:
    - find all nodes specified by locationStep2 using N as the current node;
    - take the union;
  - For each node returned by locationStep2, do the same using locationStep3, ...
- locationStep = axis::node[predicate]
  - Find all nodes specified by axis::node;
  - Select only those that satisfy predicate.
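The step-by-step evaluation can be written out directly. The following Python sketch is a toy evaluator, not how any XPath engine is actually implemented: `apply_step` and `evaluate` are hypothetical names, only the child, descendant, and self axes are covered, and a predicate is passed as an ordinary Python function.

```python
import xml.etree.ElementTree as ET


def apply_step(context_nodes, axis, name, predicate=None):
    """One location step: union of the nodes reachable from each context node."""
    result, seen = [], set()
    for node in context_nodes:
        if axis == "child":
            reachable = list(node)
        elif axis == "descendant":
            reachable = [d for d in node.iter() if d is not node]
        elif axis == "self":
            reachable = [node]
        else:
            raise ValueError("axis not covered by this sketch: " + axis)
        for candidate in reachable:
            if name not in ("*", candidate.tag):
                continue                      # node test failed
            if predicate is not None and not predicate(candidate):
                continue                      # predicate failed
            if id(candidate) not in seen:     # union: keep each node once
                seen.add(id(candidate))
                result.append(candidate)
    return result


def evaluate(start, steps):
    """Feed the result of each step into the next one as its context nodes."""
    nodes = [start]
    for axis, name, predicate in steps:
        nodes = apply_step(nodes, axis, name, predicate)
    return nodes


students = ET.fromstring(
    '<Students>'
    '<Student StudId="111111111"><Name><Last>Doe</Last></Name>'
    '<CrsTaken CrsCode="CS308"/><CrsTaken CrsCode="MAT123"/></Student>'
    '<Student StudId="987654321"><Name><Last>Simpson</Last></Name>'
    '<CrsTaken CrsCode="CS308"/></Student>'
    '</Students>')

# child::Student/child::CrsTaken
courses = evaluate(students, [("child", "Student", None),
                              ("child", "CrsTaken", None)])

# descendant::*/descendant::Last reaches each Last twice; the union keeps one copy.
lasts = evaluate(students, [("descendant", "*", None),
                            ("descendant", "Last", None)])

# child::Student[@StudId = "987654321"]/child::CrsTaken
barts = evaluate(students, [
    ("child", "Student", lambda s: s.get("StudId") == "987654321"),
    ("child", "CrsTaken", None)])
```

The `seen` set is what turns the per-node results into a union: without it, a node reachable from several context nodes would appear several times.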

More Navigational Primitives
- Second CrsTaken child of the first Student child of Students:
    /Students/Student[1]/CrsTaken[2]
- All last CrsTaken elements within each Student element:
    /Students/Student/CrsTaken[last()]
- All href attributes in cite elements in the first 5 sections of an article document:
    child::section[position() < 6] / descendant::cite / attribute::href

Wildcards
- Wildcards are useful for unknown document structures.
- The // wildcard descends down any number of levels (including 0):
  - //CrsTaken: all CrsTaken nodes under the root;
  - Students//@Name: all Name attribute nodes under the Students elements that are children of the current node.
- Note: ./Last and Last are the same, but .//Last and //Last are different.
- The * wildcard:
  - * is any element: Student/*/text()
  - @* is any attribute: Students//@*
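Positional predicates and wildcards are among the XPath features that Python's standard `xml.etree.ElementTree` does support, so they can be tried directly. In this sketch paths are relative to the Students element, and the `@*` wildcard, which ElementTree cannot express as a path, is imitated by iterating over each element's attribute dictionary.

```python
import xml.etree.ElementTree as ET

doc = """<Students>
  <Student StudId="111111111">
    <Name><First>John</First><Last>Doe</Last></Name>
    <Status>U2</Status>
    <CrsTaken CrsCode="CS308" Semester="F1997"/>
    <CrsTaken CrsCode="MAT123" Semester="F1997"/>
  </Student>
  <Student StudId="987654321">
    <Name><First>Bart</First><Last>Simpson</Last></Name>
    <Status>U4</Status>
    <CrsTaken CrsCode="CS308" Semester="F1994"/>
  </Student>
</Students>"""

students = ET.fromstring(doc)

# Student[1]/CrsTaken[2]: second CrsTaken of the first Student.
second_of_first = students.findall("Student[1]/CrsTaken[2]")

# Student/CrsTaken[last()]: the last CrsTaken within each Student.
last_of_each = students.findall("Student/CrsTaken[last()]")

# .//CrsTaken: CrsTaken elements at any depth below the context node.
anywhere = students.findall(".//CrsTaken")

# Student/*: every element child of every Student.
grandchildren = [e.tag for e in students.findall("Student/*")]

# Imitation of Students//@*: all attributes of all elements in the subtree.
all_attributes = [(e.tag, name, value)
                  for e in students.iter()
                  for name, value in e.attrib.items()]
```

A position test such as `position() < 6` is outside ElementTree's subset; slicing the result list (for example `students.findall("Student")[:5]`) gives a comparable effect for a single context node.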

Selection Predicates
- Recall: Location step = Axis::nodeSelector[predicate]
- Predicate:
  - XPath expression = const | built-in function | XPath expression (equality predicate);
  - XPath expression (false if the result is empty);
  - built-in predicate;
  - a Boolean combination thereof.
- Axis::nodeSelector[predicate] is a subset of Axis::nodeSelector: it contains only the nodes that satisfy predicate.
- Built-in predicates include ones for string matching, set manipulation, etc. Built-in functions include a large assortment of functions for string manipulation, aggregation, etc.

XPath Queries: Examples
- Students who have taken CSC343:
    //Student[CrsTaken/@CrsCode = "CSC343"]
- Complex example:
    //Student[Status = "U3" and starts-with(.//Last, "A")
              and contains(concat(.//@CrsCode), "ESE")
              and not(.//Last = .//First)]
- Aggregation: sum(), count()
    //Student[sum(.//@Grade) div count(.//@Grade) > 3.5]
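Python's standard `xml.etree.ElementTree` only understands simple predicates such as `[Status='U3']`, so the richer predicates are spelled out below as ordinary Python filters. This is a sketch under assumptions: the three-student document, its numeric Grade attributes, and the variable names are all made up for the example.

```python
import xml.etree.ElementTree as ET

doc = """<Students>
  <Student StudId="s1">
    <Name><First>John</First><Last>Adams</Last></Name><Status>U3</Status>
    <CrsTaken CrsCode="CSC343" Grade="4.0"/>
    <CrsTaken CrsCode="ESE120" Grade="3.5"/>
  </Student>
  <Student StudId="s2">
    <Name><First>Bart</First><Last>Simpson</Last></Name><Status>U4</Status>
    <CrsTaken CrsCode="CSC343" Grade="3.0"/>
  </Student>
  <Student StudId="s3">
    <Name><First>Joe</First><Last>Public</Last></Name><Status>U3</Status>
  </Student>
</Students>"""

root = ET.fromstring(doc)

# //Student[CrsTaken/@CrsCode = "CSC343"]: some CrsTaken child has that code.
took_343 = [s.get("StudId") for s in root.iter("Student")
            if any(c.get("CrsCode") == "CSC343" for c in s.findall("CrsTaken"))]

# //Student[Status = "U3" and starts-with(.//Last, "A")]
# The equality test is within ElementTree's subset; starts-with is done in Python.
u3_with_a = [s.get("StudId") for s in root.findall(".//Student[Status='U3']")
             if s.findtext(".//Last", default="").startswith("A")]


def average_grade(student):
    """sum(.//@Grade) div count(.//@Grade), or None when there are no grades."""
    grades = [float(c.get("Grade")) for c in student.iter() if c.get("Grade")]
    return sum(grades) / len(grades) if grades else None


# //Student[sum(.//@Grade) div count(.//@Grade) > 3.5]
high_average = [s.get("StudId") for s in root.iter("Student")
                if (average_grade(s) or 0.0) > 3.5]
```

A student with no grades is simply excluded here; in XPath the division would yield NaN and the comparison would likewise be false.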

XPath Queries (cont'd)
- Testing whether a subnode exists:
  - //Student[CrsTaken/@Grade]: students who have a grade (for some course);
  - //Student[Name/First or CrsTaken/@Semester or Status/text() = "U4"]: students who have either a first name, or have taken a course in some semester, or have status U4.
- Union operator, |:
  - //CrsTaken[@Semester = "F2001"] | //Class[Semester = "F1990"]
  - union lets us define heterogeneous collections of nodes.

XQuery: XML Query Language
- Integrates XPath with earlier proposed query languages: XQL, XML-QL.
- SQL-style, not functional-style.
- 2004: XQuery 1.0

An Example
[Figure: an example]

Some Queries
[Figure: some queries]

transcript.xml

    <Transcripts>
      <Transcript>
        <Student StudId="111111111" Name="John Doe"/>
        <CrsTaken CrsCode="CS308" Sem="F97" Gr="B"/>
        <CrsTaken CrsCode="MAT123" Sem="F97" Gr="B"/>
        <CrsTaken CrsCode="EE101" Sem="F1997" Gr="A"/>
        <CrsTaken CrsCode="CS305" Sem="F1995" Gr="A"/>
      </Transcript>
      <Transcript>
        <Student StudId="987654321" Name="Bart Simpson"/>
        <CrsTaken CrsCode="CS305" Sem="F1995" Gr="C"/>
        <CrsTaken CrsCode="CS308" Sem="F1994" Gr="B"/>
      </Transcript>
      (cont'd)

transcript.xml (cont'd)

      <Transcript>
        <Student StudId="123454321" Name="Joe Blow"/>
        <CrsTaken CrsCode="CS315" Sem="S97" Gr="A"/>
        <CrsTaken CrsCode="CS305" Sem="S96" Gr="A"/>
        <CrsTaken CrsCode="MAT123" Sem="S96" Gr="C"/>
      </Transcript>
      <Transcript>
        <Student StudId="023456789" Name="Homer Simpson"/>
        <CrsTaken CrsCode="EE101" Sem="F1995" Gr="B"/>
        <CrsTaken CrsCode="CS305" Sem="S1996" Gr="A"/>
      </Transcript>
    </Transcripts>

XQuery Basics
- General structure (an XQuery expression):

    FOR variable declarations
    WHERE condition
    RETURN document

- Example:

    (: students who took MAT123 :)                   (a comment)
    FOR $t IN doc("transcript.xml")//Transcript
    WHERE $t/CrsTaken/@CrsCode = "MAT123"
    RETURN $t/Student

- Result:

    <Student StudId="111111111" Name="John Doe"/>
    <Student StudId="123454321" Name="Joe Blow"/>

XQuery Basics (cont'd)
- The previous query doesn't produce a well-formed XML document; the following does (a query inside XML):

    <StudentList>
    {
      FOR $t IN doc("transcript.xml")//Transcript
      WHERE $t/CrsTaken/@CrsCode = "MAT123"
      RETURN $t/Student
    }
    </StudentList>

- FOR binds $t to Transcript elements one by one, filters using WHERE, then places the Student children as e-children of StudentList using RETURN.
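For readers more at home in a general-purpose language, the FOR / WHERE / RETURN pattern maps onto a loop, a condition, and an append. The Python sketch below mirrors the MAT123 query with the standard `xml.etree.ElementTree`; it is an analogy rather than an XQuery implementation, and the transcript data is inlined as a string instead of being loaded with `doc(...)`.

```python
import xml.etree.ElementTree as ET

TRANSCRIPT_XML = """<Transcripts>
  <Transcript>
    <Student StudId="111111111" Name="John Doe"/>
    <CrsTaken CrsCode="CS308" Sem="F97" Gr="B"/>
    <CrsTaken CrsCode="MAT123" Sem="F97" Gr="B"/>
    <CrsTaken CrsCode="EE101" Sem="F1997" Gr="A"/>
    <CrsTaken CrsCode="CS305" Sem="F1995" Gr="A"/>
  </Transcript>
  <Transcript>
    <Student StudId="987654321" Name="Bart Simpson"/>
    <CrsTaken CrsCode="CS305" Sem="F1995" Gr="C"/>
    <CrsTaken CrsCode="CS308" Sem="F1994" Gr="B"/>
  </Transcript>
  <Transcript>
    <Student StudId="123454321" Name="Joe Blow"/>
    <CrsTaken CrsCode="CS315" Sem="S97" Gr="A"/>
    <CrsTaken CrsCode="CS305" Sem="S96" Gr="A"/>
    <CrsTaken CrsCode="MAT123" Sem="S96" Gr="C"/>
  </Transcript>
  <Transcript>
    <Student StudId="023456789" Name="Homer Simpson"/>
    <CrsTaken CrsCode="EE101" Sem="F1995" Gr="B"/>
    <CrsTaken CrsCode="CS305" Sem="S1996" Gr="A"/>
  </Transcript>
</Transcripts>"""

transcripts = ET.fromstring(TRANSCRIPT_XML)

# Wrapping the results in one element is what makes the output well-formed.
student_list = ET.Element("StudentList")

for t in transcripts.iter("Transcript"):                   # FOR $t IN ...//Transcript
    took_mat123 = any(c.get("CrsCode") == "MAT123"         # WHERE $t/CrsTaken/@CrsCode = "MAT123"
                      for c in t.findall("CrsTaken"))
    if took_mat123:
        student_list.extend(t.findall("Student"))          # RETURN $t/Student

names = [s.get("Name") for s in student_list]
serialized = ET.tostring(student_list, encoding="unicode")
```

The WHERE condition uses `any(...)` because an XPath comparison between a node set and a constant is true as soon as one node in the set matches.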

Doc Restructuring with XQuery
- Reconstruct lists of students taking each class using the Transcript records:

    FOR $c IN doc("transcript.xml")//CrsTaken
    RETURN
      <ClassRoster CrsCode={$c/@CrsCode} Sem={$c/@Sem}>
      {
        FOR $t IN doc("transcript.xml")//Transcript
        WHERE $t/CrsTaken[@CrsCode = $c/@CrsCode and @Sem = $c/@Sem]
        RETURN $t/Student ORDER BY $t/Student/@StudId
      }
      </ClassRoster>
    ORDER BY $c/@CrsCode

Document Restructuring (cont'd)
- Output elements have the form:

    <ClassRoster CrsCode="CS305" Sem="F1995">
      <Student StudId="111111111" Name="John Doe"/>
      <Student StudId="987654321" Name="Bart Simpson"/>
    </ClassRoster>

- Problem: the above element will be output twice: once when $c is bound to John Doe's

    <CrsTaken CrsCode="CS305" Sem="F1995" Grade="A"/>

  and once when it is bound to Bart Simpson's

    <CrsTaken CrsCode="CS305" Sem="F1995" Grade="C"/>

- Note: the grades are different, so distinct() won't eliminate transcript records that refer to the same class!
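The duplication can be reproduced outside XQuery. The Python sketch below first iterates over every CrsTaken record, as the restructuring query does, and then shows one possible way around the problem: grouping on the (CrsCode, Sem) pair. Grouping by key is a technique swapped in for illustration only; it is not the approach the slides take next, and the two-transcript data set and variable names are assumptions of the example.

```python
import xml.etree.ElementTree as ET

TRANSCRIPT_XML = """<Transcripts>
  <Transcript>
    <Student StudId="111111111" Name="John Doe"/>
    <CrsTaken CrsCode="CS308" Sem="F97" Gr="B"/>
    <CrsTaken CrsCode="MAT123" Sem="F97" Gr="B"/>
    <CrsTaken CrsCode="EE101" Sem="F1997" Gr="A"/>
    <CrsTaken CrsCode="CS305" Sem="F1995" Gr="A"/>
  </Transcript>
  <Transcript>
    <Student StudId="987654321" Name="Bart Simpson"/>
    <CrsTaken CrsCode="CS305" Sem="F1995" Gr="C"/>
    <CrsTaken CrsCode="CS308" Sem="F1994" Gr="B"/>
  </Transcript>
</Transcripts>"""

transcripts = ET.fromstring(TRANSCRIPT_XML)

# Naive: one roster per CrsTaken record, so a class taken by two students
# shows up twice (the two records differ only in their grade).
naive_classes = [(c.get("CrsCode"), c.get("Sem"))
                 for c in transcripts.iter("CrsTaken")]

# Grouping on the class key alone ignores the grade, so each class appears once.
rosters = {}
for t in transcripts.iter("Transcript"):
    student_id = t.find("Student").get("StudId")
    for c in t.findall("CrsTaken"):
        key = (c.get("CrsCode"), c.get("Sem"))
        rosters.setdefault(key, []).append(student_id)

# ORDER BY CrsCode for the rosters and by StudId within each roster.
ordered = [(key, sorted(members)) for key, members in sorted(rosters.items())]
```

The key point is that the duplicates are not identical records, so they have to be collapsed on the class attributes rather than on whole elements.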

Document Restructuring (cont'd)
- Solution: instead of

    FOR $c IN ...

  use

    FOR $c IN doc("classes.xml")//Class

  (classes.xml: document on next slide) where clas
