Some notes about XML
XML document
XML declaration
<?xml version='1.0' encoding='ASCII' standalone='yes' ?>
- version sets the XML version (1.0);
- encoding specifies the character set the document is written in (defaults to UTF-8);
- standalone tells whether the document depends on declarations in an external DTD ('yes' means it does not).
The XML declaration is not required, but if it is present it must be the very first thing in the document.
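The placement rule can be checked with a parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Declaration at the very start of the document: accepted.
good = b"<?xml version='1.0' encoding='UTF-8' standalone='yes'?><note>hi</note>"
root = ET.fromstring(good)

# The same document with a blank line before the declaration: rejected.
bad = b"\n" + good
try:
    ET.fromstring(bad)
    declaration_error = None
except ET.ParseError as exc:
    declaration_error = exc
```

Here declaration_error holds the parser's complaint about the misplaced declaration, while the first document parses normally.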
Comments
<!-- comment -->
Processing instructions
<?target parameters?>
Processing instructions pass information to the application that processes the document; the parser hands them over as they are. No space is allowed between <? and the target.
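As an illustration of how comments and processing instructions reach an application, here is a small sketch based on Python's xml.parsers.expat. The document and the xml-stylesheet instruction are only sample data.

```python
import xml.parsers.expat

doc = """<?xml version="1.0"?>
<!-- build notes -->
<?xml-stylesheet href="style.css" type="text/css"?>
<page/>"""

seen = []
parser = xml.parsers.expat.ParserCreate()
# Comments arrive as a single string, delimiters removed.
parser.CommentHandler = lambda data: seen.append(("comment", data))
# Processing instructions arrive split into target and raw parameter text.
parser.ProcessingInstructionHandler = (
    lambda target, data: seen.append(("pi", target, data))
)
parser.Parse(doc, True)
```

Note that the XML declaration is not reported as a processing instruction, even though it looks like one.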
Tags
<tag> ... </tag>
<tag/>
- exactly one root element exists;
- tags can nest, giving a parent/children (one to many) relationship between them;
- can contain mixed content (text and other tags).
<tag attribute = "value"> ... </tag>
Tags can have attributes; values must be quoted.
<![CDATA[ raw text ]]>
Can contain CDATA sections for raw content.
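The pieces above (attributes, mixed content, empty tags and CDATA) can be seen together in a short sketch with Python's xml.etree.ElementTree. The letter document is invented for the example.

```python
import xml.etree.ElementTree as ET

doc = """<letter kind="informal">
  Dear <name>Ada</name>, see <ref id="r1"/> and <![CDATA[<b>raw</b> & text]]>
</letter>"""

root = ET.fromstring(doc)
kind = root.get("kind")                      # attribute value
child_tags = [child.tag for child in root]   # child elements, in order
all_text = "".join(root.itertext())          # text of the mixed content

# Text that follows a child element is stored as that child's "tail";
# the CDATA section is delivered as plain characters, markup untouched.
after_ref = root[1].tail.strip()
```

The CDATA content keeps its < and & characters without any escaping, which is the reason to use such a section.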
Document Type Definition
DTD lists all elements, attributes and entities the document uses and the contexts in which they are used.
Can be included in the prolog of the XML document as an external resource:
<!DOCTYPE <element> SYSTEM "<url of the DTD>">
or inline:
<!DOCTYPE person [ <!ELEMENT ... > <!ELEMENT ...> ... ]>
DTD inclusion must be after the XML declaration and before the root element.
Elements
<!ELEMENT <name> (content)>
Available content types
- #PCDATA
Parsed character data.
- Child element
Another element or an instance of the same element.
- Sequence
A list of two or more comma-separated children. Children can have modifier suffixes (?, *, +) with regexp-like meaning; choices (|, meaning OR) and parentheses () are supported as well. The list is ordered.
- Mixed content
<!ELEMENT <name> (#PCDATA|child 1|...|child n)*>
Text and child elements are mixed.
#PCDATA must come first in the mixed content declaration. child 1, ..., child n are the elements, and each element must have its own declaration within the DTD. The * operator must follow the mixed content declaration if child elements are included. #PCDATA and the child element names must be separated by the | operator.
- Nothing (empty)
<!ELEMENT <name> EMPTY>
An empty element has no content; it can only carry attributes.
- Anything
<!ELEMENT <name> ANY>
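To see these declarations in context, the sketch below feeds an inline DTD to Python's xml.parsers.expat and records each element declaration it reads. expat is a non-validating parser, so it only reports the declarations and does not check the document against them. The person vocabulary is hypothetical.

```python
import xml.parsers.expat

doc = """<!DOCTYPE person [
  <!ELEMENT person (name, email*, photo?)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT email (#PCDATA|alias)*>
  <!ELEMENT alias (#PCDATA)>
  <!ELEMENT photo EMPTY>
]>
<person><name>Ada</name><email>ada@<alias>example</alias></email><photo/></person>"""

declared = []
parser = xml.parsers.expat.ParserCreate()
# Called once per <!ELEMENT ...> declaration; "model" describes the content.
parser.ElementDeclHandler = lambda name, model: declared.append(name)
parser.Parse(doc, True)
```

The DTD shows a sequence with suffixes (person), plain #PCDATA (name, alias), mixed content (email) and an empty element (photo).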
Attributes
<!ATTLIST <element> <attribute> <type> <default>>
Multiple attributes can be declared in a single ATTLIST.
Type
- CDATA
Any string of text.
- NMTOKEN
XML name token.
- NMTOKENS
A space-separated list of one or more NMTOKEN.
- Enumeration
A | separated list of all possible values for the attribute.
- ID
An XML name unique within the XML document.
- IDREF
An XML name that refers to the ID attribute of another element.
- IDREFS
A space-separated list of one or more IDREF.
- ENTITY
The name of an unparsed entity declared elsewhere in the DTD.
- ENTITIES
A space-separated list of one or more ENTITY.
- NOTATION
The name of a notation declared in the document DTD.
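One visible effect of an ATTLIST is attribute defaulting. The sketch below uses Python's xml.etree.ElementTree with an inline DTD: the parser does not validate, but it does supply default values declared in the internal DTD. The task element and its attributes are made up for the example.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE task [
  <!ELEMENT task (#PCDATA)>
  <!ATTLIST task
    id       ID          #REQUIRED
    priority (low|high)  "low"
    owner    CDATA       #IMPLIED>
]>
<task id="t1">write notes</task>"""

task = ET.fromstring(doc)
# "priority" was not written in the document: its default comes from the DTD.
# "owner" is #IMPLIED, so it is simply absent.
attributes = dict(task.attrib)
```

A single ATTLIST declares three attributes here, one per type/default pair.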
Entities
<!ENTITY <name> (value)>
Defined in the DTD and referenced as: &<entity name>; (the trailing semicolon is required). The value is substituted only in the XML document.
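A minimal sketch of an internal entity being expanded, using Python's xml.etree.ElementTree. The entity name and value are invented for the example.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE memo [
  <!ENTITY company "Example Corp">
]>
<memo>Sent by &company;.</memo>"""

memo = ET.fromstring(doc)
# The parser replaces the reference with the declared value.
expanded = memo.text
```

Because expansion happens inside the parser, entity declarations coming from untrusted documents deserve caution.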
<!ENTITY <name> SYSTEM (URL)>
Included and parsed from external URL.
Unparsed entities
<!ENTITY <name> SYSTEM (URL) NDATA (notation)>
NDATA specifies a notation defined elsewhere in the DTD:
<!NOTATION <notation> SYSTEM (identifier)>
These entities are referenced by their name alone (without & and ;), as the value of an attribute of type ENTITY or ENTITIES.
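The relationship between a notation, an unparsed entity and the attribute that names it can be sketched with Python's xml.parsers.expat, which reports both kinds of declaration through handlers. File names and the notation identifier are hypothetical.

```python
import xml.parsers.expat

doc = """<!DOCTYPE doc [
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!ENTITY logo SYSTEM "logo.jpg" NDATA jpeg>
  <!ELEMENT doc EMPTY>
  <!ATTLIST doc image ENTITY #IMPLIED>
]>
<doc image="logo"/>"""

notations, unparsed = [], []
parser = xml.parsers.expat.ParserCreate()
parser.NotationDeclHandler = (
    lambda name, base, system_id, public_id: notations.append((name, system_id))
)
parser.UnparsedEntityDeclHandler = (
    lambda name, base, system_id, public_id, notation:
        unparsed.append((name, system_id, notation))
)
parser.Parse(doc, True)
```

The parser never reads logo.jpg; it only tells the application where the resource is and which notation describes it.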
Parameter entities
<!ENTITY % <name> (value)>
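Referenced as: %<name>; (the trailing semicolon is required); the value is substituted only in the DTD. Can be declared more than once, but the first declaration read wins, so a definition in the internal DTD takes precedence over one in the external DTD.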
Can be included from an external DTD:
<!ENTITY % <name> SYSTEM (URL)>
Namespaces
Namespaces distinguish between elements with the same name but different meanings by associating each element with a URI. Since the URI is just a formal identifier, it doesn't need to point to an existing resource.
Prefixes
xmlns:<prefix>="<URI>"
Used because URIs are seldom valid XML names (they contain characters such as / and :). Usually declared in the root element for convenience. The DTD, if used, must declare elements along with their prefix.
Usage: <<prefix>:element ... >
Default namespace
Defined by attaching an xmlns attribute with no prefix to an element (usually the top one); that element and all its unprefixed descendant elements will be part of that namespace. Unprefixed attributes are not affected.
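A short sketch of how prefixed names and a default namespace look to an application, using Python's xml.etree.ElementTree. The example.com URIs are placeholders, as is the catalog vocabulary.

```python
import xml.etree.ElementTree as ET

doc = """<catalog xmlns="http://example.com/catalog"
         xmlns:dc="http://example.com/metadata">
  <item dc:creator="Ada">
    <dc:title>Notes</dc:title>
  </item>
</catalog>"""

root = ET.fromstring(doc)

# Prefixes used in queries are local to this mapping; only the URIs matter.
ns = {"c": "http://example.com/catalog", "dc": "http://example.com/metadata"}
item = root.find("c:item", ns)
title = root.find("c:item/dc:title", ns)

# ElementTree stores qualified names as {URI}local-name.
root_tag = root.tag
creator = item.get("{http://example.com/metadata}creator")
```

The unprefixed catalog and item elements end up in the default namespace, while the prefix chosen in the document (dc) is irrelevant to the query.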
Internationalization
<?xml version="1.0" encoding="<encoding>" standalone="yes"?>
The encoding declaration tells in which character set the document is written.
Text declaration
<?xml version="1.0" encoding="<encoding>"?>
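Used at the beginning of external parsed entities (document fragments). It looks like an XML declaration, but version is optional, encoding is required and standalone is not allowed.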
xml:lang
Used at element level, specifies the natural language of the element's content (not its encoding); must be declared as an attribute if the document is validated against a DTD.
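The difference between the encoding declaration and xml:lang can be sketched with Python's xml.etree.ElementTree: the first tells the parser how to decode bytes, the second is just an attribute in the reserved xml namespace. The sample paragraph is invented.

```python
import xml.etree.ElementTree as ET

# The bytes are ISO-8859-1: 0xE9 is "e with acute accent" in that character set.
raw = b'<?xml version="1.0" encoding="ISO-8859-1"?><p xml:lang="fr">caf\xe9</p>'
p = ET.fromstring(raw)

# The xml prefix is always bound to this namespace URI.
XML_NS = "http://www.w3.org/XML/1998/namespace"
lang = p.get("{%s}lang" % XML_NS)
text = p.text
```

After parsing, the text is an ordinary Unicode string regardless of the encoding used in the file.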
XPath
Location paths
Identify a set of nodes in the document.
Root node
<xsl:template match="/">
Selects the root node of the document.
Child element
<xsl:template match="[name]">
Selects all child elements of the context node with the specified [name].
Attribute
<xsl:value-of select="@[attribute]"/>
Selects the [attribute] value of the current node.
Comment
<xsl:template match="comment()">
Selects every comment node child of the current node. Each comment is a separate node.
Text
<xsl:template match="text()">
Selects every text node child of the current node. Each text node contains the maximum contiguous run of text.
Processing-instructions
<xsl:template match="processing-instruction([target])">
Selects all processing instruction children of the current node. target is optional.
Wildcards
* | [namespace]:*
Matches any element node regardless of name. Does not match attributes, processing instructions, text or comment nodes.
node()
Matches any node regardless of name and type.
@*
Matches all attribute nodes.
Multiple matches
Combine location paths using | (or).
Compound location paths
- Forward slash
/
Creates location paths in a filesystem-like fashion. A path that starts with / is absolute; otherwise it is relative to the context node.
- Double forward slash
//
Selects from the context node and all its descendants.
- Double period
..
Selects the parent of the context node.
- Single period
.
Selects the context node.
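These steps can be tried outside XSLT. The sketch below uses the limited XPath subset offered by Python's xml.etree.ElementTree, which handles relative paths only (an absolute path on an element is rejected). The library document is made up.

```python
import xml.etree.ElementTree as ET

doc = """<library>
  <shelf><book><title>A</title></book></shelf>
  <book><title>B</title></book>
</library>"""
root = ET.fromstring(doc)

# Child steps separated by "/": only books directly under the context node.
direct = [t.text for t in root.findall("book/title")]
# "//": titles anywhere below the context node.
anywhere = [t.text for t in root.findall(".//title")]
# "..": the parent of each title.
parents = [p.tag for p in root.findall(".//title/..")]
# "*": every child element, whatever its name.
children = [c.tag for c in root.findall("*")]
# ".": the context node itself.
context = root.findall(".")
```

Results come back in document order, so anywhere lists the title on the shelf before the loose one.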
Predicates
< ... /.../node[@attribute='value']/.../>
Selects the node(s) whose attribute matches 'value'. Supports the usual set of relational operators.
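A small sketch of predicates with Python's xml.etree.ElementTree. Its XPath subset only covers equality tests and positions, not the full set of relational operators, so this is a partial illustration. The team document is invented.

```python
import xml.etree.ElementTree as ET

doc = """<team>
  <member role="dev"><name>Ada</name></member>
  <member role="ops"><name>Bob</name></member>
  <member role="dev"><name>Cy</name></member>
</team>"""
root = ET.fromstring(doc)

# Attribute predicate: keep only members whose role attribute equals "dev".
devs = [m.find("name").text for m in root.findall("member[@role='dev']")]
# Positional predicate: positions are 1-based.
second = root.find("member[2]/name").text
```

The predicate filters the step it is attached to; the rest of the path continues from the nodes that survive.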
Unabbreviated location paths
Allow a more fine grained selection of nodes:
[axis]::[node]
axis can be:
- self
- ancestor
- following-sibling
- preceding-sibling
- following
- preceding
- namespace
- descendant
- ancestor-or-self
X-Links
Simple link
< ... xlink:type = "simple" xlink:href = <URI> >
Defines a one-way connection between the document and another resource. The URI need not be a URL.
Semantic
xlink:title
Describes the remote resource.
xlink:role
Contains a URI that indicates the meaning of the link.
Behaviour
xlink:show
In which context the resource should be displayed (new, replace, embed, other, none).
xlink:actuate
When the link should be followed (onLoad, onRequest, other, none).
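XLink attributes are ordinary namespaced attributes, so an application can read them with any XML parser. A minimal sketch with Python's xml.etree.ElementTree follows; the cite element and the example.com address are placeholders, while the namespace URI is the one XLink defines.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

doc = """<cite xmlns:xlink="http://www.w3.org/1999/xlink"
      xlink:type="simple"
      xlink:href="http://example.com/spec"
      xlink:title="The specification"
      xlink:show="new"
      xlink:actuate="onRequest">the spec</cite>"""
cite = ET.fromstring(doc)

# Keep only the attributes in the XLink namespace, keyed by local name.
prefix = "{%s}" % XLINK
link = {
    name[len(prefix):]: value
    for name, value in cite.attrib.items()
    if name.startswith(prefix)
}
```

What to do with show and actuate is left to the application; the parser only delivers the values.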
Extended links
< ... xlink:type = "extended">
A collection of resources and a collection of paths between them.
Locators
< ... xlink:type = "locator" xlink:href = <URI> >
Locates a particular resource and provides additional semantic attributes: xlink:label, xlink:title, xlink:role.
Arcs
< ... xlink:type = "arc" xlink:from = <xlink:label> xlink:to = <xlink:label> >
Represents a path between two resources: xlink:from and xlink:to are the endpoints, and each must match the xlink:label associated with one of the locators of the extended link.
Provides additional attributes:
- xlink:title: human-readable arc description;
- xlink:arcrole: an absolute URI identifying the nature of the arc.
Local resources
< ... xlink:type = "resource" xlink:label = <xlink:label> >
Represents a resource contained inside the extended link; xlink:label names it so that arcs can refer to it, just like the label of a locator.
Titles
<... xlink:type = "title">
Provide a title to the whole extended link.
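How locators and arcs fit together can be sketched by resolving each arc into pairs of URIs. The example uses Python's xml.etree.ElementTree; the course vocabulary and file names are hypothetical, and only locators (not local resources) are considered.

```python
import xml.etree.ElementTree as ET

XLINK = "{http://www.w3.org/1999/xlink}"

doc = """<course xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">
  <page xlink:type="locator" xlink:label="intro" xlink:href="intro.xml"/>
  <page xlink:type="locator" xlink:label="next" xlink:href="chapter1.xml"/>
  <go xlink:type="arc" xlink:from="intro" xlink:to="next" xlink:title="Continue"/>
</course>"""
course = ET.fromstring(doc)

# Map every label to the URIs of the locators that carry it
# (several locators may share one label).
targets = {}
for el in course:
    if el.get(XLINK + "type") == "locator":
        targets.setdefault(el.get(XLINK + "label"), []).append(el.get(XLINK + "href"))

# Expand each arc into (source URI, destination URI) pairs.
arcs = [
    (src, dst)
    for el in course
    if el.get(XLINK + "type") == "arc"
    for src in targets.get(el.get(XLINK + "from"), [])
    for dst in targets.get(el.get(XLINK + "to"), [])
]
```

Element names carry no meaning for XLink; only the xlink:type attribute decides the role of each child.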
X-Links & DTDs
All xlink attributes used must be declared in the DTD if the document is validated against one.
Base URL
xml:base = "<URL>"
Defines a base URL for relative URIs. The URL can itself be relative, in which case it is resolved against the base URL of the parent element or of the containing entity.
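The resolution rule can be sketched by walking the tree and carrying the current base URL downwards. This is an illustration with Python's urllib.parse.urljoin, not a full implementation; the href attribute, the element names and both URLs are assumptions of the example.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

doc = """<site xml:base="http://example.com/docs/">
  <section xml:base="guide/">
    <link href="intro.html"/>
  </section>
</site>"""

def resolve_links(element, base):
    """Yield absolute URLs for every href, tracking xml:base on the way down."""
    # A relative xml:base is resolved against the base inherited so far.
    base = urljoin(base, element.get(XML_BASE, ""))
    if "href" in element.attrib:
        yield urljoin(base, element.get("href"))
    for child in element:
        yield from resolve_links(child, base)

# The second argument stands for the URL the document was loaded from.
links = list(resolve_links(ET.fromstring(doc), "http://example.org/fallback/"))
```

The inner xml:base ("guide/") is relative, so it is combined with the outer one before the link itself is resolved.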
XPointers
Identify a location inside an XML document.
XPointers in URLs
xpointer(<XPath expression>)
Several xpointer() parts can be written one after the other: each is tried in turn, so the later ones act as backup locations. An XPointer does not necessarily refer to a single element.
XPointers in Links
Point only to XML documents and can be used in internal links as well.
Shorthand pointers
...#<id>
<id> is the value of an attribute declared to have the ID type in the document DTD.
Child sequences
xpointer(/child::*[position() = 1]/child::*[position() = 2]/child::*[position() = 3])
Selects the third child of the second child of the root element of the document. The same location can be written as the child sequence /1/2/3.
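Python's standard library has no XPointer support, but the idea of counting element children at each level is easy to sketch by hand. The helper below is hypothetical; it takes the steps in the compact form /1/2/3 (first step is the root element, then the second child, then the third child of that).

```python
import xml.etree.ElementTree as ET

def resolve_child_sequence(root, sequence):
    """Follow a child sequence such as '/1/2/3', starting at the document."""
    steps = [int(step) for step in sequence.strip("/").split("/")]
    if steps[0] != 1:
        return None                      # a document has exactly one root element
    node = root
    for position in steps[1:]:
        children = list(node)            # element children only, text is skipped
        if position < 1 or position > len(children):
            return None
        node = children[position - 1]    # positions are 1-based
    return node

doc = "<a><b/><c><d/><e/><f>target</f></c></a>"
root = ET.fromstring(doc)
found = resolve_child_sequence(root, "/1/2/3")
missing = resolve_child_sequence(root, "/1/9")
```

Only elements are counted, which matches the child::* steps of the long form.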
Namespaces
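xmlns(...) [xmlns(...) ...] xpointer(...)
Namespace prefixes to bind must be specified before the xpointer() part, separated by whitespace. XPointer handles namespaces on its own: it does not inherit the prefix bindings of the document that contains it.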
Points
Zero dimensional locations inside a node.
xpointer(start-point(//<node>))
Identifies the first point inside a node (after the > character of the node's start tag).
xpointer(end-point(//<node>))
Identifies the last point inside a node (before the < character of the node's end tag).
Ranges
A span of parsed characters between two points.
range()
Takes an XPointer expression that returns a location set and returns a range exactly covering the location (tags included, one range for each location in the set).
range-inside()
Takes an XPointer expression that returns a location set and returns a range exactly covering the location (tags excluded, one range for each location in the set).
range-to()
Evaluated with respect to a context node, takes one location and returns one or more ranges. Each range starts at the start-point of the context node and ends at the end-point of the argument.
string-range()
Operates on the text of the document (tags are stripped); takes an XPointer expression identifying locations and a substring to match against the text of each location. Returns one range for each non-overlapping match, exactly covering the matched string.
Relative XPointers
here()
Refers to the node containing the XPointer or the element containing the node if the node is a text node.
origin()
Refers to the node from which the user started traversal.
XInclude
Include element
<xi:include>...</xi:include>
Attributes
href
Points to the document to include. The document is assumed to be well-formed.
parse="text"
Allows inclusion of a plain text document (thus not well-formed).
encoding="<encoding>"
Specifies the encoding of the included document when parse="text"; defaults to UTF-8.
accept="<MIME type>"
Specifies the accepted MIME type of the document.
accept-language="<lang>"
Specifies the accepted language for the document.
xpointer="<xpointer>"
Allowed only when parsing XML documents (parse="xml"); indicates which part of the document referenced by the href attribute should be included (if href is absent, it refers to the current document).
Fallback
<xi:fallback>...</xi:fallback>
Alternate content if the document can't be loaded (only one fallback is allowed).
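Inclusion can be tried with the xml.etree.ElementInclude module from Python's standard library, which handles the href, parse and encoding attributes. The sketch below replaces the default file loader with a hypothetical in-memory one so that it runs without touching the disk; it does not exercise xpointer or xi:fallback.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files" standing in for real resources.
FILES = {
    "chapter.xml": "<chapter>First chapter</chapter>",
    "note.txt": "plain <text> & more",
}

def loader(href, parse, encoding=None):
    """Return an Element for parse='xml' and a string for parse='text'."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]

doc = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="chapter.xml"/>
  <note><xi:include href="note.txt" parse="text"/></note>
</book>"""
book = ET.fromstring(doc)

# Replaces every xi:include element in place with the loaded content.
ElementInclude.include(book, loader=loader)
```

The text resource is inserted as character data, so its < and & characters are not interpreted as markup.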