
A Quick Primer on XML
XML is not complex; it has only a small number of rules. These rules define the minimum a document must satisfy to be considered "well-formed", and every XML document must be well-formed. When a parser encounters a violation of these constraints, it must report a fatal error and stop normal processing.
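As a rough illustration, Python's standard-library parser (xml.etree.ElementTree, which is built on expat) rejects a document at the first well-formedness violation. This is a minimal sketch; the helper name is_well_formed is hypothetical and not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # The parser stops at the first violation and reports where it happened.
        print(f"not well-formed: {err}")
        return False
    return True


print(is_well_formed("<note><to>Gina</to></note>"))  # True
print(is_well_formed("<note><to>Gina</note>"))       # False: <to> is never closed
```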
Every XML document should begin with an XML declaration. When present, the declaration must be the very first thing in the document; not even whitespace may precede it.
The declaration has the following attributes, which must appear in this order.
- Version: the version of XML used in the document. The defined versions are "1.0" and "1.1"; nearly all documents use "1.0". Required.
- Encoding: the character encoding used in the document. Optional.
- Standalone: whether the document relies on markup declarations stored outside it, such as in an external DTD. Valid values are "yes" and "no". Optional.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
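An XML processor is required to support only two encodings, "UTF-8" and "UTF-16". Without an encoding attribute (or external information such as an HTTP header), the document must be in one of these two encodings.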
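A sketch of reading the declaration with Python's low-level expat binding, whose XmlDeclHandler receives the version, the encoding (None if omitted) and a standalone flag (1 for "yes", 0 for "no", -1 if omitted). The helper name read_declaration is hypothetical. The last lines show that a declaration preceded by even one space is rejected.

```python
from xml.parsers import expat


def read_declaration(data: bytes):
    """Hypothetical helper: return (version, encoding, standalone) or None.

    standalone follows the pyexpat convention: 1 for "yes", 0 for "no",
    -1 when the attribute is omitted.
    """
    found = {}

    def on_decl(version, encoding, standalone):
        found["decl"] = (version, encoding, standalone)

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_decl
    parser.Parse(data, True)
    return found.get("decl")  # None when the document has no declaration


print(read_declaration(b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><a/>'))
# ('1.0', 'UTF-8', 1)

# The declaration must come first: a single leading space is a fatal error.
try:
    read_declaration(b' <?xml version="1.0"?><a/>')
except expat.ExpatError as err:
    print("rejected:", err)
```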
Elements
An element consists of its tags and the content contained between them.
- A tag begins with a "<" and ends with a ">".
- An element has a start tag, content, and an end tag.
- An element might be empty and contain no content. An empty element can be written with a special shorthand: a single tag whose closing ">" is preceded by a "/".
<elementname>element content</elementname> <!-- an element with content -->
<anotherelementname /> <!-- an element without content -->
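To see that the shorthand is only a different spelling, one can parse both forms with Python's xml.etree.ElementTree and compare the results; this is a small sketch reusing the element name from the example above.

```python
import xml.etree.ElementTree as ET

# Both spellings produce the same element: no text and no children.
short_form = ET.fromstring("<anotherelementname />")
long_form = ET.fromstring("<anotherelementname></anotherelementname>")

for elem in (short_form, long_form):
    print(elem.tag, elem.text, len(elem))  # anotherelementname None 0

# Serialized again, both come out identically.
print(ET.tostring(short_form) == ET.tostring(long_form))  # True
```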
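Element names must begin with a letter or an underscore. The remaining characters may be letters, digits, hyphens, underscores, or periods; names cannot contain spaces. Names beginning with "xml" (in any combination of upper and lower case) are reserved.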
XML is case sensitive: <Title> and <title> are different names, so an end tag must match its start tag exactly.
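The naming and case rules can be checked against a real parser. In this sketch the helper parses is hypothetical, and the element names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Hypothetical helper: True if the parser accepts `text` as well-formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(parses("<_item-2.b>ok</_item-2.b>"))  # True: underscore first, then letters, digits, "-" and "."
print(parses("<2item>bad</2item>"))          # False: a name cannot start with a digit
print(parses("<Title>XML</title>"))          # False: Title and title are different names
```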
An element may contain text only:
<?xml version="1.0"?>
<ClassMaterial>Complete XML</ClassMaterial>
An element may contain child elements only:
<?xml version="1.0"?>
<ClassMaterial>
<Title>Complete XML</Title>
</ClassMaterial>
An element may also contain both text and child elements; this is known as "mixed content". Mixed content is generally avoided in data-oriented documents.
<?xml version="1.0"?>
<ClassMaterial>Complete XML
<Chapter>A History Of XML
<Objective>There are many objectives for this chapter
<Point>To see the purpose of XML</Point>
<Point>To learn the history of XML</Point>
<Point>To see the building blocks of XML syntax</Point>
<Point>To learn the rules for well-formed XML documents.</Point>
<Point>To see what some of the surrounding technologies are</Point>
</Objective>
</Chapter>
</ClassMaterial>
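A sketch of telling the three kinds of content apart with Python's xml.etree.ElementTree. The function content_kind is hypothetical, and it assumes that whitespace-only text (such as line breaks between tags) should not count as content. The last line hints at why mixed content is awkward to process: ElementTree splits the text between the parent's .text and each child's .tail.

```python
import xml.etree.ElementTree as ET


def content_kind(elem: ET.Element) -> str:
    """Hypothetical helper: classify as 'empty', 'text', 'children', or 'mixed'.

    Assumption: whitespace-only text between tags is ignored.
    """
    has_text = bool((elem.text or "").strip()) or any(
        (child.tail or "").strip() for child in elem
    )
    has_children = len(elem) > 0
    if has_text and has_children:
        return "mixed"
    if has_children:
        return "children"
    return "text" if has_text else "empty"


text_only = ET.fromstring("<ClassMaterial>Complete XML</ClassMaterial>")
children_only = ET.fromstring("<ClassMaterial>\n<Title>Complete XML</Title>\n</ClassMaterial>")
mixed = ET.fromstring(
    "<ClassMaterial>Complete XML\n<Chapter>A History Of XML</Chapter>\n</ClassMaterial>"
)

print(content_kind(text_only))      # text
print(content_kind(children_only))  # children
print(content_kind(mixed))          # mixed
print(repr(mixed.text))             # 'Complete XML\n'
```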
The element that contains all of the other elements is known as the root (or document) element. A well-formed document has exactly one root element.
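In Python's xml.etree.ElementTree, fromstring returns the root element directly, and a document with two top-level elements is rejected. A short sketch:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<ClassMaterial><Title>Complete XML</Title></ClassMaterial>")
print(root.tag)  # ClassMaterial

# A second top-level element makes the document not well-formed.
try:
    ET.fromstring("<ClassMaterial/><ClassMaterial/>")
except ET.ParseError as err:
    print("rejected:", err)
```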
Attributes
Elements may have attributes. Attributes provide more information about the element and are located within the
start tag of an element.
- Attribute values must be enclosed in matching quotes, either double or single.
- Attribute names beginning with "xml" (in any combination of case) are reserved.
<elementname attributename="attribute value">element content </elementname>
Attribute names follow the same rules as element names.
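A quick check of the quoting rule with Python's xml.etree.ElementTree: single and double quotes are equivalent, while an unquoted value is rejected. The element and value come from the student example in this section.

```python
import xml.etree.ElementTree as ET

double = ET.fromstring('<student id="555-53-3242"/>')
single = ET.fromstring("<student id='555-53-3242'/>")
print(double.get("id") == single.get("id"))  # True: either quote style works

try:
    ET.fromstring("<student id=555-53-3242/>")  # unquoted value
except ET.ParseError:
    print("rejected: attribute values must be quoted")
```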
Attributes work much as they do in HTML. The HTML BODY tag, for example, has a BGCOLOR attribute:
<HTML><HEAD></HEAD>
<BODY BGCOLOR="white">
</BODY>
</HTML>
Attributes typically hold simple data values that describe the element itself. A good example is an id:
<?xml version="1.0"?>
<classListing>
<student id="555-53-3242"/>
<student id="443-34-2344"/>
<student id="325-22-3445"/>
</classListing>
Attributes do not affect whether an element is text only, child only, or mixed.
An element may have many attributes, but no attribute name may appear more than once in the same start tag.
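A sketch with Python's xml.etree.ElementTree: reading the id attribute of each student from the listing above, holding several different attributes on one element, and rejecting a repeated attribute name. Note that attribute values always arrive as strings ("3", not 3).

```python
import xml.etree.ElementTree as ET

listing = ET.fromstring("""<classListing>
<student id="555-53-3242"/>
<student id="443-34-2344"/>
<student id="325-22-3445"/>
</classListing>""")

# Collect the id attribute of every <student> element, in document order.
ids = [student.get("id") for student in listing.iter("student")]
print(ids)  # ['555-53-3242', '443-34-2344', '325-22-3445']

# Several different attributes on one element are fine...
chapter = ET.fromstring('<Chapter Title="The XML Saga" PageCount="3"/>')
print(chapter.attrib)  # {'Title': 'The XML Saga', 'PageCount': '3'}

# ...but repeating an attribute name in the same start tag is not.
try:
    ET.fromstring('<Chapter Title="One" Title="Two"/>')
except ET.ParseError:
    print("rejected: duplicate attribute")
```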
Returning to our class material example from before, we could use attributes to assign a title to the chapter and to the class material.
<?xml version="1.0"?>
<ClassMaterial Title="Complete XML" Author="Gina McGhee">
<Chapter Title="The XML Saga" PageCount="3">
<Objective>
<Point>To see the purpose of XML</Point>
<Point>To learn the history of XML</Point>
<Point>To see the building blocks of XML syntax</Point>
<Point>To learn the rules for well-formed XML documents.</Point>
<Point>To see what some of the surrounding technologies are</Point>
</Objective>
<Overview>XML has a long history. Its development came about as a result of
weaknesses and deficiencies of other markup languages. We will be examining
some of these weaknesses, and indeed looking to the past to get motivation for
the use of XML in the future. Also, we will get our first glimpse into the
building blocks of XML and see some of the surrounding technologies.</Overview>
<Section Title="Goals of XML">
<Point>Straightforwardly usable over the Internet.</Point>
<Point>Support a wide variety of applications.</Point>
<Point>Designs may be prepared quickly.</Point>
<Point>Documents should be easy to create.</Point>
<Point>Terseness of minimal importance.</Point>
</Section>
<Section Title="What is XML?">
<Point>Text formatting
<Subpoint>Formatting markup</Subpoint>
<Subpoint>Generalized markup</Subpoint>
</Point>
<Point>The goal: to digitally represent documents.</Point>
</Section>
<Summary>
<Point>XML is a technology important to learn.</Point>
</Summary>
</Chapter>
</ClassMaterial>
Comments
XML comments begin with "<!--" and end with "-->". The string "--" may not appear inside a comment.
<!-- This is an XML comment -->
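A small sketch with Python's xml.etree.ElementTree. By default its tree builder discards comments, so they do not show up as children; a comment containing "--" is rejected as not well-formed.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><!-- This is an XML comment --><b/></a>")
# The default tree builder drops comments, so only <b> remains.
print([child.tag for child in root])  # ['b']

try:
    ET.fromstring("<a><!-- not -- allowed --></a>")
except ET.ParseError:
    print('rejected: "--" may not appear inside a comment')
```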
Character Entities
XML character entities are similar to HTML entities and are used to include reserved characters, such as "<", in a document.
These characters are reserved because the parser treats them as markup: "<" begins a tag and "&" begins an entity reference.
<?xml version="1.0"?>
<Math>
<Statement>6 < 7 </Statement> <!-- 6 is less than 7 -->
</Math>
In the above example the parser reads "<" as the start of a tag, finds that "< 7" is not a valid tag, and reports an error.
XML has built-in character entities for the reserved characters; "&lt;" stands for "<".
<?xml version="1.0"?>
<Math>
<Statement>6 &lt; 7 </Statement> <!-- 6 is less than 7 -->
</Math>
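In Python, the parser turns entity references back into the characters they stand for, and the standard-library function xml.sax.saxutils.escape performs the reverse step for "&", "<" and ">" when generating text content. A minimal sketch:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The parser replaces &lt; with "<" in the element's text.
statement = ET.fromstring("<Statement>6 &lt; 7</Statement>")
print(statement.text)  # 6 < 7

# When writing XML by hand, escape reserved characters in text content.
print(escape("6 < 7 & 8 > 2"))  # 6 &lt; 7 &amp; 8 &gt; 2
```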
CDATA Sections
Sometimes a large chunk of data contains many reserved characters: markup or code that should not be parsed as XML, or variable data over which there is no control. CDATA sections mark such data as plain character data, so the parser does not interpret "<" or "&" inside them. Typical uses include:
- Images (as text-encoded data; raw binary bytes are not allowed in XML, even inside a CDATA section)
- Script blocks
- User input
A CDATA section begins with the special characters "<![CDATA[" and ends with "]]>". Because "]]>" ends the section, it cannot appear inside one.
<![CDATA[
<script language="javascript">
var myVar;
function myFunction()
{
return true;
}
</script>
]]>
CDATA sections may occur anywhere character data may occur in an XML document. They may not appear
outside the root element.
<?xml version="1.0"?>
<ClassMaterial Title="Gardening for Beginners">
<CoverPhoto><![CDATA[saji j3<<<k34<>>>>>>uxf934f9u5””/98r]]></CoverPhoto>
<Chapter Number="1">We all love to garden. It makes us feel closer to Mother
Earth.
</Chapter>
</ClassMaterial>
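A sketch of generating a CDATA section safely in Python. The helper wrap_cdata is hypothetical; because "]]>" cannot occur inside a section, it splits any occurrence across two adjacent sections, which the parser joins back into one string. The same script text without the CDATA wrapper is not well-formed.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text: str) -> str:
    """Hypothetical helper: wrap text in CDATA, splitting any "]]>" it contains."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


script = 'if (a < b && c) { s = "]]>"; }'
doc = "<Script>" + wrap_cdata(script) + "</Script>"

# The parser returns the sections' content as ordinary text, markup characters intact.
print(ET.fromstring(doc).text == script)  # True

# Without the CDATA wrapper, "<" and "&" are read as markup and parsing fails.
try:
    ET.fromstring("<Script>" + script + "</Script>")
except ET.ParseError:
    print("rejected without CDATA")
```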
Processing Instructions
Processing instructions are used to pass information to the application that is processing the XML document.
<?PITarget instruction:parameters ?>
- A processing instruction begins with a "<?" and ends with "?>".
- The PITarget names the application that will process the instruction.
- Everything after the target (here, instruction:parameters) is data whose meaning is defined by the target application; XML itself imposes no format on it.
<?myWordProcessor COUNT:pages?>
A NOTATION declaration in a DTD may be used to formally declare a PITarget.
Because PIs are application specific, the parser passes them through to the application, and an application that does not recognize a PI's target is free to ignore it.
One useful PI is xml-stylesheet, which associates a style sheet with an XML document.
<?xml-stylesheet type="text/xsl" href="myStylesheet.xsl"?>
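A sketch of how an application receives processing instructions, using the ProcessingInstructionHandler of Python's expat binding. The root element name report is made up for the example; note that the XML declaration is not reported as a PI.

```python
from xml.parsers import expat

doc = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="myStylesheet.xsl"?>
<report><?myWordProcessor COUNT:pages?></report>"""

instructions = []  # (target, data) pairs in document order


def on_pi(target, data):
    # An application would dispatch on `target` and ignore targets it does not know.
    instructions.append((target, data))


parser = expat.ParserCreate()
parser.ProcessingInstructionHandler = on_pi
parser.Parse(doc, True)

for target, data in instructions:
    print(target, "->", data)
# xml-stylesheet -> type="text/xsl" href="myStylesheet.xsl"
# myWordProcessor -> COUNT:pages
```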
All Together Now!
<?xml version="1.0" standalone="yes" ?>
<patients>
<![CDATA[
<script language="javascript">
function displayRecord(admissionID)
{
document.text1.value=admissionID;
document.text2.value=patientRecord(admissionID).name.value;
}
</script>
]]>
<!-- Patient records are to be stored here -->
<patientRecord admissionID="553534234">Family History of heart disease
<name>John Smith</name>
<insurance>Jones &amp; Barnes</insurance>
<condition>High blood pressure</condition>
<condition>Heart disease</condition>
<medication>Aspirin</medication>
</patientRecord>
<patientRecord admissionID="533453455"> Admitted with stomach cramps
<name>Jill Toomey</name>
<insurance>Jones &amp; Miller</insurance>
<condition>Diabetes</condition>
<medication>Insulin</medication>
</patientRecord>
</patients>
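As a sketch of consuming such a document, the Python code below parses an abbreviated copy of the patients example (one record only, with the ampersand in the insurer name written as &amp; so that the document is well-formed) and pulls out attributes and child element text. The CDATA section and the comment are accepted and do not interfere.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the patients document: one record only.
doc = """<?xml version="1.0" standalone="yes" ?>
<patients>
<![CDATA[
<script language="javascript">function displayRecord(admissionID) { }</script>
]]>
<!-- Patient records are to be stored here -->
<patientRecord admissionID="553534234">Family History of heart disease
<name>John Smith</name>
<insurance>Jones &amp; Barnes</insurance>
<condition>High blood pressure</condition>
<condition>Heart disease</condition>
<medication>Aspirin</medication>
</patientRecord>
</patients>"""

root = ET.fromstring(doc)
for record in root.iter("patientRecord"):
    print(record.get("admissionID"), record.findtext("name"))
    print("  insurance:", record.findtext("insurance"))  # &amp; arrives as "&"
    print("  conditions:", [c.text for c in record.findall("condition")])
```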
Predefined Character Entities
Entity | Represents | Character
&quot; | quotation mark | "
&apos; | apostrophe | '
&lt; | less-than sign | <
&gt; | greater-than sign | >
&amp; | ampersand | &
Copyright (c) 2008. Intertech, Inc. All Rights Reserved. This information is to be used
exclusively as an online learning aid. Any copying is strictly prohibited.