Declaring XML Rules and Properties with a DTD

By Daniel Imbellino
Feb 28, 2013

You use a DTD in XML to declare your elements, attributes, and attribute values. You also use a DTD to help define the rules and proper structure of your XML language. A document can be well-formed without one, but XML that does not conform to a DTD or Schema is not considered valid XML.

You can write a DTD in any text editor you wish. Just be sure to give the file a descriptive name for your XML language, and to reference it properly from your XML files. Also, remember to use the “.dtd” file extension when naming your Document Type Definitions.

Defining Elements in a DTD

You define an element like this:
<!ELEMENT tag (other-element)>

The “<!ELEMENT” portion identifies that we are declaring a new element. We then provide a name for our element, replacing “tag” with our new name. Finally, we use parentheses to describe what other element or elements our new element will contain.

In the case of our fictitious XBOX markup language (discussed in another tutorial), we could define our root element “xbox” (element names are case-sensitive) as:
<!ELEMENT xbox (model, game)>

We created a sequence: the root element “xbox” must contain exactly one “model” followed by exactly one “game”, in the exact order specified. This is fine for the XML structure shown below. Also, notice we didn’t list the other elements, since they are contained within our “game” element and will be declared there.

<?xml version="1.0" standalone="no"?>

<!DOCTYPE xbox PUBLIC "-//Microsoft-xbml//DTD xbox//EN//" "http://www.microsoft.com/xbml/xbml.dtd">

<xbox>
  <model type="360" />
  <game>
    <title>FarCry Instincts Predator</title>
    <isbn id="BN76932J10" />
    <gender>First-Person Shooter</gender>
  </game>
</xbox>

But what if we didn’t want to specify an order for our child elements to appear within the root element, in this case “xbox”, or what if there were certain elements we didn’t want to use?

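We could solve this problem by separating the element names with vertical bars instead of commas and adding an asterisk * to the end of the group in our DTD: <!ELEMENT xbox (model | game)*>

Now we can use these elements in any order, any number of times, or leave them out entirely. Note that an asterisk on a comma-separated group, as in (model, game)*, would not do this; it only repeats the whole sequence in the same fixed order.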
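
As a rough illustration of a repeated choice group, here is a minimal sketch of a document that carries its DTD in an internal subset. Declaring “model” and “game” as plain text is a simplifying assumption made only to keep the example short; the real declarations come later in this tutorial.

```xml
<?xml version="1.0"?>
<!DOCTYPE xbox [
  <!-- Repeated choice: any mix of model and game children, in any order -->
  <!ELEMENT xbox (model | game)*>
  <!-- Placeholder declarations, assumed for this sketch only -->
  <!ELEMENT model (#PCDATA)>
  <!ELEMENT game (#PCDATA)>
]>
<xbox>
  <game>FarCry Instincts Predator</game>
  <model>360</model>
  <game>Another Title</game>
</xbox>
```

Here a “game” comes before the “model” and “game” appears twice, both of which the repeated choice allows.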

So far, we haven’t defined any rules for our other elements, just our root element. Notice our “model” tag, which is written as an empty-element tag since it ends with a forward slash and a greater-than sign. The “model” element will contain no other elements; at most it will contain text, so we define it like so:

<!ELEMENT model (#PCDATA)>

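Here we specified that our “model” element can contain textual data with the use of the “#PCDATA” (Parsed Character Data) keyword. This allows text between the start and end tags, and it also allows the element to be left empty:

<model>360</model>

Note that the “360” in <model type="360" /> is an attribute value, not element content. Attributes are declared separately with “<!ATTLIST”, and a validating parser will reject the “type” attribute until that declaration exists.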

Also notice our “game” element, which should only contain the “title”, “isbn”, and “gender” elements:

<!ELEMENT game (title | isbn | gender)*>

Now we have defined that those 3 elements are the only child elements allowed in our “game” element. Because they are separated by vertical bars inside a group that ends with an asterisk, they can appear in any order and any number of times.

We defined that the elements can be used in any order, but we haven’t limited how many times each one can be used. To do that we go back to a comma-separated sequence, which fixes the order again, and attach “modifying symbols” to individual element names. A name with no symbol must appear exactly once. In the case of our “model” element, we want it to appear at least once, so we use the plus modifier “+”, which means one or more:

<!ELEMENT xbox (model+, game)>

Likewise, we probably want to be able to use multiple “game” elements in our XML documents, and we can do this using the asterisk modifier like this:

<!ELEMENT xbox (model+, game*)>

We’ve now specified that our “model” element must occur at least once, followed by as many “game” elements as we wish (including none) within our root “xbox” element. Notice there is no asterisk after the closing parenthesis: a trailing asterisk would make the whole group optional, and “model” would no longer be required.
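
To make the effect of the plus and asterisk modifiers concrete, here is a minimal sketch using an internal DTD subset. It assumes the outer group carries no trailing asterisk, so that “model” is genuinely required, and it trims “game” down to a single “title” child purely for brevity.

```xml
<?xml version="1.0"?>
<!DOCTYPE xbox [
  <!-- One or more model elements, then zero or more game elements -->
  <!ELEMENT xbox (model+, game*)>
  <!ELEMENT model (#PCDATA)>
  <!-- Simplified for this sketch: game holds only a title -->
  <!ELEMENT game (title)>
  <!ELEMENT title (#PCDATA)>
]>
<xbox>
  <model>360</model>
  <game><title>FarCry Instincts Predator</title></game>
  <game><title>Second Game</title></game>
</xbox>
```

Removing the “model” element, or moving it after a “game”, should make a validating parser report an error.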

Now we probably want our “title”, “isbn”, and “gender” elements to appear no more than once per “game” element in our documents. We can limit each of these elements to a single occurrence using the question mark modifier like this:

<!ELEMENT game (title?, isbn?, gender?)>

Now each of these elements can appear at most once per “game” element, in the order listed. The question mark also makes each element optional; to require exactly one of each, drop the question marks and write (title, isbn, gender).

Now we need to define the properties for our “title”, “isbn”, and “gender” elements.

<!ELEMENT title (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
<!ELEMENT gender (#PCDATA)>

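Here we specified that each of these 3 elements can contain Parsed Character Data, meaning they can contain textual data that will be parsed by the XML parser. Because the text is parsed, characters such as “<” and “&” cannot be typed directly and must be written as “&lt;” and “&amp;”.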
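
Because #PCDATA is parsed, markup characters inside the text have to be escaped. The sketch below shows this with a made-up game title; the title text is an assumption for illustration, and the “isbn” value is written as element text here rather than as an attribute.

```xml
<?xml version="1.0"?>
<!DOCTYPE game [
  <!ELEMENT game (title, isbn, gender)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT isbn (#PCDATA)>
  <!ELEMENT gender (#PCDATA)>
]>
<game>
  <!-- The ampersand must be written as &amp; inside parsed character data -->
  <title>Tanks &amp; Planes</title>
  <isbn>BN76932J10</isbn>
  <gender>First-Person Shooter</gender>
</game>
```

A parser hands the title to an application as “Tanks & Planes”, with the entity reference already resolved.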

And here’s our DTD so far:

<!ELEMENT xbox (model+, game*)>
<!ELEMENT model (#PCDATA)>
<!ELEMENT game (title?, isbn?, gender?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
<!ELEMENT gender (#PCDATA)>

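We’ve now defined the element rules and properties for our XBML language. We could now refer to the DTD above in our XML documents. (The “type” and “id” attributes used below still need their own declarations, which is the subject of the next section.)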

<?xml version="1.0" standalone="no"?>

<!DOCTYPE xbox PUBLIC "-//Microsoft-xbml//DTD xbox//EN//" "http://www.microsoft.com/xbml/xbml.dtd">

<xbox>
  <model type="360" />
  <game>
    <title>FarCry Instincts Predator</title>
    <isbn id="BN76932J10" />
    <gender>First-Person Shooter</gender>
  </game>
</xbox>
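
If you would rather experiment without an external file, the same rules can be placed in an internal subset. The following is a sketch, not the finished XBML DTD: the element declarations are written without a trailing asterisk on the groups, and the two <!ATTLIST lines are assumed declarations added only so that the “type” and “id” attributes in the sample are accepted by a validating parser.

```xml
<?xml version="1.0"?>
<!DOCTYPE xbox [
  <!ELEMENT xbox (model+, game*)>
  <!ELEMENT model (#PCDATA)>
  <!ELEMENT game (title?, isbn?, gender?)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT isbn (#PCDATA)>
  <!ELEMENT gender (#PCDATA)>
  <!-- Assumed attribute declarations, not part of the DTD built so far -->
  <!ATTLIST model type CDATA #REQUIRED>
  <!ATTLIST isbn id CDATA #REQUIRED>
]>
<xbox>
  <model type="360" />
  <game>
    <title>FarCry Instincts Predator</title>
    <isbn id="BN76932J10" />
    <gender>First-Person Shooter</gender>
  </game>
</xbox>
```

An element declared as (#PCDATA) is allowed to be empty, which is why <model /> and <isbn /> pass here.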

There are several different types of “content models” you can define when declaring your elements. You can define an element to contain Parsed Character Data (#PCDATA, meaning text that the parser will read), to be “EMPTY” (the element has no content at all, although it can still carry attributes, such as the file name of an image), to be “ANY” (any mix of text and any other elements declared in the DTD), or to contain other elements.

Here’s an example of each:

<!ELEMENT image EMPTY>
<image file="example-image.png" />

<!ELEMENT title (#PCDATA)>
<title>Example Title</title>

<!ELEMENT variable ANY>
<variable>This element can contain any type of content</variable>

<!ELEMENT game (title, isbn, gender)>
<game>
  <title></title>
  <isbn></isbn>
  <gender></gender>
</game>

Remember the modifying symbols we used earlier? They were the asterisk *, question mark ?, and plus sign +. The asterisk signifies that an element can occur zero or more times. The question mark signifies that an element can occur zero times or once, while the plus sign signifies that an element must occur at least once. An element name with no symbol must occur exactly once. These counts apply at the position where the symbol is written in the content model, not to the document as a whole.
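
The sketch below puts all four cases side by side in one content model. The element names “bundle”, “console”, “manual”, and “controller” are hypothetical and are not part of the XBML language described in this tutorial.

```xml
<?xml version="1.0"?>
<!DOCTYPE bundle [
  <!-- console: exactly once; manual?: zero or one;
       game*: zero or more; controller+: one or more -->
  <!ELEMENT bundle (console, manual?, game*, controller+)>
  <!ELEMENT console (#PCDATA)>
  <!ELEMENT manual (#PCDATA)>
  <!ELEMENT game (#PCDATA)>
  <!ELEMENT controller (#PCDATA)>
]>
<bundle>
  <console>360</console>
  <game>FarCry Instincts Predator</game>
  <controller>Wireless</controller>
  <controller>Wired</controller>
</bundle>
```

This document leaves out the optional “manual”, includes one “game”, and repeats “controller”, all of which the content model permits.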

Recap:
To specify a sequence of elements:
<!ELEMENT element (element-one, element-two, element-three)>
To specify elements to be used in any order, any number of times:
<!ELEMENT element (element-one | element-two | element-three)*>
To specify a choice among elements, you use a vertical bar to separate them:
<!ELEMENT xbox (model | game)>

Here the “xbox” element must contain exactly one of the listed elements. The vertical bar separates the alternatives: if you choose “game”, you cannot also use “model”.
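
A small sketch of an exclusive choice follows. The names “media”, “disc”, and “download” are hypothetical, chosen so the example stays separate from the “xbox” declarations used elsewhere.

```xml
<?xml version="1.0"?>
<!DOCTYPE media [
  <!-- Exactly one child: either disc or download, never both -->
  <!ELEMENT media (disc | download)>
  <!ELEMENT disc (#PCDATA)>
  <!ELEMENT download (#PCDATA)>
]>
<media>
  <disc>DVD</disc>
</media>
```

Adding a “download” element next to the “disc”, or leaving “media” empty, should fail validation.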

We can also combine choices and sequences by mixing vertical bars and commas, using nested parentheses like so:

<!ELEMENT xbox ((game | model), title, isbn, gender)>

Here we specified that the “xbox” element must contain 4 elements: either the “game” or the “model” element, followed by “title”, “isbn”, and “gender” in that order.

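We can also declare an element to contain mixed content, meaning Parsed Character Data and other elements together. In a mixed-content declaration, #PCDATA must come first, every name is separated by a vertical bar (commas are not allowed), and the group must end with an asterisk:

<!ELEMENT xbox (#PCDATA | childone | childtwo | childthree)*>

This means our “xbox” element can contain textual data and any of the listed elements, in any order and any number of times. A DTD cannot restrict the order or count of children in mixed content.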
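
Mixed content is easiest to see with running text that has inline markup. In this sketch the element names “review”, “em”, and “strong” are hypothetical stand-ins.

```xml
<?xml version="1.0"?>
<!DOCTYPE review [
  <!-- Mixed content: #PCDATA first, names separated by bars, group ends with * -->
  <!ELEMENT review (#PCDATA | em | strong)*>
  <!ELEMENT em (#PCDATA)>
  <!ELEMENT strong (#PCDATA)>
]>
<review>A <strong>fast</strong> shooter with <em>very</em> large maps.</review>
```

The text and the two inline elements can be interleaved freely because the declaration only lists what may appear, not where or how often.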

Note: On a standard US keyboard the vertical bar "|" is usually typed with Shift+\. In Windows you can also hold ALT, type 124 on the numeric keypad, then release ALT. This will create your vertical bar, say in Windows Notepad. The ALT method only works with a numeric keypad.

Defining XML Attributes in a DTD