Working With Parsed and Nonparsed Data in XML

By Daniel Imbellino
Feb 28, 2013

A CDATA (character data) section in an XML document tells the XML parser not to parse the data contained within it. A CDATA section starts with “<![CDATA[” and ends with “]]>”, with the content you do not want parsed placed between the two delimiters, like so:

<html>
<head>
<title></title>
</head>
<body>
<![CDATA[
<html>
<head>
<title></title>
</head>
<body>
<h1></h1>
<p></p>
</body>
</html>
]]>
</body>
</html>

While this might seem confusing, what we just did was embed HTML code in our document so that it is displayed as written in a browser window. The inner HTML document is enclosed in a CDATA section, so the parser treats it as plain text rather than markup. This is an effective way to show actual markup in an XML web page or document. A CDATA section can also be used in XHTML 1.0 documents to include characters that would otherwise be read as markup. By default, the less-than < and ampersand & symbols are illegal in XML and XHTML character data, because they begin a tag and an entity reference respectively. The greater-than > symbol is usually escaped as well, and must be escaped when it follows “]]”. Note that a CDATA section is only honored when the page is processed by an XML parser. An XHTML page served as text/html, as XHTML 1.0 Transitional pages commonly are, goes through the browser’s HTML parser, which does not treat a CDATA section this way. In that case you would need to use character or entity references instead.
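As a rough illustration of the behavior described above, the following Python sketch feeds a small document to the standard library's XML parser and checks what comes back from a CDATA section. The <note> element name and the sample text are made up for this example, and a browser's XML parser is assumed to behave the same way as the parser used here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; <note> is an invented element name.
cdata_doc = "<note><![CDATA[<h1>Title</h1> <p>Fish & chips</p>]]></note>"

root = ET.fromstring(cdata_doc)

# The markup inside the CDATA section is handed back as ordinary text...
cdata_text = root.text
# ...and no child elements are created for the <h1> or <p> tags.
cdata_children = list(root)

print(cdata_text)
print(len(cdata_children))
```

Note that the bare ampersand in “Fish & chips” is accepted here only because it sits inside the CDATA section.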

In XHTML you can display markup characters using entity references. You can display the less-than symbol using “&lt;”, without the quotes, or greater-than using “&gt;”. To display an ampersand use the “&amp;” entity reference, again without the quotes. You can’t use a bare ampersand because it marks the beginning of an entity reference in an XML document. For single quotes you can use “&apos;”, and for double quotes “&quot;”.
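The five entity references above can be tried out with a short Python sketch using only the standard library. The sample string and the <p> wrapper are assumptions chosen for illustration. The escape() helper replaces &, < and > on its own, and the extra mapping passed to it adds the two quote characters.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical text containing all five special characters.
raw = 'if (a < b && c > d) say "it\'s true"'

# escape() handles &, < and > by default; the mapping covers the quotes.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)

# An XML parser turns the entity references back into the original characters.
round_trip = ET.fromstring("<p>" + escaped + "</p>").text

# A bare ampersand, by contrast, makes the document ill-formed.
try:
    ET.fromstring("<p>Fish & chips</p>")
    bare_ampersand_ok = True
except ET.ParseError:
    bare_ampersand_ok = False

print(round_trip == raw, bare_ampersand_ok)
```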

PCDATA (Parsed Character Data), on the other hand, signifies that text will be processed by the parser (such as a web browser). Tags inside it are recognized as markup and entity references are expanded. Markup in a CDATA section, by contrast, is not interpreted at all. It is sometimes useful to wrap the contents of a script in a CDATA section in an XML document. You should do this for scripts that use the less-than <, greater-than >, or ampersand & characters, since an XML parser would otherwise try to read < and & as markup.
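To make the contrast concrete, here is a minimal Python sketch, again using the standard library parser as a stand-in for a browser's XML parser. The script body, the parses() helper, and the sample paragraph are all hypothetical. It shows that a script containing < and && is rejected as ordinary parsed content but accepted once wrapped in a CDATA section, and that tags in ordinary content become child elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical script body using characters that are special in XML.
script_body = "if (a < b && c > d) { run(); }"

def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# As ordinary parsed content, the bare < and & break the document.
bare_ok = parses("<script>" + script_body + "</script>")

# Wrapped in a CDATA section, the same script is accepted unchanged.
wrapped = "<script><![CDATA[" + script_body + "]]></script>"
wrapped_ok = parses(wrapped)
recovered = ET.fromstring(wrapped).text

# In ordinary (parsed) content, tags are recognized and become elements.
pcdata_root = ET.fromstring("<p>Some <b>bold</b> text</p>")
pcdata_children = [child.tag for child in pcdata_root]

print(bare_ok, wrapped_ok, pcdata_children)
```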