Learn How to Properly Format Your HTML Documents - Including DTDs, Character Encoding, and Namespaces

By: Daniel Imbellino
Updated: Feb 28, 2013

In this tutorial we are going to talk about the proper way to format your web pages for the World Wide Web. While many may feel this is not an important subject, the search engines and "user agents" (mainly browsers) disagree. Believe it or not, search engines do in fact grade your code (HTML, XHTML, etc.) to help determine the quality of your webpages and of your website overall. They look not only at whether your code is sound, but also at the actual content you place in your pages. Search engine spiders don't just index your HTML, they index your entire webpage! Google seems to literally ingest it! Simply put, bad grammar, improper formatting, and improper use of HTML/XHTML syntax will earn you a big fat "F" when your site is evaluated by the search engines.

Sure, there are content management systems that appear to be error free, and they are simple to use because they incorporate "what you see is what you get" (WYSIWYG) editing software. The problem with these content management systems? They add an enormous amount of overhead to your webpages. Also, as enticing as it seems to add tons of multimedia, cool apps, and fun gadgets to your site, keep in mind that the more stuff you add to your pages, the longer they will take to be processed by browsers. Adding an enormous amount of functionality to your site also opens the door to the problems that come with it. The last thing you need is 50 databases and a million scripts interacting with your content; each one is another place for things to go wrong. All that being said, poor-quality content plus poor code = no traffic to your website and a poor ranking from the major search engines.

So, let's get started!

Document Type Declaration - Doctype:

We briefly touched on document type declarations in a previous tutorial, but for the sake of clarity we need to review them here as well. A document type declaration tells the browser which markup language and version your page is written in, for example HTML 4.01 or XHTML 1.0. It belongs at the very top of the page, before the opening html tag. Without it, your webpages will be rendered in "quirks mode", which simply means the browser falls back on the non-standard behavior of older browsers and renders your code based on "best guesses", since it isn't being told exactly what it is trying to render.
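To make the placement concrete, here is a minimal sketch of a complete page with the doctype as its first line. It uses the HTML 4.01 Strict doctype described further down; the title and paragraph text are placeholders invented for illustration.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<!-- Placeholder title, not from any real site -->
<title>Example page</title>
</head>
<body>
<p>Hello, world.</p>
</body>
</html>
```

If you want to check which mode a page ended up in, most browsers report it through document.compatMode in the developer console: "CSS1Compat" means standards mode and "BackCompat" means quirks mode.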

If you plan to format your pages as XHTML 1.0, there are three doctypes to choose from.
XHTML 1.0 Transitional:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Notice the "Transitional" statement in this declaration. We are stating that we are using XHTML 1.0 compliant code that is backward compatible with HTML 4.01. If you plan on adhering to the rules of XHTML 1.0 but still want to use some of the deprecated tags and attributes carried over from HTML 4.01, then this is the doctype you should be using in your pages.

XHTML 1.0 Strict:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Notice the "Strict" statement in this declaration. We are stating that we are presenting a webpage that adheres strictly to the XHTML 1.0 standard, with no deprecated tags and no framesets. This leaves less room for error. Unlike HTML 4.01 and earlier versions, XHTML is not as forgiving when proper syntax is not used.
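As a rough illustration of what "proper syntax" means here, the sketch below shows a small page written to the XHTML 1.0 Strict rules: lowercase tag names, quoted attribute values, and every element closed, including empty ones such as br and img. The xmlns attribute on the html tag is the XHTML namespace; the file name logo.png and all of the text are placeholders made up for this example.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Example XHTML page</title>
</head>
<body>
<!-- Empty elements are closed with " />" -->
<p>First line<br />
second line.</p>
<!-- logo.png is a hypothetical file; the alt attribute is required on img -->
<p><img src="logo.png" alt="Site logo" /></p>
</body>
</html>
```

Writing the same page with an unclosed br tag or an unquoted attribute value would fail validation against this doctype.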

XHTML 1.0 Frameset:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
Notice the "Frameset" statement in this declaration. We are stating that we are presenting a webpage that adheres to the XHTML 1.0 Transitional standard, while also allowing the legal use of frames in our webpages. If you plan on using frames, this is the doctype to use, as it is the only XHTML 1.0 doctype under which framesets are valid.
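For reference, a frameset page under this doctype might look something like the sketch below. Note that the frameset element takes the place of body. The file names menu.html and content.html are hypothetical, and the column widths are arbitrary.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Example frameset</title>
</head>
<!-- Two side-by-side frames; menu.html and content.html are made-up file names -->
<frameset cols="25%,75%">
<frame src="menu.html" title="Navigation" />
<frame src="content.html" title="Main content" />
<!-- Fallback for user agents that do not display frames -->
<noframes>
<body>
<p>This site uses frames. <a href="content.html">View the content without frames</a>.</p>
</body>
</noframes>
</frameset>
</html>
```

The pages loaded inside each frame are ordinary documents and would normally carry a Strict or Transitional doctype of their own.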

HTML 4.01 Strict:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
Notice the "strict.dtd" at the end of this declaration. We are stating that we are presenting a webpage that adheres to the HTML 4.01 Strict standard. HTML 4.01 Strict does not allow for the use of "framesets" or deprecated tags (deprecated in this sense means "no longer recommended and should not be used").

HTML 4.01 Transitional:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
Notice the "Transitional" statement in this declaration. We are stating that we are presenting a webpage that adheres to the HTML 4.01 Transitional standard. This form of HTML does allow for the use of deprecated tags, but does not allow for the use of "framesets".
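To show what "deprecated tags" look like in practice, here is a small sketch of a Transitional page that uses two of them, center and font, followed by one way to get a similar result with CSS. The welcome text is a placeholder.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Transitional example</title>
</head>
<body>
<!-- center and font are deprecated: valid under Transitional, invalid under Strict -->
<center><font color="red">Welcome!</font></center>
<!-- A similar presentation using CSS, which is valid under both doctypes -->
<p style="text-align: center; color: red;">Welcome!</p>
</body>
</html>
```

If you swapped in the HTML 4.01 Strict doctype, a validator would flag the center and font line but accept the styled paragraph.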

HTML 4.01 Frameset:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
Notice the "Frameset" statement in this declaration. We are stating that we are presenting a webpage that adheres to the HTML 4.01 Frameset standard.

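HTML 5:
<!DOCTYPE html>
Seriously, this is the entire doctype for an HTML 5 document. There is no version number and no DTD address, because HTML 5 does not rely on a DTD the way HTML 4.01 and XHTML 1.0 do. Keep in mind that, as of this writing, the W3C has not yet published HTML 5 as a finished standard, so parts of the specification may still change. Go figure, HTML 5 has had its share of problems all along.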
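For comparison with the longer declarations above, a bare-bones HTML 5 page could be sketched like this. The meta charset line is an addition for illustration (it declares the character encoding), and the title and text are placeholders.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<!-- Short HTML 5 form for declaring the character encoding -->
<meta charset="utf-8">
<title>Example HTML 5 page</title>
</head>
<body>
<p>Hello, world.</p>
</body>
</html>
```

Even though it is so short, this doctype is enough to keep current browsers out of quirks mode.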

On the next page we will talk about "media types" and "namespaces".
