XHTML

Overview

XHTML is based upon HTML 4. XHTML is usually taught by presenting HTML 4.01 first and then discussing the differences, which is also how the defining documents for XHTML at the World Wide Web Consortium are organized. The HTML 4.01 specification is at http://www.w3.org/TR/html401/. The XHTML specification is at http://www.w3.org/TR/2001/WD-xhtml1-20011004/. This page is primarily based on those references.

In the beginning was SGML (Standard Generalized Markup Language). It is an extremely flexible and powerful standard, but it is also very difficult to use. HTML is an SGML application that was designed to be much easier to use. The popular browsers have confused matters by adding their own unique tags and attributes and by accepting invalid HTML code.

XML (Extensible Markup Language) was also derived from SGML. It is a restricted form of SGML which retains much of its power, but is considerably easier to use. XML is very useful for separating content from presentation (which HTML was originally supposed to do), it makes the transfer of data between programs and computers easier, and it is easy to extend in a standard fashion. This makes XML ideally suited for data transfer and for storage of content on servers. Unfortunately, it is a big jump to move HTML-based browsers directly to XML.

XHTML is the bridge between HTML and XML. We can convert most HTML pages to XHTML without too much effort. Since XHTML is an application of XML, the converted pages will be in a form of XML.

Conformance

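The popular browsers allow authors to write extremely sloppy (invalid) HTML documents and still get a reasonable result. To be considered in strict conformance with XHTML 1.0, a document must validate against one of the three XHTML DTDs (strict, transitional, or frameset), its root element must be html, the root element must declare the XHTML namespace (http://www.w3.org/1999/xhtml) with the xmlns attribute, and a DOCTYPE declaration must appear before the root element.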

It is also recommended that the document start with an XML declaration stating what XML version and character encoding are being used.

XML Declaration

<?xml version="1.0" encoding="UTF-8"?>

This declaration comes at the very beginning of an XML (and therefore XHTML) document. It is optional; when it is omitted, XML version 1.0 and an encoding of UTF-8 or UTF-16 are assumed. If you are using any character encoding other than UTF-8 or UTF-16, then you will have to include this declaration with the appropriate settings.
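
As a rough illustration of why the encoding matters, the sketch below parses the same paragraph twice with Python's standard xml.etree.ElementTree module: once with an XML declaration naming ISO-8859-1 and once without. The helper name parse_text and the sample text are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

def parse_text(data: bytes):
    """Return the root element's text, or None if the bytes do not parse as XML."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None

# The same paragraph stored as ISO-8859-1 bytes (the e-acute is the single byte 0xE9).
body = "<p>caf\u00e9</p>".encode("iso-8859-1")

with_declaration = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n' + body
without_declaration = body

print(parse_text(with_declaration))     # the text is decoded using the declared encoding
print(parse_text(without_declaration))  # None: the parser assumes UTF-8 and rejects the byte
```

Without the declaration the parser falls back to UTF-8, and the lone 0xE9 byte is not a valid UTF-8 sequence, so the document is rejected.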

DOCTYPE Declarations

After the XML declaration (if included) comes the DOCTYPE declaration. This tells the user agent what dialect of XML is being spoken by specifying its DTD (Document Type Definition). XHTML documents are said to be well-formed if they follow the structural rules of XML. They are valid only when they are well-formed AND follow the rules set forth in their DTD.
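
The distinction between well-formed and valid can be sketched with a small check. The example below uses Python's standard xml.etree.ElementTree parser, which tests well-formedness only; it does not validate against a DTD. The function name is_well_formed and the two sample fragments are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML (well-formed), else False."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Properly nested, every element closed: well-formed.
good = "<p>Some <em>emphasized</em> text<br /></p>"
# Unclosed <br> and overlapping tags: often tolerated in HTML, but not well-formed XML.
bad = "<p>Some <em>emphasized text<br></p></em>"

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

A document that passes this check may still be invalid, for example if it uses an element that its DTD does not define. Checking validity requires a validating parser or a service such as a markup validator.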

DOCTYPE statements are usually ignored for HTML, but may be included to give an indication of what "standard" the author is trying to conform to. They are required for XHTML. Listed below are a number of DOCTYPE statements for different versions of HTML and XHTML.

HTML 3.2:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">

HTML 4.01:

The strict DTD is for documents which do not include any deprecated tags, deprecated attributes, or framesets.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
     "http://www.w3.org/TR/html4/strict.dtd">
The transitional DTD is for documents which may include deprecated tags or deprecated attributes, but no framesets.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
     "http://www.w3.org/TR/html4/loose.dtd">
The frameset DTD is the same as the transitional DTD, but it includes framesets.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
     "http://www.w3.org/TR/html4/frameset.dtd">

XHTML 1.0:

The strict DTD is for documents which do not include any deprecated tags, deprecated attributes, or framesets.
<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
The transitional DTD is for documents which may include deprecated tags or deprecated attributes, but no framesets.
<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
The frameset DTD is the same as the transitional DTD, but it includes framesets.
<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

Namespace

Namespaces are a part of XML that we are not going to discuss at this time. The important part for us is that the XHTML namespace must be declared in the root element with the xmlns attribute.

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
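
To see what the xmlns declaration does in practice, here is a minimal sketch that parses a small XHTML document with Python's standard xml.etree.ElementTree module. The sample document and the constant name XHTML_NS are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

doc = (
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">'
    "<head><title>Namespace demo</title></head>"
    "<body><p>Content of document</p></body>"
    "</html>"
)

root = ET.fromstring(doc)

# ElementTree reports each element name as "{namespace}localname".
print(root.tag)

# Child elements inherit the default namespace, so lookups must be qualified too.
title = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title")
print(title.text)
```

Because the namespace is declared as the default on the root element, every element inside it belongs to the XHTML namespace without needing a prefix.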

XHTML Document Template

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>XHTML Document Template</title>
</head>
<body>
<p>Content of document</p>
</body>
</html>
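
As a sketch of how a program might discover which DTD a document like the template above claims to follow, the example below reads the DOCTYPE declaration with Python's standard xml.parsers.expat module. The function name read_doctype is an assumption made for this example, and the parser only reports the identifiers; it does not fetch the DTD or validate against it.

```python
import xml.parsers.expat

def read_doctype(document: bytes) -> dict:
    """Collect the name and identifiers from a document's DOCTYPE declaration."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called once by expat when it reaches the DOCTYPE declaration.
        found["name"] = name
        found["public_id"] = public_id
        found["system_id"] = system_id

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(document, True)
    return found

template = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>XHTML Document Template</title>
</head>
<body>
<p>Content of document</p>
</body>
</html>"""

info = read_doctype(template)
print(info["name"])       # root element named in the DOCTYPE
print(info["public_id"])  # public identifier of the DTD
print(info["system_id"])  # URL of the DTD
```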

Differences From HTML 4.01

Exclusion Rules

Conversion/Validation

There are a number of available utilities to convert HTML code to later versions, tidy up existing code, or just validate a document to see if it conforms to a specific standard. The HTML Tidy program is available in a number of different versions.

XHTML Exercise

XHTML exercise opens in new window.
