HTML 4.01: The HTML element


<html>
<head>
<title>Untitled Document</title>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>

<body>
</body>

</html>


The HTML 4.0/4.01 Specification: http://www.w3.org/TR/REC-html40/struct/global.html#edef-HTML

Example:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

<HTML lang="en-US">

<!-- Info on the Doctype Declaration -->

The language attribute

See ISO 639-1 and ISO 639-2: "Codes for the Representation of Names of Languages"
(aka "ISO Language Codes")
<!-- http://www.loc.gov/standards/iso639-2/englangn.html -->

- Why "en" for English instead of "eng", as used in ISO 639-2?

Per RFC 3066:
"When a language has both an ISO 639-1 2-character code and an
ISO 639-2 3-character code, you MUST use the tag derived from the
ISO 639-1 2-character code."

The language subcode (in this example, "US" for "English as spoken in the USA")

There are two choices for the language subcode: a 2-character country code or a descriptive subcode.
The country codes are defined in ISO 3166.
See also: ISO 639, RFC 3066.

Append the subcode to the 2-character language code with a hyphen, as specified in RFC 3066 (which calls it a "subtag").
Tags that use a descriptive subtag are registered with the Internet Assigned Numbers Authority (IANA).

The alternative to the country code is the descriptive subcode. For example, these IANA-registered tags:

zh-yue (Cantonese)
zh-guoyu (Mandarin)

instead of
zh-CN (as spoken in China)
zh-TW (as spoken in Taiwan)

See http://www.loc.gov/standards/iso639-2/faq.html#21
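
As a rough illustration of how these tags look in markup, the sketch below sets a language-plus-country tag on the html element and overrides it on individual paragraphs. The title, the paragraph text, and the choice of the Strict DTD are assumptions made for the example, not taken from the specification.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html lang="en-US">
<head>
<title>Language tag sketch</title>
</head>
<body>
<!-- No lang attribute here: inherits "en-US" from the html element -->
<p>The color of the harbor.</p>
<!-- ISO 639-1 language code plus ISO 3166 country code -->
<p lang="en-GB">The colour of the harbour.</p>
<!-- Bare ISO 639-1 language code, no country subcode -->
<p lang="fr">La couleur du port.</p>
</body>
</html>
```

An element without its own lang attribute takes the language of its nearest ancestor that has one, so only the exceptions need to be marked.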

Unofficial Guidelines:

- A declaration in the <html> tag should indicate the language used to write the .htm/.html file, not the language used on the rendered page (the "content").
For page content, use the <meta http-equiv="Content-Language" content="language"> tag.

- Servers and browsers should assume lang="en-US" unless specified otherwise. Other common HTML 4 options:

<html lang="ja"> (Japanese)
<html lang="de"> (German)
<html lang="en-GB"> (British English; English-Great Britain) [note: en-UK is not valid]

- Note that the language code in the DOCTYPE declaration ("EN") is the language the DTD was written in, not the language of your document. Leave it alone; the DTD was written in English.


Note the equivalent HTTP headers for content type and content language:

Content-type: text/plain; charset=iso-8859-10
Content-Language: en, fr

For example:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="de">
<head>
<meta http-equiv="Content-Language" content="de">
. . .

In XHTML 1.0:

A document that wants to set its character encoding explicitly must include both an XML declaration with an encoding declaration and a 'meta http-equiv' statement:

<?xml version="1.0" encoding="EUC-JP"?>
<meta http-equiv="Content-type" content="text/html; charset=EUC-JP" />

Use both the lang and xml:lang attributes when specifying the language of an element. The value of the xml:lang attribute takes precedence. E.g.

<FreeText lang="en" xml:lang="en">Hello world!!</FreeText>
<FreeText lang="ja" xml:lang="ja">三省堂</FreeText>
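
Putting the XHTML 1.0 pieces together, a minimal sketch of a whole document might look like the following. The UTF-8 encoding, the Strict DTD, the title, and the use of p elements are assumptions chosen for illustration; the file must actually be saved in whatever encoding it declares.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- lang and xml:lang carry the same value on the root element -->
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<!-- Repeats the encoding from the XML declaration -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>XHTML language sketch</title>
</head>
<body>
<p>Hello world!!</p>
<!-- Overrides the document language for this paragraph only -->
<p lang="ja" xml:lang="ja">三省堂</p>
</body>
</html>
```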

First Post: 2005 Feb 15. Updates: none