Web development is considerably more troublesome when you have documents in languages other than American English. The onus is on us web developers and server administrators to make sure browsers and search engines can detect the right language. Here is how you can declare the language of your document in HTML 5.
What is language declaration?
This is a way to specify what language an HTML document or a snippet of HTML text is in. Language declaration does not provide information on character encoding or text direction (right to left or left to right). Those need to be declared separately.
Why specify a language?
Language information can be used for:
- Text-to-speech converters (e.g. speaking Canadian French rather than French)
- Selecting the right fonts for display (e.g. using Traditional Chinese script instead of Simplified)
- Selecting the right dictionary for browser spell-checks in forms (e.g. UK English rather than US English)
- Rendering the page correctly. In short, delivering the document as naturally as possible in its own language.
Language processing
In HTML 5, there are 3 ways to declare the language of an HTML document:
- As a pragma directive, e.g. <meta http-equiv="content-language" content="en">. (W3C’s HTML5 validator now reports the following error: “Using the meta element to specify the document-wide default language is obsolete. Consider specifying the language on the root element instead.”)
- As part of a header in the HTTP response, e.g. below (example from the W3C article on Internationalization Best Practices):
    HTTP/1.1 200 OK
    Date: Wed, 05 Nov 2003 10:46:04 GMT
    Server: Apache/1.3.28 (Unix) PHP/4.2.3
    Content-Location: CSS2-REC.en.html
    Vary: negotiate,accept-language,accept-charset
    TCN: choice
    P3P: policyref=http://www.w3.org/2001/05/P3P/p3p.xml
    Cache-Control: max-age=21600
    Expires: Wed, 05 Nov 2003 16:46:04 GMT
    Last-Modified: Tue, 12 May 1998 22:18:49 GMT
    ETag: "3558cac9;36f99e2b"
    Accept-Ranges: bytes
    Content-Length: 10734
    Connection: close
    Content-Type: text/html; charset=iso-8859-1
    Content-Language: en
- As a lang attribute on an HTML element, e.g. <div lang="fr">, or an xml:lang attribute on XML documents like MathML and SVG.
The first two ways of specifying the language are used to identify the intended audience of the HTML document. This information is used in the following ways:
- Search engines use this to determine which documents to include in search results (e.g. a document with content-language set to Chinese will not be shown for a search looking for English documents, although most search engines rely on more than these two signals to decide which documents to show).
- Content negotiation by Apache servers based on the language preference set by users in their browsers.
- Identifying the default language of a document. This concept is new in HTML 5. If you specify only one language using the above two methods (i.e. <meta http-equiv="content-language" content="en"> instead of <meta http-equiv="content-language" content="en, fr">), then the text of the entire document is processed as that language (except for text contained in an element that has its own lang attribute, which is processed as the language given by that attribute).
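To make the content negotiation point more concrete, here is a minimal Python sketch of how a server might match a browser's Accept-Language header against the language variants it has on disk. The function names (parse_accept_language, pick_variant) are made up for illustration, the matching is deliberately simplified (no wildcard handling), and this is not how Apache actually implements negotiation.

```python
def parse_accept_language(header):
    """Return (tag, q) pairs sorted by descending preference."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0  # a tag without an explicit q-value counts as fully preferred
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so tags with equal q-values keep their header order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def pick_variant(header, available):
    """Pick the first available language the client accepts, or None."""
    lowered = {lang.lower(): lang for lang in available}
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag in lowered:
            return lowered[tag]
        # Simplified fallback: fr-CA is satisfied by a plain fr variant
        primary = tag.split("-")[0]
        if primary in lowered:
            return lowered[primary]
    return None
```

With this sketch, a browser sending "fr-CA,fr;q=0.8,en;q=0.5" to a site that has "en" and "fr" variants would be served the "fr" one.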
The last method is to explicitly declare a language to be used for text processing by the user agent. Use the lang attribute if you want the browser to process the text in that HTML element in a specific language.
The language code used in the Content-Language header, in the content attribute of meta http-equiv, or in the lang attribute needs to be built from subtags in the IANA Language Subtag Registry. You can read more on choosing language values in the W3C's internationalization guidance.
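As a rough illustration of what "built from subtags" means, the Python sketch below splits a tag such as fr-CA or zh-Hant into its language, script, and region parts. It only checks the shape of the tag under a simplified pattern (extlang, variant, extension, and private-use subtags are ignored), and it does not look anything up in the IANA registry. The helper name split_language_tag is hypothetical.

```python
import re

# Simplified shape: primary language, optional script, optional region.
# Assumption: anything more elaborate is rejected by this sketch.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_language_tag(tag):
    """Return a dict of subtags, or None if the tag does not fit the simplified shape."""
    match = _TAG.match(tag.strip())
    if match is None:
        return None
    parts = match.groupdict()
    # Conventional casing: language in lowercase, script in Title case, region in UPPERCASE
    return {
        "language": parts["language"].lower(),
        "script": parts["script"].title() if parts["script"] else None,
        "region": parts["region"].upper() if parts["region"] else None,
    }
```

A tag that passes this check can still be invalid if its subtags are not registered, so treat it as a first sanity check only.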
Default Language of a Document
HTML 5 specifies the following fallback rules to determine the language of an HTML element. They are checked in order, and the first one that applies wins:
- The HTML element itself has a lang attribute (e.g. <span lang="en">). If not,
- the nearest ancestor of that element has a lang attribute. If not,
- the document has a single language tag set through a pragma directive (e.g. <meta http-equiv="content-language" content="en">). If not,
- the HTTP header Content-Language contains a single language tag. If not,
- the document is treated as being in an unknown language.
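The fallback chain above can be sketched in a few lines of Python. This is an illustration of the rules as described here, not browser code: the function names are made up, the element and its ancestors are represented as a plain list of lang values (element first, root last, None where the attribute is absent), and lang attributes are assumed to be non-empty when present.

```python
def single_tag(value):
    """Return the tag if value holds exactly one language tag, else None."""
    if not value:
        return None
    tags = [tag.strip() for tag in value.split(",") if tag.strip()]
    return tags[0] if len(tags) == 1 else None


def resolve_language(lang_chain, meta_content_language=None, http_content_language=None):
    """Resolve an element's language; None means the language is unknown."""
    # Rules 1 and 2: the element's own lang attribute, then the nearest ancestor's
    for lang in lang_chain:
        if lang is not None:
            return lang
    # Rule 3: a pragma directive that names exactly one language
    from_meta = single_tag(meta_content_language)
    if from_meta is not None:
        return from_meta
    # Rule 4: a Content-Language HTTP header that names exactly one language
    # Rule 5: otherwise single_tag returns None, meaning "unknown"
    return single_tag(http_content_language)
```

For example, resolve_language([None, "fr", "en"]) gives "fr", because the nearest ancestor with a lang attribute wins over the root element.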
Bottom line
This is not the last word on detecting the language of a document, but for the time being, if your document has content that is mostly not English, use the lang attribute on the <html> element to specify the language. If there are elements of the document that use a language other than the one specified for the whole document, use the lang attribute on each such element.
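If you want to check that a page follows this advice, a small script can help. The Python sketch below uses the standard library's html.parser to report the lang value on the <html> element and any elements that override it. The class and function names (LangCollector, collect_languages) are invented for this example.

```python
from html.parser import HTMLParser


class LangCollector(HTMLParser):
    """Record the root lang attribute and any per-element overrides."""

    def __init__(self):
        super().__init__()
        self.root_lang = None
        self.overrides = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us
        lang = dict(attrs).get("lang")
        if lang is None:
            return
        if tag == "html":
            self.root_lang = lang
        else:
            self.overrides.append((tag, lang))


def collect_languages(markup):
    """Return (root_lang, [(tag, lang), ...]) for an HTML string."""
    collector = LangCollector()
    collector.feed(markup)
    collector.close()
    return collector.root_lang, collector.overrides
```

A root_lang of None means the <html> element carries no lang attribute, which is the first thing to fix.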