Character encodings in HTML ▪ Sale

HTML (Hypertext Markup Language) has been in use since 1991, but HTML 4.0 (December 1997) was the first standardized version where international characters were given reasonably complete treatment. When an HTML document includes special characters outside the range of seven-bit ASCII two goals are worth considering: the information's integrity, and universal browser display.

Specifying the document's character encoding [edit]

There are several ways to specify which character encoding is used in the document. First, the web server can include the character encoding or "charset" in the Hypertext Transfer Protocol (HTTP) Content-Type header, which would typically look like this:

Content-Type: text/html; charset=ISO-8859-1

This method gives the HTTP server a convenient way to alter document's encoding according to content negotiation; certain HTTP server software can do it, for example Apache with the module mod_charset_lite.

For HTML it is possible to include this information inside the head element near the top of the document:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

HTML5 also allows the following syntax to mean exactly the same:

<meta charset="utf-8">

XHTML documents have a third option: to express the character encoding via XML declaration, as follows:

<?xml version="1.0" encoding="ISO-8859-1"?>

<meta http-equiv="Content-Type"> may be interpreted directly by a browser, like an ordinary HTML tag, or it may be used by the HTTP server to generate corresponding headers when it serves the document. The HTTP/1.1 header specification for a HTML document must label an appropriate encoding in the Content-Type header, missing charset= parameter results in acceptance of ISO-8859-1 (so HTTP/1.1 formally does not offer such option as an unspecified character encoding), and this specification supersedes all HTML (or XHTML) meta element ones. This can pose a problem if the server generates an incorrect header and one does not have the access or the knowledge to change them.

As each of these methods explain to the receiver how the file being sent should be interpreted, it would be inappropriate for these declarations not to match the actual character encoding used. Because a server usually does not know how a document is encoded-especially if documents are created on different platforms or in different regions-many servers simply do not include a reference to the "charset" in the Content-Type header, thus avoiding making false promises. However, if the document does not specify the encoding either, this may result in the equally bad situation where the user agent displays mojibake because it cannot find out which character encoding was used. Due to widespread and persistent ignorance of HTTP charset= over the Internet (at its server side), WWW Consortium disappointed in HTTP/1.1’s strict approach and encourages browser developers to use some fixes in violation of RFC 2616.

If a user agent reads a document with no character encoding information, it can fall back to using some other information. For example, it can rely on the user's settings, either browser-wide or specific for a given document, or it can pick a default encoding based on the user's language. For Western European languages, it is typical and fairly safe to assume Windows-1252, which is similar to ISO-8859-1 but has printable characters in place of some control codes. The consequence of choosing incorrectly is that characters outside the printable ASCII range (32 to 126) usually appear incorrectly. This presents few problems for English-speaking users, but other languages regularly-in some cases, always-require characters outside that range. In CJK environments where there are several different multi-byte encodings in use, auto-detection is also often employed. Finally, browsers usually permit the user to override incorrect charset label manually as well.

It is increasingly common for multilingual websites and websites in non-Western languages to use UTF-8, which allows use of the same encoding for all languages. UTF-16 or UTF-32, which can be used for all languages as well, are less widely used because they can be harder to handle in programming languages that assume a byte-oriented ASCII superset encoding, and they are less efficient for text with a high frequency of ASCII characters, which is usually the case for HTML documents.

Successful viewing of a page is not necessarily an indication that its encoding is specified correctly. If the page's creator and reader are both assuming some platform-specific character encoding, and the server does not send any identifying information, then the reader will nonetheless see the page as the creator intended, but other readers on different platforms or with different native languages will not see the page as intended.

Character references [edit]

In addition to native character encodings, characters can also be encoded as character references, which can be numeric character references (decimal or hexadecimal) or character entity references. Character entity references are also sometimes referred to as named entities, or HTML entities for HTML. HTML's usage of character references derives from SGML.

HTML character references [edit]

A numeric character reference in HTML refers to a character by its Universal Character Set/Unicode code point, and uses the format

&#nnnn;

or

&#xhhhh;

where nnnn is the code point in decimal form, and hhhh is the code point in hexadecimal form. The x must be lowercase in XML documents. The nnnn or hhhh may be any number of digits and may include leading zeros. The hhhh may mix uppercase and lowercase, though uppercase is the usual style.

Not all web browsers or email clients used by receivers of HTML documents, or text editors used by authors of HTML documents, will be able to render all HTML characters. Most modern software is able to display most or all of the characters for the user's language, and will draw a box or other clear indicator for characters they cannot render.

For codes from 0 to 127, the original 7-bit ASCII standard set, most of these characters can be used without a character reference. Codes from 160 to 255 can all be created using character entity names. Only a few higher-numbered codes can be created using entity names, but all can be created by decimal number character reference.

Character entity references can also have the format &name; where name is a case-sensitive alphanumeric string. For example, "λ" can also be encoded as &lambda; in an HTML document. The character entity references &lt;, &gt;, &quot; and &amp; are predefined in HTML and SGML, because <, >, " and & are already used to delimit markup. This notably does not include XML's &apos; (') entity. For a list of all named HTML character entity references (about 250), see List of XML and HTML character entity references.

Unnecessary use of HTML character references may significantly reduce HTML readability. If the character encoding for a web page is chosen appropriately, then HTML character references are usually only required for markup delimiting characters as mentioned above, and for a few special characters (or none at all if a native Unicode encoding like UTF-8 is used). Incorrect HTML entity escaping may also open up security vulnerabilities for injection attacks such as cross-site scripting. If HTML attributes are left unquoted, certain characters, most importantly whitespace, such as space and tab, must be escaped using entities. Other languages related to HTML have their own methods of escaping characters.

Illegal characters [edit]

HTML forbids the use of the characters with Universal Character Set/Unicode code points

These characters are not even allowed by reference. That is, you should not even write them as numeric character references. However, references to characters 128–159 are commonly interpreted by lenient web browsers as if they were references to the characters assigned to bytes 128–159 (decimal) in the Windows-1252 character encoding. This is in violation of HTML and SGML standards, and the characters are already assigned to higher code points, so HTML document authors should always use the higher code points. For example, for the trademark sign (™), use &#8482;, not &#153;.

The characters 9 (tab), 10 (linefeed), and 13 (carriage return) are allowed in HTML documents, but, along with 32 (space) are all considered "whitespace". The "form feed" control character, which would be at 12, is not allowed in HTML documents, but is also mentioned as being one of the "white space" characters - perhaps an oversight in the specifications. In HTML, most consecutive occurrences of white space characters, except in a <pre> block, are interpreted as comprising a single "word separator" for rendering purposes. A word separator is typically rendered a single en-width space in European languages, but not in others.

XML character references [edit]

Unlike traditional HTML with its large range of character entity references, in XML there are only five predefined character entity references. These are used to escape characters that are markup sensitive in certain contexts:

All other character entity references have to be defined before they can be used. For example, use of &eacute; (which gives é, Latin lower-case E with acute accent, U+00E9 in Unicode) in an XML document will generate an error unless the entity has already been defined. XML also requires that the x in hexadecimal numeric references be in lowercase: for example &#xA1b rather than &#XA1b. XHTML, which is an XML application, supports the HTML entity set, along with XML's predefined entities.

However, use of &apos; in XHTML should generally be avoided for compatibility reasons. &#39; or &#x0027; may be used instead.

&amp; has the special problem that it starts with the character to be escaped. A simple Internet search finds millions of sequences &amp;amp;amp;amp; ... in HTML pages for which the algorithm to replace an ampersand by the corresponding character entity reference was probably applied repeatedly http://www.google.co.uk/search?q=%22%26amp%3Bamp%3Bamp%3Bamp%3B%22  Missing or empty |title= (help).

See also [edit]

References [edit]

  1. Fielding, R.; Gettys, J.; Mogul, J.; Frystyk, H.; Masinter, L.; Leach, P.; Berners-Lee, T. (June 1999), "Content-Type", Hypertext Transfer Protocol – HTTP/1.1, IETF, retrieved 8 March 2010 
  2. Apache Module mod_charset_lite
  3. Hickson, I. (5 March 2010), "Specifying the document's character encoding", HTML5, WHATWG, retrieved 8 March 2010 
  4. Bray, T.; Paoli, J.; Sperberg-McQueen, C.; Maler, E.; Yergeau, F. (26 November 2008), "Prolog and Document Type Declaration", XML, W3C, retrieved 8 March 2010 
  5. The global structure of an HTML document: The META element
  6. RFC 2616 3.7.1 Canonicalization and Text Defaults
  7. HTML 4, HTML Document Representation: Specifying the character encoding
  8. http://www.w3.org/TR/REC-html40/sgml/sgmldecl.html
  9. http://www.w3.org/TR/REC-html40/struct/text.html#h-9.1
  10. Bray, T.; Paoli, J.; Sperberg-McQueen, C.; Maler, E.; Yergeau, F. (26 November 2008), "Character and Entity References", XML, W3C, retrieved 8 March 2010 

External links [edit]

Popular search requests

Character encodings in HTML is an object of interest for many people. For example, the people often search for Character encodings in HTML website, Character encodings in HTML blog, Character encodings in HTML online, Character encodings in HTML information, Character encodings in HTML photo, Character encodings in HTML picture, Character encodings in HTML video, Character encodings in HTML movie, Character encodings in HTML history, Character encodings in HTML news, Character encodings in HTML facts, Character encodings in HTML description, Character encodings in HTML detailed info, Character encodings in HTML features, Character encodings in HTML manual, Character encodings in HTML instructions, Character encodings in HTML comparison, Character encodings in HTML book, Character encodings in HTML story, Character encodings in HTML article, Character encodings in HTML review, Character encodings in HTML feedbacks, Character encodings in HTML selection, Character encodings in HTML data, Character encodings in HTML address, Character encodings in HTML phone number, download Character encodings in HTML, Character encodings in HTML reference, Character encodings in HTML wikipedia, Character encodings in HTML facebook, Character encodings in HTML twitter, Character encodings in HTML 2013, Character encodings in HTML 2014, Character encodings in HTML in the United States, Character encodings in HTML USA, Character encodings in HTML US, Character encodings in HTML in United Kingdom, Character encodings in HTML UK, Character encodings in HTML in Canada, Character encodings in HTML in Australia, etc.

Character encodings in HTML is also an object of commercial interest. For example, many people are interested in Character encodings in HTML offers, Character encodings in HTML buy, Character encodings in HTML sell, Character encodings in HTML sale, Character encodings in HTML discounts, discounted Character encodings in HTML, Character encodings in HTML coupon, Character encodings in HTML promo code, Character encodings in HTML order, to order Character encodings in HTML online, to buy Character encodings in HTML, how much for Character encodings in HTML, Character encodings in HTML price, Character encodings in HTML cost, Character encodings in HTML price list, Character encodings in HTML tariffs, Character encodings in HTML rates, Character encodings in HTML prices, Character encodings in HTML delivery, Character encodings in HTML store, Character encodings in HTML online store, Character encodings in HTML online shop, inexpensive Character encodings in HTML, cheap Character encodings in HTML, Character encodings in HTML for free, free Character encodings in HTML, used Character encodings in HTML, and so on.

Information source: wikipedia.org

Do you want to know more? Look at the full version of the Character encodings in HTML article.

HOT DESIGNS
Premium designs
Designs by country
Designs by U.S. state
Most popular designs
Newest, last added designs
Unique designs
Cheap, budget designs
Design super sale

DESIGNS BY THEME
Accounting, audit designs
Adult, sex designs
African designs
American, U.S. designs
Animals, birds, pets designs
Agricultural, farming designs
Architecture, building designs
Army, navy, military designs
Audio & video designs
Automobiles, car designs
Books, e-book designs
Beauty salon, SPA designs
Black, dark designs
Business, corporate designs
Charity, donation designs
Cinema, movie, film designs
Computer, hardware designs
Celebrity, star fan designs
Children, family designs
Christmas, New Year's designs
Green, St. Patrick designs
Dating, matchmaking designs
Design studio, creative designs
Educational, student designs
Electronics designs
Entertainment, fun designs
Fashion, wear designs
Finance, financial designs
Fishing & hunting designs
Flowers, floral shop designs
Food, nutrition designs
Football, soccer designs
Gambling, casino designs
Games, gaming designs
Gifts, gift designs
Halloween, carnival designs
Hotel, resort designs
Industry, industrial designs
Insurance, insurer designs
Interior, furniture designs
International designs
Internet technology designs
Jewelry, jewellery designs
Job & employment designs
Landscaping, garden designs
Law, juridical, legal designs
Love, romantic designs
Marketing designs
Media, radio, TV designs
Medicine, health care designs
Mortgage, loan designs
Music, musical designs
Night club, dancing designs
Photography, photo designs
Personal, individual designs
Politics, political designs
Real estate, realty designs
Religious, church designs
Restaurant, cafe designs
Retirement, pension designs
Science, scientific designs
Sea, ocean, river designs
Security, protection designs
Social, cultural designs
Spirit, meditational designs
Software designs
Sports, sporting designs
Telecommunication designs
Travel, vacation designs
Transport, logistic designs
Web hosting designs
Wedding, marriage designs
White, light designs

E-COMMERCE DESIGNS
Magento store designs
OpenCart store designs
PrestaShop store designs
CRE Loaded store designs
Jigoshop store designs
VirtueMart store designs
osCommerce store designs
Zen Cart store designs

CMS DESIGNS
Flash CMS designs
Joomla CMS designs
Mambo CMS designs
Drupal CMS designs
WordPress blog designs
Forum designs
phpBB forum designs
PHP-Nuke portal designs

ANIMATED WEBSITE DESIGNS
Flash CMS designs
Silverlight animated designs
Silverlight intro designs
Flash animated designs
Flash intro designs
XML Flash designs
Flash 8 animated designs
Dynamic Flash designs
Flash animated photo albums
Dynamic Swish designs
Swish animated designs
jQuery animated designs

WEBSITE DESIGNS
WebMatrix Razor designs
HTML 5 designs
Web 2.0 designs
3-color variation designs
3D, three-dimensional designs
Artwork, illustrated designs
Clean, simple designs
CSS based website designs
Full design packages
Full ready websites
Portal designs
Stretched, full screen designs
Universal, neutral designs

CORPORATE ID DESIGNS
Corporate identity sets
Logo layouts, logo designs
Logotype sets, logo packs
PowerPoint, PTT designs
Facebook themes

VIDEO, SOUND & MUSIC
Video e-cards
After Effects video intros
Special video effects
Music tracks, music loops
Stock music bank

GRAPHICS & CLIPART
Pro clipart & illustrations, $19/year
5,000+ icons by subscription
Icons, pictograms

 
Character encodings in HTML Sale - Buy now!
Super Offers
Super Offers
Custom Logo Design $149  ▪  Web Programming  ▪  ID Card Printing  ▪  Best Web Hosting  ▪  eCommerce Software  ▪  Add Your Link
© 1996-2013 MAGIA Internet StudioAboutPortfolioPhoto on DemandHostingAdvertiseSitemapPrivacyMaria Online