Chapter 10. The Internet
RFC 2045 (http://www.faqs.org/rfcs/rfc2045.html) and RFC 2046 (http://www.faqs.org/rfcs/rfc2046.html) determine how non-ASCII characters are written in the main text of mail. RFC 2047 (http://www.faqs.org/rfcs/rfc2047.html), on the other hand, describes 'encoded words', which are the way to write non-ASCII characters in the header. An encoded word looks like this:
    =?encoding?conversion algorithm?data?=

where encoding is selected from the list of values for the charset parameter of the Content-Type header, conversion algorithm is Q or q for quoted-printable or B or b for base64, and data is the encoded data. An encoded word must not be longer than 75 characters, counting the encoding, the conversion algorithm, the data, and the delimiters. Longer text must be divided into multiple encoded words. For example,
    Subject: =?ISO-2022-JP?B?GyRCNEE7eiROJTUlViU4JSclLyVIGyhC?=

reads 'a subject written in Kanji' in Japanese (ISO-2022-JP, encoded by base64). Of course, a human cannot read it.
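A program, on the other hand, can decode an encoded word easily. The following is a minimal sketch using the `email.header` module of the Python standard library. The helper name `decode_encoded_words` is introduced here only for illustration and is not part of any RFC or library.

```python
from email.header import decode_header


def decode_encoded_words(raw_value):
    """Decode a header value that may contain RFC 2047 encoded words."""
    parts = []
    # decode_header() yields (data, charset) pairs; data is bytes for
    # encoded words and may be str for plain unencoded text.
    for data, charset in decode_header(raw_value):
        if isinstance(data, bytes):
            # Unencoded fragments carry no charset; assume ASCII for them.
            parts.append(data.decode(charset or "ascii", errors="replace"))
        else:
            parts.append(data)
    return "".join(parts)


subject = "=?ISO-2022-JP?B?GyRCNEE7eiROJTUlViU4JSclLyVIGyhC?="
print(decode_encoded_words(subject))
```

With the Subject value from the example above, this should print the Japanese text 漢字のサブジェクト.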
10.2 WWW
The WWW is a system in which HTML documents (mainly, along with files in other formats) are transferred using HTTP.
HTTP is defined by RFC 2068 (http://www.faqs.org/rfcs/rfc2068.html). Like mail, HTTP uses headers, and the Content-Type header is used to describe the type of the contents. Though a charset parameter can be given in this header, it is rarely used.
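As an illustration, a Content-Type value can be split into its media type and charset parameter as sketched below. This assumes the Python standard library `email.message.Message` class, which parses HTTP-style headers as well as mail headers. The function name `parse_content_type` is hypothetical.

```python
from email.message import Message


def parse_content_type(header_value):
    """Return (media_type, charset) for a Content-Type header value.

    charset is None when the header carries no charset parameter,
    which the text says is the common case.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # Both accessors return lower-cased values.
    return msg.get_content_type(), msg.get_content_charset()


print(parse_content_type("text/html; charset=ISO-2022-JP"))
print(parse_content_type("text/html"))
```

When the charset comes back as None, the receiver has to fall back on some default or guess the encoding.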
RFC 1866 (http://www.faqs.org/rfcs/rfc1866.html) states that the default encoding for HTML is ISO-8859-1. However, many web pages are written in, for example, Japanese or Korean, using (of course) encodings different from ISO-8859-1. Sometimes the HTML document
describes:

    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-2022-JP">

which declares that the page is written in ISO-2022-JP. However, there are many pages without any declaration of encoding.
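One way a program might look for such a declaration is sketched below. It assumes the declaration is a META element with HTTP-EQUIV="Content-Type" and falls back to ISO-8859-1, the RFC 1866 default, when none is found. The names `MetaCharsetFinder` and `declared_charset` are hypothetical, and a real browser would need much more elaborate guessing for undeclared pages.

```python
from email.message import Message
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Record the charset of the first META Content-Type declaration."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        # HTMLParser lower-cases tag and attribute names for us.
        attributes = {name: (value or "") for name, value in attrs}
        content = attributes.get("content", "")
        if attributes.get("http-equiv", "").lower() == "content-type" and content:
            msg = Message()
            msg["Content-Type"] = content
            self.charset = msg.get_content_charset()


def declared_charset(html_bytes, default="iso-8859-1"):
    finder = MetaCharsetFinder()
    # Decode as Latin-1 only to scan the markup: every byte maps to a
    # character, so this never fails, but the text itself is not yet
    # decoded correctly.
    finder.feed(html_bytes.decode("latin-1"))
    return finder.charset or default


page = (b'<html><head><meta http-equiv="Content-Type" '
        b'content="text/html; charset=ISO-2022-JP"></head></html>')
print(declared_charset(page))
print(declared_charset(b"<html><body>no declaration</body></html>"))
```

The scan works only because the markup itself is ASCII in the encodings discussed here. Text in a multibyte encoding that precedes the declaration could confuse it.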
Web browsers have to deal with such circumstances. Of course, web browsers have to be able to deal with every encoding in the world that is listed in MIME. However, many web browsers can only deal with ASCII or ISO-8859-1. Such web browsers are of no use at all to people who need anything other than ASCII or ISO-8859-1.
A URL should be written in ASCII characters, though non-ASCII characters can be expressed using a %nn sequence, where nn is a hexadecimal value. This is because there is no way to specify the encoding. Western European people would treat such a sequence as ISO-8859-1, while Japanese people would treat it as EUC-JP or SHIFT-JIS.
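The ambiguity can be seen in a short sketch using `urllib.parse` from the Python standard library. The sample word 漢字 ("Kanji") is chosen here only for illustration. The same escaped bytes give completely different text depending on the encoding the reader assumes.

```python
from urllib.parse import quote, unquote

# Escape a Japanese word, assuming the author's system uses EUC-JP.
escaped = quote("漢字", encoding="euc_jp")
print(escaped)  # %B4%C1%BB%FA

# A reader who assumes EUC-JP recovers the original text.
as_euc_jp = unquote(escaped, encoding="euc_jp")
print(as_euc_jp)

# A reader who assumes ISO-8859-1 gets unrelated accented characters,
# and nothing in the URL says which reading is right.
as_latin1 = unquote(escaped, encoding="iso-8859-1")
print(as_latin1)
```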