Chapter 3. Important Concepts for Character Coding Systems
code can refer to either an encoding or a coded character set. Thus this word can be used only
when the two can be regarded as belonging to the same category. It should be avoided in serious
discussions, and this document will not use it hereafter.
Codeset is a word for encoding or character encoding scheme. [1]
charset is also a widely used word. It appears, for example, in MIME (as in
Content-Type: text/plain; charset=iso-8859-1), in XLFD (X Logical Font Description) font names
(the CharSetRegistry and CharSetEncoding fields), and so on. Note that charset in MIME means an
encoding, while charset in an XLFD font name means a coded character set. This is very confusing.
In this document, charset and character set are used in the XLFD sense, since I think character set
should mean a set of characters, not an encoding.
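The two meanings of charset can be seen side by side in a short Python sketch. It reads the charset parameter of a MIME header with the standard email package and takes the last two fields of an XLFD font name. The helper names mime_charset and xlfd_charset, and the sample font name in the usage note, are assumptions made for illustration only.

```python
from email.message import Message
from typing import Optional, Tuple


def mime_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value.

    In MIME this parameter names an encoding.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the value lowercased, or None if absent.
    return msg.get_content_charset()


def xlfd_charset(font_name: str) -> Tuple[str, str]:
    """Return the (CharSetRegistry, CharSetEncoding) fields of an XLFD name.

    Hypothetical helper: it simply takes the last two hyphen-separated
    fields, which is where XLFD places them. Together they name a coded
    character set.
    """
    _, registry, encoding = font_name.rsplit("-", 2)
    return registry, encoding
```

For example, mime_charset("text/plain; charset=iso-8859-1") gives an encoding name, while xlfd_charset("-misc-fixed-medium-r-normal--14-130-75-75-c-140-jisx0208.1983-0") gives the pair ("jisx0208.1983", "0"), which identifies a coded character set and says nothing about how it is encoded.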
Ken Lunde's CJKV Information Processing uses the term encoding method. He says that ISO
2022, EUC, Big5, and Shift-JIS are examples of encoding methods. His encoding method seems to
correspond to CES in this document. However, we should note that Big5 and Shift-JIS are
encodings while ISO 2022 and EUC are not. [2]
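The difference between a coded character set and a concrete encoding can be made visible with Python's built-in codecs. The sketch below assumes the hiragana letter U+3042, which sits at row 4, cell 2 of JIS X 0208, and encodes it with three concrete encodings: EUC-JP, Shift-JIS, and ISO-2022-JP (a concrete encoding built on the ISO 2022 framework). The codec names are Python's, and the function name encode_all is hypothetical.

```python
# One character from the coded character set JIS X 0208 (row 4, cell 2).
CHAR = "\u3042"  # HIRAGANA LETTER A

# Concrete encodings, using the codec names Python's standard library knows.
ENCODINGS = ("euc_jp", "shift_jis", "iso2022_jp")


def encode_all(ch: str) -> dict:
    """Map each encoding name to the bytes it produces for ch."""
    return {name: ch.encode(name) for name in ENCODINGS}


if __name__ == "__main__":
    for name, data in encode_all(CHAR).items():
        # Print the bytes in hex so the three results can be compared.
        print(f"{name:12} {data.hex(' ')}")
```

The character and its position in the coded character set are the same in all three cases; only the byte sequences differ. In EUC-JP, for instance, the two bytes are the JIS X 0208 bytes 0x24 0x22 with the high bit set.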
Character Encoding Model, Unicode Technical Report #17
(http://www.unicode.org/unicode/reports/tr17/) (hereafter, the Report) suggests a five-level model:
ACR: Abstract Character Repertoire
CCS: Coded Character Set
CEF: Character Encoding Form
CES: Character Encoding Scheme
TES: Transfer Encoding Syntax
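As a rough illustration of the CCS, CEF, and CES levels, the following Python sketch follows one character through them. It assumes Unicode as the coded character set and UTF-16 as the encoding form, which is a choice made for this example; the function names are hypothetical.

```python
import struct


def ccs_code_point(ch: str) -> int:
    """CCS level: the integer assigned to the character (its code point)."""
    return ord(ch)


def cef_utf16_units(ch: str) -> list:
    """CEF level: the 16-bit code units representing the code point in UTF-16."""
    cp = ord(ch)
    if cp < 0x10000:
        return [cp]
    # Code points above U+FFFF are split into a surrogate pair.
    cp -= 0x10000
    return [0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF)]


def ces_bytes(units: list, big_endian: bool = True) -> bytes:
    """CES level: serialize the code units to bytes in a fixed byte order."""
    fmt = (">" if big_endian else "<") + "H" * len(units)
    return struct.pack(fmt, *units)
```

The same code units give different byte sequences depending on the byte order chosen at the CES level, which is why the Report treats the encoding form and the encoding scheme as separate levels.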
TES is also suggested in RFC 2130 (http://www.faqs.org/rfcs/rfc2130.html). Some examples of
TES are base64, uuencode, BinHex, quoted-printable, gzip, and so on. A TES is a transformation
of encoded data, which may or may not include textual data. Thus, TES is not a part of
character encoding. However, TES is important in data exchange over the Internet.
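That a TES works on already-encoded bytes, and knows nothing about characters, can be checked with Python's standard library. The sketch below assumes a sample text encoded in ISO-8859-1, then applies base64, quoted-printable, and gzip on top of the result and reverses each. The sample text and the name round_trip are assumptions for illustration.

```python
import base64
import gzip
import quopri

# Character encoding step: text -> bytes (the sample text is an assumption).
encoded = "caf\u00e9".encode("iso-8859-1")

# TES step: each transform takes bytes and returns bytes.
TRANSFORMS = {
    "base64": (base64.b64encode, base64.b64decode),
    "quoted-printable": (quopri.encodestring, quopri.decodestring),
    "gzip": (gzip.compress, gzip.decompress),
}


def round_trip(data: bytes) -> dict:
    """Apply each TES and undo it, returning the transformed bytes per name."""
    out = {}
    for name, (apply, undo) in TRANSFORMS.items():
        transformed = apply(data)
        # Undoing the TES must give back the original encoded bytes.
        assert undo(transformed) == data
        out[name] = transformed
    return out
```

None of the three transforms needs to know that the input was ISO-8859-1. To turn the recovered bytes back into text, the receiver still needs the encoding name from elsewhere, for example from the MIME charset parameter.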
When using a computer, we rarely have occasion to deal with ACR directly. It is true, though, that
CJK people have their national standards for ACR (for example, a standard for the ideograms that
can be used in personal names), and some of us may need to handle these ACRs with computers (for
[1] Before November 2000, this document used the word codeset to mean encoding. I changed the
terminology because I could not find the word codeset in documents written in English (I had
adopted it from a book in Japanese). encoding seems more popular.
[2] During I18N programming, we will frequently meet EUC-JP or EUC-KR, while we will rarely meet
EUC itself. I think it is not appropriate to stress EUC, a class of encodings, over concrete
encodings such as EUC-JP and EUC-KR. It is just like regarding ISO 8859 as a concrete encoding,
though ISO 8859 is a class of encodings, ISO 8859-{1,2,...,15}.