Chapter 3. Important Concepts for Character Coding Systems
3.3 Multibyte encodings
Encodings are classified into multibyte and non-multibyte ones, according to the relationship between
the number of characters and the number of bytes in the encoding.
In a non-multibyte encoding, one character is always expressed by one byte. On the other hand,
one character may be expressed in one or more bytes in a multibyte encoding. Note that the number
of bytes per character is not necessarily fixed even within a single encoding.
Examples of multibyte encodings are: EUC-JP, EUC-KR, ISO-2022-JP, Shift-JIS, Big5, UHC, UTF-8,
and so on. Note that all of the UTF-* encodings are multibyte.
Examples of non-multibyte encodings are: ISO-8859-1, ISO-8859-2, TIS-620, VISCII, and so on.
Note that even in a non-multibyte encoding, the number of characters and the number of bytes may
differ if the encoding is stateful.
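The difference between multibyte and non-multibyte encodings can be observed by encoding one character at a time and counting the resulting bytes. The following is only a minimal sketch in Python; the helper name bytes_per_char and the sample strings are chosen for illustration, and the codec names (utf_8, euc_jp, shift_jis, iso8859_1) are the ones used by Python's standard codecs. Encoding each character separately is meaningful only for stateless encodings.

```python
def bytes_per_char(text, encoding):
    """Return (character, byte count) pairs for a stateless encoding.

    Hypothetical helper for illustration: each character is encoded
    on its own, so this must not be used with stateful encodings.
    """
    return [(ch, len(ch.encode(encoding))) for ch in text]


if __name__ == "__main__":
    # Multibyte encodings: the byte count varies within one encoding.
    for enc in ("utf_8", "euc_jp", "shift_jis"):
        print(enc, bytes_per_char("a\u3042", enc))  # 'a' and HIRAGANA LETTER A

    # Non-multibyte encoding: always one byte per character.
    print("iso8859_1", bytes_per_char("a\u00e9", "iso8859_1"))
```

In the multibyte encodings the ASCII letter takes one byte while the Hiragana letter takes two or three, which is the "number is not fixed" case described above.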
Ken Lunde's CJKV Information Processing [3] classifies encoding methods into the following
three categories:
modal
non-modal
fixed-length
Modal corresponds to stateful in this document. The other two are stateless, where non-modal is
multibyte and fixed-length is non-multibyte. However, I think stateful/stateless and
multibyte/non-multibyte are independent concepts. [4]
3.4 Number of Bytes, Number of Characters, and Number of Columns
One ASCII character is always expressed by one byte and occupies one column on a console or in X
terminal emulators (with a fixed-width font for X). One must not make such an assumption in I18N
programming and has to clearly distinguish the number of bytes, characters, and columns.
Regarding the relationship between characters and bytes: in multibyte encodings, two or more bytes
may be needed to express one character. In stateful encodings, escape sequences do not correspond
to any character.
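The effect of escape sequences on the byte count can be seen with ISO-2022-JP, a stateful encoding. The following is a minimal sketch in Python, assuming the iso2022_jp codec of the standard library; the helper name count_chars_and_bytes is hypothetical.

```python
def count_chars_and_bytes(text, encoding):
    """Return (number of characters, number of bytes, encoded bytes)."""
    data = text.encode(encoding)
    return len(text), len(data), data


if __name__ == "__main__":
    # One Hiragana character: the encoded form contains an escape sequence
    # switching to JIS X 0208, the two bytes of the character itself, and
    # an escape sequence switching back to ASCII.
    print(count_chars_and_bytes("\u3042", "iso2022_jp"))

    # The same character in a stateless multibyte encoding, for comparison.
    print(count_chars_and_bytes("\u3042", "euc_jp"))
```

The escape sequences add bytes that belong to no character, so the byte count cannot be derived from the character count alone.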
The number of columns is not defined in any standard. However, CJK ideograms,
Japanese Hiragana and Katakana, and Korean Hangul usually occupy two columns on a console or in X
terminal emulators. Note that 'Full-width forms' in the UCS-2 and UCS-4 coded character sets will occupy
[3] ISBN 1-56592-224-7, O'Reilly, 1999
[4] Though there are no existing encodings which are stateful and non-multibyte.