Chapter 4. Coded Character Sets And Encodings in the World
EUC is stateless.
EUC can hold four CCSs by using G0, G1, G2, and G3. Although nothing requires ASCII to be
designated to G0, I don't know of any EUC codeset in which ASCII is not designated to G0.
For EUC with ASCII in G0, every byte of a non-ASCII character is in the range 0x80-0xff, so such
an EUC is upward compatible with ASCII.
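This upward compatibility can be checked on sample text with Python's standard `euc_jp` and `euc_kr` codecs. The helper `is_ascii_compatible` below is hypothetical and only tests the strings it is given, not every character a codeset can encode; the Shift_JIS line is included only as a contrast with a non-EUC encoding.

```python
def is_ascii_compatible(codec: str, sample: str) -> bool:
    """Check one codec on one sample string (hypothetical helper, not exhaustive)."""
    ascii_part = "".join(ch for ch in sample if ord(ch) < 0x80)
    other_part = "".join(ch for ch in sample if ord(ch) >= 0x80)
    # ASCII characters must encode to exactly the same bytes as in ASCII.
    if ascii_part.encode(codec) != ascii_part.encode("ascii"):
        return False
    # Every byte produced for a non-ASCII character must have its high bit set.
    return all(b >= 0x80 for b in other_part.encode(codec))


print(is_ascii_compatible("euc_jp", "Hello, 日本語"))  # True
print(is_ascii_compatible("euc_kr", "abc 한국어"))      # True
print(is_ascii_compatible("shift_jis", "ソ"))          # False: encodes as 0x83 0x5c
```

In Shift_JIS the second byte of ソ is 0x5c, which is also the ASCII backslash, so a program scanning for ASCII bytes could misread it; EUC avoids this by keeping all non-ASCII bytes at 0x80 or above.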
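The byte patterns for characters in the G0, G1, G2, and G3 character sets are shown below in binary
(? is any bit; the bracketed bytes are present when the set uses more than one byte per character):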
G0: 0???????
G1: 1??????? [1??????? [...]]
G2: SS2 1??????? [1??????? [...]]
G3: SS3 1??????? [1??????? [...]]
where SS2 is 0x8e and SS3 is 0x8f.
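Because EUC is stateless, these patterns are enough to split a byte stream into characters. The sketch below assumes the EUC-JP widths (G1 and G3 use 2 bytes per character, G2 uses 1); widths are codeset specific, and the names `split_euc` and `EUC_JP_WIDTHS` are hypothetical. It checks only the high bit, as the patterns above do, not the narrower byte ranges a particular codeset allows.

```python
from __future__ import annotations

SS2, SS3 = 0x8E, 0x8F

# Assumption: bytes per character in G1..G3 are codeset specific. These are the
# EUC-JP values: G1 = JIS X 0208, G2 = JIS X 0201 Katakana, G3 = JIS X 0212.
EUC_JP_WIDTHS = {"G1": 2, "G2": 1, "G3": 2}


def split_euc(data: bytes, widths: dict[str, int]) -> list[tuple[str, bytes]]:
    """Split EUC bytes into (set name, bytes) pairs using only the high-bit patterns."""
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            # G0: 0???????
            chars.append(("G0", data[i:i + 1]))
            i += 1
            continue
        if lead in (SS2, SS3):
            # G2/G3: a single shift, then the character's own bytes.
            name = "G2" if lead == SS2 else "G3"
            end = i + 1 + widths[name]
            body = data[i + 1:end]
        else:
            # G1: the character's bytes, with no prefix.
            name = "G1"
            end = i + widths[name]
            body = data[i:end]
        if len(body) != widths[name] or any(b < 0x80 for b in body):
            raise ValueError(f"malformed {name} sequence at offset {i}")
        chars.append((name, data[i:end]))
        i = end
    return chars


print(split_euc("aｱ亜".encode("euc_jp"), EUC_JP_WIDTHS))
# [('G0', b'a'), ('G2', b'\x8e\xb1'), ('G1', b'\xb0\xa1')]
```

No state is carried from one character to the next, which is what "stateless" means in practice: decoding can restart at any character boundary.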
4.3.2 ISO 2022 compliant Character Sets
There are many national and international standards for coded character sets (CCS). Some of them
are ISO 2022 compliant and can be used with the ISO 2022 encoding.
Each ISO 2022 compliant CCS falls into one of the following classes:
94 characters
96 characters
94x94x... characters
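The classes differ in how many values each byte of a character may take. In the sketch below (the helper names are hypothetical), a 94-character set uses 0x21-0x7e, leaving out SPACE and DEL, while a 96-character set fills all of 0x20-0x7f; setting the high bit moves the same positions into GR, which is where the 1??????? patterns of EUC come from.

```python
def byte_range(set_size: int, high_bit: bool = False) -> range:
    """Values one byte may take in a 94- or 96-character set.

    Without high_bit: positions in the 7-bit code table.
    With high_bit: the same positions with bit 8 set (GR), as EUC uses them.
    """
    if set_size == 94:
        lo, hi = 0x21, 0x7E  # excludes SPACE (0x20) and DEL (0x7f)
    elif set_size == 96:
        lo, hi = 0x20, 0x7F  # all of columns 2 to 7
    else:
        raise ValueError("a graphic set has 94 or 96 positions per byte")
    if high_bit:
        lo, hi = lo | 0x80, hi | 0x80
    return range(lo, hi + 1)


def total_positions(n_bytes: int) -> int:
    """Code positions in a 94x94x... set using n_bytes bytes per character."""
    return len(byte_range(94)) ** n_bytes


print(len(byte_range(94)), len(byte_range(96)))  # 94 96
print(hex(byte_range(94, high_bit=True)[0]), hex(byte_range(94, high_bit=True)[-1]))  # 0xa1 0xfe
print(total_positions(2))  # 8836
```

A 94x94 set therefore has at most 8836 code positions, which bounds the size of the two-byte CJK sets described below.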
The best-known 94-character set is US-ASCII. All ISO 646 variants are also ISO 2022 compliant
94-character sets.
The upper halves (0xa0-0xff) of all ISO 8859-* character sets are ISO 2022 compliant 96-character sets.
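Python's `ascii` and `iso8859_1` codecs can be used to count these positions. In the sketch below (the helper `decodable_chars` is hypothetical), each byte is decoded on its own: ASCII yields 94 graphic characters in 0x21-0x7e, and the upper half of ISO 8859-1 yields 96 characters in 0xa0-0xff, starting with NO-BREAK SPACE.

```python
def decodable_chars(codec: str, byte_values: range) -> list[str]:
    """Decode each byte value on its own, skipping bytes the codec leaves unmapped."""
    chars = []
    for value in byte_values:
        try:
            chars.append(bytes([value]).decode(codec))
        except UnicodeDecodeError:
            continue
    return chars


ascii_graphics = decodable_chars("ascii", range(0x21, 0x7F))     # 94-character set
latin1_upper = decodable_chars("iso8859_1", range(0xA0, 0x100))  # 96-character upper half
print(len(ascii_graphics), len(latin1_upper))  # 94 96
print(hex(ord(latin1_upper[0])), hex(ord(latin1_upper[-1])))  # 0xa0 0xff
```

Unlike a 94-character set, a 96-character set also assigns characters to the two corner positions (0xa0 and 0xff here).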
There are many 94x94 character sets, and all of them are related to CJK ideographs.
JIS X 0208 (aka JIS C 6226) National standard of Japan. The 1978 version contains 6802 characters,
including Kanji (ideographs), Hiragana, Katakana, Latin, Greek, Cyrillic, numerals, and other
symbols. The current (1997) version contains 6879 characters.
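Characters in a 94x94 set such as JIS X 0208 are usually identified by a row and a cell number, each 1-94 (Japanese texts call this pair kuten). The hypothetical helpers below turn a row-cell pair into bytes by adding 0x20 for the 7-bit form used inside ISO-2022-JP, or 0xa0 for EUC-JP, where JIS X 0208 is G1. Row 16, cell 1 is 亜, the first Level 1 Kanji. Python's `euc_jp` and `iso2022_jp` codecs are used only as a cross-check; the last line assumes that codec designates JIS X 0208 with ESC $ B and returns to ASCII with ESC ( B at the end.

```python
def _check(row: int, cell: int) -> None:
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1..94")


def kuten_to_jis7(row: int, cell: int) -> bytes:
    """Row-cell to the 7-bit bytes used after ESC $ B in ISO-2022-JP."""
    _check(row, cell)
    return bytes([row + 0x20, cell + 0x20])


def kuten_to_euc_jp(row: int, cell: int) -> bytes:
    """Row-cell to EUC-JP, where JIS X 0208 is G1 (the 7-bit form with bit 8 set)."""
    _check(row, cell)
    return bytes([row + 0xA0, cell + 0xA0])


# Cross-check row 16, cell 1 against Python's codecs.
print(kuten_to_jis7(16, 1), kuten_to_euc_jp(16, 1))  # b'0!' b'\xb0\xa1'
print(kuten_to_euc_jp(16, 1).decode("euc_jp") == "亜")  # True
print("亜".encode("iso2022_jp") == b"\x1b$B" + kuten_to_jis7(16, 1) + b"\x1b(B")  # True
```

The same character thus appears as 0x30 0x21 in the 7-bit ISO 2022 form and as 0xb0 0xa1 in EUC-JP; only the high bit differs.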