Chapter 4. Coded Character Sets And Encodings in the World
Windows now supports Unicode, and the fonts in the Japanese version of Windows display a yen
currency mark at U+005C. As you know, the backslash (shown as a yen currency mark in Japan) is
vitally important for Windows, because it is used to separate directory names. Fortunately,
EUC-JP, which is widely used on UNIX in Japan, includes ASCII, not the Japanese version of
ISO 646. So there is no problem here, because 0x5c is clearly a backslash.
Thus no local codeset should use a character set that is incompatible with ASCII, such as ISO 646-*.
Problems and Solutions for Unicode and User/Vendor Defined Characters
(http://www.opengroup.or.jp/jvc/cde/ucs-conv-e.html) discusses this problem.
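The point about EUC-JP can be illustrated with a small sketch. It assumes Python's built-in
"euc_jp" codec as a stand-in for the EUC-JP encoding discussed here, and the path string is a
made-up example. Because every byte of a multibyte EUC-JP character has its high bit set, a
0x5c byte in EUC-JP text can only be the ASCII backslash.

```python
# Hypothetical Windows-style path with Japanese directory and file names.
path = "C:\\文書\\メモ.txt"

# Assumption: Python's "euc_jp" codec stands in for EUC-JP as described above.
encoded = path.encode("euc_jp")

# Find every 0x5c byte in the encoded stream. Multibyte EUC-JP characters
# never contain bytes below 0x80, so each hit is a real backslash.
backslash_positions = [i for i, b in enumerate(encoded) if b == 0x5C]

# Splitting on the raw byte is therefore safe; each piece decodes cleanly.
components = encoded.split(b"\x5c")
decoded_components = [c.decode("euc_jp") for c in components]

print(decoded_components)
```

The same byte-level split would not be safe in an encoding where 0x5c can also appear inside a
multibyte character, which is one reason ASCII compatibility matters for local codesets.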
4.5 Other Character Sets and Encodings
Besides the ISO 2022-compliant coded character sets and encodings described in `ISO 2022-compliant
Character Sets' on page 22 and `ISO 2022-compliant Encodings' on page 24, there are many popular
encodings that do not belong to any international standard (i.e., they are neither ISO 2022-compliant
nor Unicode). Internationalized software should support these encodings (again, you don't need
to be aware of encodings if you use LOCALE and wchar_t technology). Some organizations
are developing systems that go farther than the limitations of the current international standards,
though these systems are not yet widely used.
4.5.1 Big5
Big5 is a de facto standard encoding for Taiwan (1984) and is upward compatible with ASCII. It
is also a CCS.
In Big5, bytes 0x21 - 0x7e represent ASCII characters. A byte in 0xa1 - 0xfe forms a pair with
the following byte (0x40 - 0x7e or 0xa1 - 0xfe), and the pair represents an ideogram or other
character (13461 characters in total).
Though Taiwan has a newer ISO 2022-compliant standard, CNS 11643, Big5 seems to be more popular
than CNS 11643. (CNS 11643 is a CCS, and there are a few ISO 2022-derived encodings that
include CNS 11643.)
4.5.2 UHC
UHC is an encoding that is upward compatible with EUC-KR. Its two-byte characters (first
byte: 0x81 - 0xfe; second byte: 0x41 - 0x5a, 0x61 - 0x7a, and 0x81 - 0xfe) include KS X 1001
and other Hangul, so that UHC can express all 11172 Hangul.
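The claim about the 11172 Hangul can be checked with a short sketch. It assumes that Python's
built-in "cp949" codec corresponds to UHC and that the 11172 Hangul are the precomposed
syllables U+AC00 - U+D7A3 in Unicode; the helper name is_uhc_two_byte is hypothetical.

```python
def is_uhc_two_byte(seq: bytes) -> bool:
    """Check a byte pair against the UHC ranges given above."""
    if len(seq) != 2:
        return False
    lead, trail = seq
    lead_ok = 0x81 <= lead <= 0xFE
    trail_ok = (0x41 <= trail <= 0x5A
                or 0x61 <= trail <= 0x7A
                or 0x81 <= trail <= 0xFE)
    return lead_ok and trail_ok


# Assumption: the precomposed Hangul syllables occupy U+AC00 - U+D7A3.
syllables = [chr(cp) for cp in range(0xAC00, 0xD7A4)]

# Assumption: Python's "cp949" codec is used as a stand-in for UHC.
encoded = [s.encode("cp949") for s in syllables]
all_fit = all(is_uhc_two_byte(e) for e in encoded)

# Upward compatibility: a KS X 1001 syllable has the same bytes in both.
same_bytes = "가".encode("euc_kr") == "가".encode("cp949")

print(len(syllables), all_fit, same_bytes)
```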