Chapter 4. Coded Character Sets And Encodings in the World
4.4.3 Problems on Unicode
No standard is free from politics and compromise. Though the concept of a single unified CCS
for all characters in the world is very nice, Unicode had to consider compatibility with preceding
international and local standards. Moreover, unlike the ideal concept, the Unicode people gave too
much weight to efficiency. IMHO, the surrogate pair is a mess caused by the lack of 16-bit code space. I will
introduce a few problems with Unicode.
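To make the surrogate pair remark concrete, here is a minimal sketch of how a code point beyond the 16-bit range is split into two 16-bit code units in UTF-16. The function name to_surrogate_pair and the sample code point 0x20000 are chosen only for illustration.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above 0xFFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points 0x10000 - 0x10FFFF need a surrogate pair")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # lower 10 bits
    return high, low


high, low = to_surrogate_pair(0x20000)
print(f"{high:04X} {low:04X}")  # D840 DC00

# Cross-check against Python's own UTF-16 encoder (big endian, no BOM).
assert chr(0x20000).encode("utf-16-be") == bytes([0xD8, 0x40, 0xDC, 0x00])
```

A single character thus occupies two 16-bit units, so software that assumes "one 16-bit unit is one character" no longer works.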
Han Unification
This is the point on which Unicode is criticized most strongly among many Japanese people.
The region 0x4e00 - 0x9fff in UCS-2 is used for East Asian ideographs (Japanese Kanji, Chinese
Hanzi, and Korean Hanja). There are similar characters in these four character sets. (There are
two sets of Chinese characters: simplified Chinese used in P. R. China and traditional Chinese
used in Taiwan.) To reduce the number of ideographs to be encoded (the region for these
characters can contain only 20992 characters, while the Taiwanese CNS 11643 standard alone contains
48711 characters), these similar characters are assumed to be the same. This is Han Unification.
However, these characters are not exactly the same. If the fonts for these characters are made from
the Chinese ones, Japanese people will regard them as wrong characters, though they may be able to
read them. The Unicode people think these unified characters are the same character with different glyphs.
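The capacity figure quoted above can be checked with a little arithmetic. The following sketch only restates the numbers from the text; the constant and function names are mine.

```python
HAN_FIRST = 0x4E00        # first code point of the region
HAN_LAST = 0x9FFF         # last code point of the region
CNS_11643_COUNT = 48711   # size of CNS 11643 as quoted in the text


def region_capacity(first: int, last: int) -> int:
    """Number of code points in the inclusive range first..last."""
    return last - first + 1


capacity = region_capacity(HAN_FIRST, HAN_LAST)
print(capacity)                    # 20992
print(CNS_11643_COUNT - capacity)  # 27719 characters would not fit
```

So even one national standard is more than twice as large as the region, which is why some kind of unification was needed under a 16-bit design.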
An example of Han Unification is available at U+9AA8 (http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=9AA8). This is a Kanji character for 'bone'. U+8FCE (http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=8FCE) is another example, a Kanji character for 'welcome'. The part running from the left side to the bottom is the 'run' radical. The 'run' radical is used in many Kanji and all of them have the same problem. U+76F4 (http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=76F4) is another example, a Kanji character for 'straight'. I, a native Japanese speaker, cannot recognize the Chinese version at all.
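From a program's point of view, the three examples above are just code points in the unified region, and nothing in the code point says which national glyph is meant. A small sketch using Python's standard unicodedata module illustrates this. The helper names are hypothetical, and the URL pattern is simply the one used in the links above.

```python
import unicodedata

# Code points and meanings taken from the examples in the text.
EXAMPLES = {0x9AA8: "bone", 0x8FCE: "welcome", 0x76F4: "straight"}


def in_unified_region(code_point: int) -> bool:
    """True if the code point lies in the 0x4e00 - 0x9fff region."""
    return 0x4E00 <= code_point <= 0x9FFF


def unihan_url(code_point: int) -> str:
    """Build the Unihan lookup URL in the form used in the text."""
    return ("http://www.unicode.org/cgi-bin/GetUnihanData.pl"
            f"?codepoint={code_point:04X}")


def describe(code_point: int):
    """Return the (language-neutral) character name and its lookup URL."""
    return unicodedata.name(chr(code_point)), unihan_url(code_point)


for cp, meaning in EXAMPLES.items():
    name, url = describe(cp)
    print(f"U+{cp:04X} ({meaning}): {name}, in region: {in_unified_region(cp)}")
    print(f"  {url}")
```

Each character has exactly one name and one code point, so the choice between a Japanese and a Chinese shape has to be made outside the text itself, by the font.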
Unicode font vendors will hesitate over which glyphs to choose for these characters: the simplified Chinese
ones, the traditional Chinese ones, the Japanese ones, or the Korean ones. One method is to supply four fonts:
a simplified Chinese version, a traditional Chinese version, a Japanese version, and a Korean version.
A commercial OS vendor can release localized versions of its OS. For example, the Japanese version
of MS Windows can include a Japanese version of the Unicode font (this is exactly what they are doing).
However, what should XFree86 or Debian do? I don't know... [7] [8]

[7] XFree86 4.0 includes Japanese and Korean versions of ISO 10646-1 fonts.
[8] I heard that Chinese and Korean people don't mind the glyphs of these characters. If this is always true, Japanese glyphs should be the default glyphs for these problematic characters on international systems such as Debian.
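The "four fonts" method described above amounts to choosing a font from the user's locale. Below is a minimal sketch of that idea, not a description of what any real system does. The font names are placeholders, the function name is hypothetical, and the Japanese default merely follows the suggestion in the footnote.

```python
# Placeholder font names, one per glyph variant (not real font packages).
FONT_BY_LOCALE = {
    "ja": "example-han-japanese",
    "ko": "example-han-korean",
    "zh_CN": "example-han-simplified-chinese",
    "zh_TW": "example-han-traditional-chinese",
}

# Assumption: fall back to Japanese glyphs when the locale gives no hint.
DEFAULT_FONT = "example-han-japanese"


def pick_han_font(locale_name: str, default: str = DEFAULT_FONT) -> str:
    """Choose a font for the unified ideographs from a POSIX-style locale name."""
    # Drop the encoding and modifier parts: "ja_JP.eucJP@x" -> "ja_JP".
    base = locale_name.split(".")[0].split("@")[0]
    if base in FONT_BY_LOCALE:          # exact match, e.g. "zh_TW"
        return FONT_BY_LOCALE[base]
    language = base.split("_")[0]       # language only, e.g. "ja"
    return FONT_BY_LOCALE.get(language, default)


print(pick_han_font("ja_JP.eucJP"))   # example-han-japanese
print(pick_han_font("zh_TW.Big5"))    # example-han-traditional-chinese
print(pick_han_font("en_US.UTF-8"))   # example-han-japanese (the default)
```

The weak point is the last case: for a user whose locale is not one of the four, some default must still be chosen, and this is exactly the question an international system has to answer.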