Chapter 4. Coded Character Sets And Encodings in the World
4.4 ISO 10646 and Unicode
ISO 10646 and Unicode are another standard, designed to make it easy to develop international
software. The special features of this new standard are:
A single unified CCS that is intended to include all characters in the world. (ISO 2022 consists of multiple CCS.)
The character set is intended to cover all conventional (or legacy) CCS in the world. [3]
Compatibility with ASCII and ISO 8859-1 is taken into account.
Chinese, Japanese, and Korean ideograms are unified. This comes from a limitation of Unicode. This is not a merit.
ISO 10646 is an official international standard. Unicode is developed by the Unicode Consortium
(http://www.unicode.org). The two are almost identical; indeed, they are exactly identical at
the code points that are available in both standards. Unicode is updated from time to time, and
the newest version at the time of writing is 3.0.1.
4.4.1 UCS as a Coded Character Set
ISO 10646 defines two CCS (coded character sets), UCS-2 and UCS-4. UCS-2 is a subset of UCS-4.
UCS-4 is a 31-bit CCS. These 31 bits are divided into 7, 8, 8, and 8 bits, and each part has its
own name.
The top 7 bits are called Group.
The next 8 bits are called Plane.
The next 8 bits are called Row.
The lowest 8 bits are called Cell.
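
The 7/8/8/8 split above can be sketched with shifts and masks. This is only a minimal illustration in Python; the function name split_ucs4 is hypothetical and is not defined by ISO 10646 or Unicode.

```python
def split_ucs4(code_point):
    """Split a 31-bit UCS-4 code point into (group, plane, row, cell).

    split_ucs4 is a hypothetical helper written for this illustration.
    """
    if not 0 <= code_point <= 0x7FFFFFFF:
        raise ValueError("a UCS-4 code point must fit in 31 bits")
    group = (code_point >> 24) & 0x7F  # top 7 bits
    plane = (code_point >> 16) & 0xFF  # next 8 bits
    row = (code_point >> 8) & 0xFF     # next 8 bits
    cell = code_point & 0xFF           # lowest 8 bits
    return group, plane, row, cell


print(split_ucs4(0x3042))  # (0, 0, 48, 66), i.e. Row 0x30, Cell 0x42
```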
The first plane (Group = 0, Plane = 0) is called the BMP (Basic Multilingual Plane), and UCS-2 is
the same as the BMP. Thus, UCS-2 is a 16-bit CCS.
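
Because Group = 0 and Plane = 0 means that the upper 15 bits are all zero, membership in the BMP reduces to a range check on the lower 16 bits. A minimal sketch in Python, where in_bmp is a hypothetical name chosen for this example:

```python
def in_bmp(code_point):
    """Return True if the code point has Group = 0 and Plane = 0.

    in_bmp is a hypothetical helper; such a code point fits in 16 bits
    and is therefore also representable in UCS-2.
    """
    return 0 <= code_point <= 0xFFFF


print(in_bmp(0x3042))   # True
print(in_bmp(0x10000))  # False: Plane = 1
```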
Code points in UCS are often expressed as u+????, where ???? is the hexadecimal expression of the
code point.
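
As a small illustration of this notation, the following Python sketch formats a code point. The function name ucs_notation is hypothetical. Padding to at least four hexadecimal digits and using uppercase digits are assumptions made here; the lowercase "u+" prefix follows the text above, although an uppercase "U+" prefix is also commonly seen.

```python
def ucs_notation(code_point):
    """Format a code point in u+???? notation.

    ucs_notation is a hypothetical helper. It pads to at least four
    hexadecimal digits; larger UCS-4 values simply use more digits.
    """
    return "u+{:04X}".format(code_point)


print(ucs_notation(ord("A")))  # u+0041
print(ucs_notation(0x3042))    # u+3042
```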
[3] This is obviously not true for CNS 11643, because CNS 11643 contains 48711 characters while
Unicode 3.0.1 contains 49194 characters, only 483 more than CNS 11643.