CHAPTER 3
Lexical Structure
Lexicographer: A writer of dictionaries, a harmless drudge.
Samuel Johnson, Dictionary (1755)
This chapter specifies the lexical structure of Java.
Java programs are written in Unicode (§3.1), but lexical translations are
provided (§3.2) so that Unicode escapes (§3.3) can be used to include any Unicode
character using only ASCII characters. Line terminators are defined (§3.4) to
support the different conventions of existing host systems while maintaining
consistent line numbers.
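As a minimal sketch of how Unicode escapes let an ASCII-only source file carry any Unicode character, the example below writes U+00E9 inside a string literal and U+0061 (the letter a) inside an identifier. The class and method names (`UnicodeEscapeDemo`, `cafe`, `escapedIdentifier`) are hypothetical and chosen only for illustration.

```java
public class UnicodeEscapeDemo {
    // The source text of this literal is pure ASCII; the escape for
    // U+00E9 is translated to the character itself before tokenization.
    static String cafe() {
        return "caf\u00e9";
    }

    // Escapes are translated everywhere in the source, not only inside
    // literals. The variable declared below is spelled with the escape
    // for U+0061, so the compiler sees an ordinary variable named "a".
    static int escapedIdentifier() {
        int \u0061 = 42;
        return a;
    }

    public static void main(String[] args) {
        System.out.println(cafe().length());        // prints 4
        System.out.println((int) cafe().charAt(3)); // prints 233 (0xE9)
        System.out.println(escapedIdentifier());    // prints 42
    }
}
```

Because the translation happens before comments are recognized, an escape that denotes a line terminator would also end a single-line comment, which is why the comments above avoid spelling out escapes.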
The Unicode characters resulting from the lexical translations are reduced to a
sequence of input elements (§3.5), which are white space (§3.6), comments
(§3.7), and tokens. The tokens are the identifiers (§3.8), keywords (§3.9), literals
(§3.10), separators (§3.11), and operators (§3.12) of the Java syntactic grammar.
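The five kinds of token can be illustrated with a rough classifier applied to the already separated tokens of the statement `int x = 42;`. This is only a sketch under stated assumptions, not a lexer: the class `TokenKindsDemo`, the method `kindOf`, and the small keyword, separator, and operator sets are hypothetical and deliberately incomplete, and white space and comments are assumed to have been discarded already.

```java
import java.util.Set;

public class TokenKindsDemo {
    // Illustrative subsets only; the complete lists belong to the
    // sections on keywords, separators, and operators.
    static final Set<String> KEYWORDS = Set.of("int", "class", "return", "if", "while");
    static final Set<String> SEPARATORS = Set.of("(", ")", "{", "}", "[", "]", ";", ",", ".");
    static final Set<String> OPERATORS = Set.of("=", "+", "-", "*", "/", "==", "<", ">");

    // Classifies one non-empty token that has already been split out
    // of the input.
    static String kindOf(String token) {
        char first = token.charAt(0);
        if (KEYWORDS.contains(token)) return "keyword";
        if (token.equals("true") || token.equals("false") || token.equals("null")) return "literal";
        if (Character.isDigit(first) || first == '"' || first == '\'') return "literal";
        if (SEPARATORS.contains(token)) return "separator";
        if (OPERATORS.contains(token)) return "operator";
        if (Character.isJavaIdentifierStart(first)) return "identifier";
        return "unknown";
    }

    public static void main(String[] args) {
        // The tokens of: int x = 42;
        for (String token : new String[] {"int", "x", "=", "42", ";"}) {
            System.out.println(token + " -> " + kindOf(token));
        }
    }
}
```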
3.1 Unicode
Java programs are written using the Unicode character set, version 2.0.
Information about this encoding may be found at:
http://www.unicode.org
and
ftp://unicode.org
Versions of Java prior to 1.1 used Unicode version 1.1.5 (see The Unicode
Standard: Worldwide Character Encoding (§1.2) and updates). See §20.5 for a
discussion of the differences between Unicode version 1.1.5 and Unicode version 2.0.
Except for comments (§3.7), identifiers, and the contents of character and
string literals (§3.10.4, §3.10.5), all input elements (§3.5) in a Java program are
formed only from ASCII characters (or Unicode escapes (§3.3) which result in
ASCII characters). ASCII (ANSI X3.4) is the American Standard Code for
Information Interchange. The first 128 characters of the Unicode character encoding
are the ASCII characters.
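The relationship between ASCII and the first 128 Unicode characters can be checked with a short sketch. It assumes the standard `java.nio.charset.StandardCharsets.US_ASCII` charset from later Java library versions; the class and method names (`AsciiSubsetDemo`, `isAscii`, `asciiRoundTrips`) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class AsciiSubsetDemo {
    // A character is an ASCII character exactly when its Unicode
    // value is below 128.
    static boolean isAscii(char c) {
        return c < 128;
    }

    // Encodes each of the first 128 Unicode characters as US-ASCII and
    // checks that the single resulting byte equals the Unicode value.
    static boolean asciiRoundTrips() {
        for (int value = 0; value < 128; value++) {
            String s = String.valueOf((char) value);
            byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
            if (bytes.length != 1 || bytes[0] != value) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAscii('A'));      // prints true
        System.out.println(isAscii('\u00e9')); // prints false
        System.out.println(asciiRoundTrips()); // prints true
    }
}
```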