Java Hosting - Java Website Hosting - Java Language Specification Guide

CHAPTER 3

Lexical Structure

Lexicographer: A writer of dictionaries, a harmless drudge.

Samuel Johnson, Dictionary (1755)

This chapter specifies the lexical structure of Java.

Java programs are written in Unicode (§3.1), but lexical translations are provided (§3.2) so that Unicode escapes (§3.3) can be used to include any Unicode character using only ASCII characters. Line terminators are defined (§3.4) to support the different conventions of existing host systems while maintaining consistent line numbers.
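As an illustration of the Unicode escapes mentioned above, the following is a minimal sketch rather than text from the specification. The class name `UnicodeEscapeDemo` and its method names are hypothetical, chosen only for this example. It shows that an escape is translated before tokens are formed, so it may appear in a literal or even inside an identifier, and that a non-ASCII character can be written in an ASCII-only source file.

```java
public class UnicodeEscapeDemo {
    // The escape for U+0041 is translated to 'A' before the compiler
    // forms tokens, so this literal is the same as 'A'.
    static char escapedLetter() {
        return '\u0041';
    }

    // Escapes are not limited to literals. The declared name below starts
    // with the escape for U+0061 ('a'), so the variable is simply "abc".
    static int escapedIdentifier() {
        int \u0061bc = 7;
        return abc;
    }

    // A non-ASCII character (U+00EF) written using ASCII characters only.
    static String asciiOnlySource() {
        return "na\u00efve";
    }

    public static void main(String[] args) {
        System.out.println(escapedLetter() == 'A');
        System.out.println(escapedIdentifier());
        System.out.println(asciiOnlySource().length());
    }
}
```

The string returned by `asciiOnlySource()` contains five characters, because the six-character escape sequence stands for a single Unicode character.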

The Unicode characters resulting from the lexical translations are reduced to a sequence of input elements (§3.5), which are white space (§3.6), comments (§3.7), and tokens. The tokens are the identifiers (§3.8), keywords (§3.9), literals (§3.10), separators (§3.11), and operators (§3.12) of the Java syntactic grammar.
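To make the categories of input elements concrete, here is a small annotated sketch. The class `InputElementsDemo` and the method `answer` are hypothetical names used only for illustration, and the annotations are an informal reading of the categories listed above, not output from any tool.

```java
public class InputElementsDemo {
    static int answer() {
        // This comment and the white space around it are input elements,
        // but they are not tokens.
        int total = 40 + 2;
        // The statement above consists of these tokens:
        //   int     keyword
        //   total   identifier
        //   =       operator
        //   40      literal
        //   +       operator
        //   2       literal
        //   ;       separator
        return total;
    }

    public static void main(String[] args) {
        System.out.println(answer());
    }
}
```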

3.1   Unicode

Java programs are written using the Unicode character set, version 2.0. Information about this encoding may be found at:

http://www.unicode.org

and

 ftp://unicode.org

Versions of Java prior to 1.1 used Unicode version 1.1.5 (see The Unicode Standard: Worldwide Character Encoding (§1.2) and updates). See §20.5 for a discussion of the differences between Unicode version 1.1.5 and Unicode version 2.0.

Except for comments (§3.7), identifiers, and the contents of character and string literals (§3.10.4, §3.10.5), all input elements (§3.5) in a Java program are formed only from ASCII characters (or Unicode escapes (§3.3) which result in ASCII characters). ASCII (ANSI X3.4) is the American Standard Code for Information Interchange. The first 128 characters of the Unicode character encoding are the ASCII characters.
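The relationship between ASCII and Unicode described above can be checked with a short sketch. The class `AsciiSubsetDemo` and its `isAscii` helpers are hypothetical, written only for this illustration. They rely on the fact that a Java `char` holds a Unicode value, so a character is ASCII exactly when that value is below 128.

```java
public class AsciiSubsetDemo {
    // A char is an ASCII character exactly when its Unicode value is below 128.
    static boolean isAscii(char c) {
        return c < 128;
    }

    // A string is ASCII-only when every one of its characters is ASCII.
    static boolean isAscii(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (!isAscii(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The Unicode value of 'A' is the same as its ASCII code.
        System.out.println((int) 'A');
        // Keywords, separators, and operators use ASCII characters only.
        System.out.println(isAscii("int count = 0;"));
        // A string containing U+00EF is not ASCII-only.
        System.out.println(isAscii("na" + (char) 0x00EF + "ve"));
    }
}
```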
