A standard character set contains 128 standard characters, numbered 0 to 127, each encoded in one 8-bit byte. An extended character set adds 128 extended characters, numbered 128 to 255, encoded in the same 8-bit byte.
Source programs contain only standard characters; extended characters are discarded. All language elements are composed of printable standard characters plus three whitespace characters: space (0x20), tab (0x09), and newline (0x0A).
To a running program, however, characters are simply unsigned bytes. How these bytes are interpreted depends on the program, though many intrinsic functions assume the standard character set.
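As a rough illustration of the byte ranges described above, the following Python sketch classifies byte values and drops extended characters from a line of source text. The names is_standard, is_extended, and strip_extended are hypothetical and are not part of the language; the sketch assumes only what the text states, namely that bytes 128 to 255 are discarded from source programs.

```python
# Whitespace characters recognized in source text (values from the text above).
SPACE, TAB, NEWLINE = 0x20, 0x09, 0x0A

def is_standard(byte: int) -> bool:
    """True for the 128 standard characters, numbered 0 to 127."""
    return 0 <= byte <= 127

def is_extended(byte: int) -> bool:
    """True for the 128 extended characters, numbered 128 to 255."""
    return 128 <= byte <= 255

def strip_extended(source: bytes) -> bytes:
    """Return the source with every extended character discarded (hypothetical helper)."""
    return bytes(b for b in source if is_standard(b))

line = b"a\xe9 = 1\n"        # 0xE9 is an extended character
print(strip_extended(line))  # the 0xE9 byte is dropped
```

Working on bytes rather than decoded text mirrors the statement that programs see characters as unsigned bytes.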
Certain groups of characters are referred to by the following names:
Alphabetic          "A - Z", "a - z"
Alphanumeric        "A - Z", "a - z", "0 - 9"
Numeric             "0 - 9"
Binary              "0 - 1"
Octal               "0 - 7"
Hexadecimal         "0 - 9", "A - F", "a - f"
Symbol Characters   "A - Z", "a - z", "0 - 9"
Type Suffixes       @ @@ % %% & && ~ ! # $$ $
Scope Prefixes      # ##
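The named groups can be pictured as sets of characters. The Python sketch below is only an illustration: the constant names and the all_in helper are hypothetical, and the group contents are copied from the list above, not taken from any compiler source.

```python
import string

# Character groups from the list above, expressed as sets of characters.
ALPHABETIC = set(string.ascii_letters)
NUMERIC = set(string.digits)
ALPHANUMERIC = ALPHABETIC | NUMERIC
BINARY = set("01")
OCTAL = set("01234567")
HEXADECIMAL = NUMERIC | set("ABCDEFabcdef")
SYMBOL_CHARACTERS = ALPHABETIC | NUMERIC   # as listed above

# Suffixes and prefixes are short strings, not single characters.
TYPE_SUFFIXES = ("@", "@@", "%", "%%", "&", "&&", "~", "!", "#", "$$", "$")
SCOPE_PREFIXES = ("#", "##")

def all_in(text: str, group: set) -> bool:
    """True if text is non-empty and every character belongs to group (hypothetical helper)."""
    return bool(text) and all(ch in group for ch in text)

print(all_in("7F", HEXADECIMAL))  # True
print(all_in("78", OCTAL))        # False: 8 is not an octal digit
```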
To parse means to break program text into language elements. For example, thisVar=thatVar parses into three language elements:
thisVar - variable
= - assignment operator
thatVar - variable
Each language element is found as follows. First, leading whitespace is ignored. Then, successive characters are collected until adding the next character would produce an invalid language element.
Whitespace separates language elements that would otherwise be combined into one. For example, FORK=ATOM means "assign variable ATOM to variable FORK". In contrast, many conventional BASIC dialects interpret FORK=ATOM to mean FOR K = A TO M. To write the FOR statement, the FOR and TO keywords must be separated from the adjacent variables by whitespace, as in FOR K = A TO M.
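The longest-match rule and the role of whitespace can be sketched in a few lines of Python. This is a simplified illustration, not the actual parser: the function name parse is hypothetical, symbols are assumed to be runs of letters and digits, and every other non-whitespace character is treated as a one-character element.

```python
import string

WHITESPACE = " \t\n"
SYMBOL_CHARS = string.ascii_letters + string.digits

def parse(text: str) -> list[str]:
    """Split text into language elements using the longest-match rule (simplified)."""
    elements = []
    i = 0
    while i < len(text):
        # Leading whitespace is ignored.
        if text[i] in WHITESPACE:
            i += 1
            continue
        start = i
        if text[i] in SYMBOL_CHARS:
            # Collect characters until the next one would not fit the symbol.
            while i < len(text) and text[i] in SYMBOL_CHARS:
                i += 1
        else:
            # Simplification: any other character is a one-character element.
            i += 1
        elements.append(text[start:i])
    return elements

print(parse("thisVar=thatVar"))   # ['thisVar', '=', 'thatVar']
print(parse("FORK=ATOM"))         # ['FORK', '=', 'ATOM']
print(parse("FOR K = A TO M"))    # ['FOR', 'K', '=', 'A', 'TO', 'M']
```

Because the collector keeps extending a symbol for as long as it can, FORK is never split into FOR and K; only the explicit whitespace in the third call produces separate FOR and TO elements.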