The CHAR signature
The CHAR signature defines a type char of characters and provides basic operations and predicates on values of that type. There is a linear ordering supported on characters. In addition, there is an encoding of characters into a contiguous range of non-negative integers that preserves the linear ordering.
There are two structures matching the CHAR signature. The Char structure defines a superset of the usual ASCII characters and locale-independent operations on them. For this structure, Char.maxOrd = 255.
The optional WideChar structure defines wide characters, which are represented by a fixed number of 8-bit words (bytes). If the WideChar structure is provided, it is distinct from the Char structure.
signature CHAR
structure Char : CHAR
structure WideChar : CHAR
eqtype char
eqtype string
val minChar : char
val maxChar : char
val maxOrd : int
val ord : char -> int
val chr : int -> char
val succ : char -> char
val pred : char -> char
val < : (char * char) -> bool
val <= : (char * char) -> bool
val > : (char * char) -> bool
val >= : (char * char) -> bool
val compare : (char * char) -> order
val contains : string -> char -> bool
val notContains : string -> char -> bool
val toLower : char -> char
val toUpper : char -> char
val isAlpha : char -> bool
val isAlphaNum : char -> bool
val isAscii : char -> bool
val isCntrl : char -> bool
val isDigit : char -> bool
val isGraph : char -> bool
val isHexDigit : char -> bool
val isLower : char -> bool
val isPrint : char -> bool
val isSpace : char -> bool
val isPunct : char -> bool
val isUpper : char -> bool
val fromString : String.string -> char option
val scan : (Char.char, 'a) StringCvt.reader -> 'a -> (char * 'a) option
val toString : char -> String.string
val fromCString : String.string -> char option
val toCString : char -> String.string
eqtype char
eqtype string
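minChar
is the least character in the ordering. It always equals chr 0.
maxChar
is the greatest character in the ordering.
maxOrd
is the greatest character code; it equals ord maxChar.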
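ord c
chr i
ord returns the (non-negative) integer code of the character c; chr returns the character whose code is i, raising Chr if i < 0 or i > maxOrd. When chr is restricted to the interval [0,maxOrd], these two functions denote the character encoding function and its inverse.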
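The inverse relationship between ord and chr can be checked directly. The following is a minimal sketch for the Char structure; the names roundTrips, allOk, and outOfRange are illustrative and not part of the signature.

```sml
(* chr and ord are inverses on the interval [0, Char.maxOrd]. *)
fun roundTrips i = Char.ord (Char.chr i) = i

val allOk = List.all roundTrips (List.tabulate (Char.maxOrd + 1, fn i => i))

(* Outside that interval, chr raises the Chr exception. *)
val outOfRange =
  (ignore (Char.chr (Char.maxOrd + 1)); false) handle Chr => true
```

Both allOk and outOfRange should evaluate to true.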
succ c
returns the character immediately following c in the ordering, or raises Chr if c = maxChar. When defined, succ c is equivalent to chr(ord c + 1).
pred c
returns the character immediately preceding c in the ordering, or raises Chr if c = minChar. When defined, pred c is equivalent to chr(ord c - 1).
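As a small illustration of succ and pred at and away from the ends of the ordering, consider the sketch below. The helper succOpt is hypothetical; it only wraps the Chr exception in an option.

```sml
val b = Char.succ #"a"   (* #"b" *)
val y = Char.pred #"z"   (* #"y" *)

(* Hypothetical helper: turn the Chr exception into NONE. *)
fun succOpt c = SOME (Char.succ c) handle Chr => NONE

val atTop = succOpt Char.maxChar   (* NONE *)
val inside = succOpt #"0"          (* SOME #"1" *)
```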
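c < d
c <= d
c > d
c >= d
compare characters in the character ordering; because the encoding preserves that ordering, the result agrees with comparing ord c and ord d.
compare (c, d)
returns LESS, EQUAL, or GREATER, depending on whether c precedes, equals, or follows d in the character ordering.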
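The comparison operations can be used infix on characters or in qualified prefix form. A brief sketch follows; the function largest is a hypothetical helper introduced only for illustration.

```sml
val ordering = Char.compare (#"a", #"b")   (* LESS *)

(* The ordering on characters agrees with the ordering on their codes. *)
val agree = (#"a" < #"b") = (Char.ord #"a" < Char.ord #"b")

(* Hypothetical helper: the greatest character of a string, if any. *)
fun largest s =
  case String.explode s of
      [] => NONE
    | c :: cs =>
        SOME (List.foldl (fn (d, m) => if Char.> (d, m) then d else m) c cs)

val top = largest "basis"   (* SOME #"s" *)
```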
contains s c
returns true if character c occurs in the string s; otherwise false.
Implementation note:
In some implementations, the partial application of contains to s may build a table, which is used by the resulting function to decide whether a given character is in the string or not. Hence it may be expensive to compute val p = contains s, but fast to compute p c for any given character c.
notContains s c
returns true if character c does not occur in the string s; false otherwise. Equivalent to not(contains s c).
Implementation note:
As with contains, notContains may be implemented via table lookup.
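Given the implementation note, it can pay to apply contains or notContains to the string once and reuse the resulting predicate. A minimal sketch, with isVowel, countVowels, and noDigits as illustrative names:

```sml
(* Partially apply once; any table is built here, if at all. *)
val isVowel = Char.contains "aeiouAEIOU"

fun countVowels s = List.length (List.filter isVowel (String.explode s))

val n = countVowels "Standard ML"   (* 2 *)

(* notContains gives the complementary predicate. *)
val noDigits = List.all (Char.notContains "0123456789") o String.explode

val clean = noDigits "abc"    (* true *)
val dirty = noDigits "a1c"    (* false *)
```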
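toLower c
toUpper c
return the lowercase (respectively, uppercase) letter corresponding to c if c is a letter; otherwise return c.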
isAlpha c
returns true if c is a letter (lowercase or uppercase).
isAlphaNum c
returns true if c is alphanumeric (a letter or a decimal digit).
isAscii c
returns true if c is a (seven-bit) ASCII character, i.e., 0 <= ord c <= 127. Note that this function is independent of locale.
isCntrl c
returns true if c is a control character. Equivalent to not o isPrint.
isDigit c
returns true if c is a decimal digit (0-9).
isGraph c
returns true if c is a graphical character, that is, it is printable and not a whitespace character.
isHexDigit c
returns true if c is a hexadecimal digit (0-9, a-f, A-F).
isLower c
returns true if c is a lowercase letter.
isPrint c
returns true if c is a printable character (space or visible), i.e., not a control character.
isSpace c
returns true if c is a whitespace character (space, newline, tab, carriage return, vertical tab, formfeed).
isPunct c
returns true if c is a punctuation character: graphical but not alphanumeric.
isUpper c
returns true if c is an uppercase letter.
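The classification predicates compose naturally into a simple classifier. The sketch below uses the Char structure; the function classify and its category names are hypothetical, and the order of the tests matters because some classes overlap (a tab, for instance, is both whitespace and a control character).

```sml
(* Hypothetical classifier built from the CHAR predicates. *)
fun classify c =
  if Char.isDigit c then "digit"
  else if Char.isAlpha c then "letter"
  else if Char.isSpace c then "space"
  else if Char.isPunct c then "punct"
  else if Char.isCntrl c then "control"
  else "other"

val classes = map classify (String.explode "a1 !")
(* ["letter", "digit", "space", "punct"] *)
```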
fromString s
scan getc strm
attempt to scan a character or SML escape sequence from the string s or the character stream strm, respectively. They return NONE if no character can be scanned.
The allowable escape sequences are:
\a        Alert (ASCII 0x07)
\b        Backspace (ASCII 0x08)
\t        Horizontal tab (ASCII 0x09)
\n        Linefeed or newline (ASCII 0x0A)
\v        Vertical tab (ASCII 0x0B)
\f        Form feed (ASCII 0x0C)
\r        Carriage return (ASCII 0x0D)
\\        Backslash
\"        Double quote
\^c       A control character whose encoding is C - 64, where C is the encoding of the character c, with C in the range [64,95].
\ddd      The character whose encoding is the number ddd, three decimal digits denoting an integer in the range [0,255].
\uxxxx    The character whose encoding is the number xxxx, four hexadecimal digits denoting an integer in the ordinal range of the alphabet.
\f...f\   This sequence is ignored, where f...f stands for a sequence of one or more formatting characters.
In the escape sequences involving decimal or hexadecimal digits, the sequence of digits is taken to be the longest sequence of such characters. If the resulting value cannot be represented in the character set, NONE is returned.
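To make the scanning behaviour concrete, here is a short sketch using the Char structure. Note that each SML string literal below contains a backslash written as "\\", so the scanned text is an escape sequence, not the control character itself. The use of Substring.getc as the reader is one choice among several.

```sml
val plain = Char.fromString "a"        (* SOME #"a" *)
val newline = Char.fromString "\\n"    (* SOME #"\n" *)
val decimal = Char.fromString "\\065"  (* SOME #"A" *)

(* scan works over any character reader; here, substrings. *)
val scanned =
  case Char.scan Substring.getc (Substring.full "\\tabc") of
      SOME (c, rest) => SOME (c, Substring.string rest)
    | NONE => NONE
(* SOME (#"\t", "abc") *)
```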
toString c
returns a printable string representation of the character, using SML escape sequences where necessary. Printable characters, except for #"\\" and #"\"", are left unchanged. Backslash #"\\" becomes "\\\\"; double quote #"\"" becomes "\\\"". The common control characters are converted to two-character escape sequences:
Alert (ASCII 0x07)                 "\\a"
Backspace (ASCII 0x08)             "\\b"
Horizontal tab (ASCII 0x09)        "\\t"
Linefeed or newline (ASCII 0x0A)   "\\n"
Vertical tab (ASCII 0x0B)          "\\v"
Form feed (ASCII 0x0C)             "\\f"
Carriage return (ASCII 0x0D)       "\\r"
The remaining characters whose codes are less than 32 are represented by three-character strings in "control character" notation, e.g., #"\000" maps to "\\^@", #"\001" maps to "\\^A", etc. All other characters (i.e., those whose codes are 127 or greater) are mapped to four-character strings of the form "\\ddd", where ddd are the three decimal digits corresponding to a character's code.
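A few sample conversions illustrate each case of toString. This is a sketch for the Char structure; the expected results in the comments are written as SML string literals, so each "\\" denotes a single backslash in the result.

```sml
val printable = Char.toString #"a"     (* "a" *)
val newline = Char.toString #"\n"      (* "\\n" *)
val backslash = Char.toString #"\\"    (* "\\\\" *)
val control = Char.toString #"\000"    (* "\\^@" *)
val high = Char.toString #"\255"       (* "\\255" *)
```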
fromCString s
scans a character (including a possible C escape sequence) from the string s. Returns NONE if no character can be scanned.
The allowable escape sequences are given below (cf. Section 6.1.3.4 of the ISO C standard ISO/IEC 9899:1990).
\a        Alert (ASCII 0x07)
\b        Backspace (ASCII 0x08)
\t        Horizontal tab (ASCII 0x09)
\n        Linefeed or newline (ASCII 0x0A)
\v        Vertical tab (ASCII 0x0B)
\f        Form feed (ASCII 0x0C)
\r        Carriage return (ASCII 0x0D)
\?        Question mark
\\        Backslash
\"        Double quote
\'        Single quote
\^c       A control character whose encoding is C - 64, where C is the encoding of the character c, with C in the range [64,95].
\ddd      The character whose encoding is the number ddd, where ddd consists of one to three octal digits.
\uxxxx    The character whose encoding is the number xxxx, where xxxx is a sequence of hexadecimal digits.
In the escape sequences involving octal or hexadecimal digits, the sequence of digits is taken to be the longest sequence of such characters. If the resulting value cannot be represented in the character set, NONE is returned.
toCString c
returns a printable string representation of the character, using C escape sequences where necessary. Printable characters, except for #"\\", #"\"", #"?" and #"'", are left unchanged. Backslash #"\\" becomes "\\\\"; double quote #"\"" becomes "\\\""; question mark #"?" becomes "\\?"; single quote #"'" becomes "\\'". The common control characters are converted to two-character escape sequences:
Alert (ASCII 0x07)                 "\\a"
Backspace (ASCII 0x08)             "\\b"
Horizontal tab (ASCII 0x09)        "\\t"
Linefeed or newline (ASCII 0x0A)   "\\n"
Vertical tab (ASCII 0x0B)          "\\v"
Form feed (ASCII 0x0C)             "\\f"
Carriage return (ASCII 0x0D)       "\\r"
All other characters are represented by one to three octal digits, corresponding to a character's code, preceded by a backslash.
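The C-style conversions differ from the SML ones mainly in the extra escapes for question mark and single quote and in the use of octal codes. A minimal sketch for the Char structure, restricted to cases covered by the tables above:

```sml
val question = Char.toCString #"?"      (* "\\?" *)
val quote = Char.toCString #"'"         (* "\\'" *)
val newline = Char.toCString #"\n"      (* "\\n" *)

val scannedNl = Char.fromCString "\\n"      (* SOME #"\n" *)
val scannedOct = Char.fromCString "\\101"   (* SOME #"A"; octal 101 = 65 *)
```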
In WideChar, the functions toLower, toUpper, isAlpha, ..., isUpper are locale-dependent. In Char, these functions are locale-independent, with the following semantics:
isUpper c       true if #"A" <= c andalso c <= #"Z"
isLower c       true if #"a" <= c andalso c <= #"z"
isDigit c       true if #"0" <= c andalso c <= #"9"
isAlpha c       true if isUpper c orelse isLower c
isAlphaNum c    true if isAlpha c orelse isDigit c
isHexDigit c    true if isDigit c orelse (#"a" <= c andalso c <= #"f") orelse (#"A" <= c andalso c <= #"F")
isGraph c       true if #"!" <= c andalso c <= #"~"
isPrint c       true if isGraph c orelse c = #" "
isPunct c       true if isGraph c andalso not (isAlphaNum c)
isCntrl c       true if not (isPrint c)
isSpace c       true if (#"\t" <= c andalso c <= #"\r") orelse c = #" "
isAscii c       true if 0 <= ord c andalso ord c <= 127
toLower c       chr (ord c + 32) if isUpper c; otherwise, c
toUpper c       chr (ord c - 32) if isLower c; otherwise, c
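The locale-independent semantics above can be transcribed into SML and compared against the Char structure over its whole range. The sketch below does this for two entries; the primed names isSpace' and toLower' are hypothetical transcriptions, not library functions.

```sml
(* Transcriptions of two rows of the semantics table. *)
fun isSpace' c = (#"\t" <= c andalso c <= #"\r") orelse c = #" "
fun toLower' c = if Char.isUpper c then Char.chr (Char.ord c + 32) else c

(* Every character of the Char structure, from minChar to maxChar. *)
val allChars = List.tabulate (Char.maxOrd + 1, Char.chr)

(* Expected to hold on an implementation that follows the table. *)
val agrees =
  List.all
    (fn c => isSpace' c = Char.isSpace c andalso toLower' c = Char.toLower c)
    allChars
```

The same pattern extends to the remaining rows if a fuller conformance check is wanted.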
See also: Locale, MultiByte, STRING
Last Modified October 6, 1997
Comments to John Reppy.
Copyright © 1997 Bell Labs, Lucent Technologies