The Standard ML Basis Library


The STRING signature

The STRING signature specifies the basic operations on a string type, which is a vector of the underlying character type char as defined in the substructure Char.

The STRING signature is matched by two structures, the required String and the optional WideString. The former implements strings based on 8-bit characters. The latter provides strings of characters of some size greater than or equal to 8 bits. In particular, structure String.Char is identical to the structure Char and, when defined, the structure WideString.Char is identical to WideChar. In addition, the type String.string is identical to CharVector.vector, and the type WideString.string is identical to WideCharVector.vector.
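The type identity stated above can be checked directly. The following is a minimal sketch, assuming an implementation that provides the CharVector structure; the names s, v and c are illustrative only.

```sml
(* String.string and CharVector.vector are the same type, so a string
   value can be used wherever a CharVector.vector is expected. *)
val s : String.string = "abc"
val v : CharVector.vector = s

(* Vector operations therefore apply to strings as well. *)
val c : Char.char = CharVector.sub (v, 1)   (* #"b" *)
```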


Synopsis

signature STRING
structure String : STRING
structure WideString : STRING

Interface

eqtype string
structure Char : CHAR
val maxSize : int
val size : string -> int
val sub : (string * int) -> Char.char
val extract : (string * int * int option) -> string
val substring : (string * int * int) -> string
val concat : string list -> string
val ^ : (string * string) -> string
val str : Char.char -> string
val implode : Char.char list -> string
val explode : string -> Char.char list
val map : (Char.char -> Char.char) -> string -> string
val translate : (Char.char -> string) -> string -> string
val tokens : (Char.char -> bool) -> string -> string list
val fields : (Char.char -> bool) -> string -> string list
val isPrefix : string -> string -> bool
val compare : (string * string) -> order
val collate : ((Char.char * Char.char) -> order) -> (string * string) -> order
val < : (string * string) -> bool
val <= : (string * string) -> bool
val > : (string * string) -> bool
val >= : (string * string) -> bool
val fromString : String.string -> string option
val toString : string -> String.string
val fromCString : String.string -> string option
val toCString : string -> String.string

Description

eqtype string

structure Char

maxSize
is the maximum length of a string.

size s
returns the number of characters in string s.

sub (s, i)
returns the ith character of s, counting from zero. This raises Subscript if i < 0 or size s <= i.
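As an illustration of size and sub, here is a small sketch. The helper safeSub is hypothetical (it is not part of the signature) and simply shows one way to handle the Subscript exception.

```sml
val s = "hello"
val n = String.size s                           (* 5 *)
val first = String.sub (s, 0)                   (* #"h" *)
val last = String.sub (s, String.size s - 1)    (* #"o" *)

(* Hypothetical helper: turn an out-of-range index into NONE
   instead of letting Subscript propagate. *)
fun safeSub (s, i) =
  SOME (String.sub (s, i)) handle Subscript => NONE

val outOfRange = safeSub (s, 5)    (* NONE, because size s <= 5 *)
val negative = safeSub (s, ~1)     (* NONE, because ~1 < 0 *)
```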

extract (s, i, NONE)
extract (s, i, SOME j)
substring (s, i, j)
return substrings of s. The first form returns the substring of s from the ith character to the end of the string, i.e., the string s[i..size s-1]. It raises Subscript if i < 0 or size s < i.

The second form returns the substring of length j starting at index i, i.e., the string s[i..i+j-1]. It raises Subscript if i < 0 or j < 0 or size s < i + j. Note that, whenever the call is defined (that is, raises no exception), extract returns the empty string when i = size s.

The third form returns the substring of length j starting at index i, i.e., the string s[i..i+j-1]. This is equivalent to extract(s, i, SOME j).

Implementation note:

Implementations of these functions must perform bounds checking in such a way that the Overflow exception is not raised.
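The three forms can be compared side by side. The sketch below also includes inBounds, a hypothetical predicate (not part of the signature) showing one way a bounds check might be written without computing i + j, which is the sum that could overflow.

```sml
val s = "abcdef"

val tail = String.extract (s, 2, NONE)       (* "cdef" *)
val mid1 = String.extract (s, 2, SOME 3)     (* "cde" *)
val mid2 = String.substring (s, 2, 3)        (* "cde", same as mid1 *)

(* i = size s is allowed and yields the empty string. *)
val empty = String.extract (s, String.size s, NONE)   (* "" *)

(* size s < i + j, so Subscript is raised. *)
val raised =
  (String.substring (s, 4, 3); false) handle Subscript => true

(* Hypothetical overflow-free bounds check: once 0 <= i <= size s is
   known, size s - i cannot overflow, so j is compared against that. *)
fun inBounds (s, i, j) =
  i >= 0 andalso j >= 0
  andalso i <= String.size s
  andalso j <= String.size s - i
```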



concat l
is the concatenation of all the strings in l. This raises Size if the sum of all the sizes is greater than maxSize.

s ^ t
is the concatenation of the strings s and t. This raises Size if size s + size t > maxSize.
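A short sketch of the two concatenation operations follows; the value names are illustrative only.

```sml
val parts = ["Standard", " ", "ML"]

val joined = String.concat parts        (* "Standard ML" *)
val infixed = "Standard" ^ " " ^ "ML"   (* "Standard ML" *)

(* Concatenating the empty list gives the empty string. *)
val nothing = String.concat []          (* "" *)
```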

str c
is the string of size one containing the character c.

implode l
generates the string containing the characters in the list l. This is equivalent to concat (List.map str l).

explode s
is the list of characters in the string s.
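The conversions between characters, character lists and strings can be sketched as follows. The value names are illustrative; viaConcat spells out the equivalence given for implode.

```sml
val one = String.str #"x"                 (* "x" *)
val chars = String.explode "abc"          (* [#"a", #"b", #"c"] *)
val back = String.implode chars           (* "abc" *)

(* implode l is equivalent to concat (List.map str l). *)
val viaConcat = String.concat (List.map String.str chars)

(* A common idiom: reverse a string by way of its character list. *)
val reversed = String.implode (List.rev (String.explode "abc"))   (* "cba" *)
```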

map f s
applies f to each character of s from left to right, returning the resulting string. It is equivalent to CharVector.map f s and to implode (List.map f (explode s)).

translate f s
returns the string generated from s by mapping each character in s by f. It is equivalent to concat(List.map f (explode s)).
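The difference between map and translate is that translate may change the length of the string. A minimal sketch, in which escape is a hypothetical function chosen only for illustration:

```sml
(* map replaces each character by exactly one character. *)
val upper = String.map Char.toUpper "hello"     (* "HELLO" *)
val viaList =
  String.implode (List.map Char.toUpper (String.explode "hello"))

(* translate replaces each character by a string, which may be
   longer than one character or empty. *)
fun escape #"<" = "&lt;"
  | escape #" " = ""
  | escape c = String.str c

val escaped = String.translate escape "a < b"   (* "a&lt;b" *)
val viaConcat =
  String.concat (List.map escape (String.explode "a < b"))
```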

tokens p s
fields p s
These functions return a list of tokens or fields, respectively, derived from s from left to right. A token is a non-empty maximal substring of s not containing any delimiter. A field is a (possibly empty) maximal substring of s not containing any delimiter. In both cases, a delimiter is a character satisfying the predicate p.

Two tokens may be separated by more than one delimiter, whereas two fields are separated by exactly one delimiter. For example, if the only delimiter is the character #"|", then the string "|abc||def" contains two tokens "abc" and "def", whereas it contains the four fields "", "abc", "" and "def".
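The example from the text can be reproduced directly. In this sketch, isBar is an illustrative predicate that treats #"|" as the only delimiter.

```sml
fun isBar c = (c = #"|")

(* Empty substrings between delimiters are dropped by tokens ... *)
val toks = String.tokens isBar "|abc||def"    (* ["abc", "def"] *)

(* ... but kept by fields. *)
val flds = String.fields isBar "|abc||def"    (* ["", "abc", "", "def"] *)

(* Splitting on whitespace is a typical use of tokens. *)
val words = String.tokens Char.isSpace "  to be  or not "
```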

isPrefix s1 s2
returns true if the string s1 is a prefix of the string s2.
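Because isPrefix is curried, it can be partially applied to build a predicate. The sketch below uses made-up example strings; isHttp is a hypothetical name.

```sml
val yes = String.isPrefix "ab" "abc"      (* true *)
val no = String.isPrefix "abc" "ab"       (* false: s1 is longer than s2 *)
val self = String.isPrefix "abc" "abc"    (* true *)

(* Partial application yields a string -> bool predicate. *)
val isHttp = String.isPrefix "http://"
val urls = List.filter isHttp ["http://a.example", "ftp://b.example"]
```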

compare (s, t)
does a lexicographic comparison of the two strings using the ordering Char.compare on the characters. It returns LESS, EQUAL, or GREATER, if s is less than, equal to, or greater than t, respectively.

collate f (s, t)
performs lexicographic comparison of the two strings using the given ordering f on characters.

s < t
s <= t
s > t
s >= t
compare two strings lexicographically, using the underlying ordering on characters.
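The comparison functions can be sketched as follows. The character ordering ciChar is a hypothetical helper that ignores case; it is used here only to show how collate accepts a caller-supplied ordering.

```sml
val c1 = String.compare ("abc", "abd")    (* LESS *)
val c2 = String.compare ("abc", "abc")    (* EQUAL *)
val c3 = String.compare ("b", "abc")      (* GREATER *)
val c4 = String.compare ("ab", "abc")     (* LESS: a proper prefix sorts first *)

(* The relational operators give the same ordering as booleans. *)
val lt = "abc" < "abd"                    (* true *)
val ge = "abc" >= "abd"                   (* false *)

(* Hypothetical case-insensitive ordering on characters. *)
fun ciChar (a, b) = Char.compare (Char.toLower a, Char.toLower b)

val ci = String.collate ciChar ("Hello", "hello")   (* EQUAL *)
val cs = String.compare ("Hello", "hello")          (* LESS: #"H" < #"h" *)
```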

fromString s
scans a printable string s as an SML source program string, converting escape sequences into the appropriate characters. It does not skip leading whitespace. It returns as many characters as it can successfully scan, stopping when it reaches the end of s, a non-printing character, or an improper escape sequence. Any characters remaining in s after that point are ignored. If no conversion is possible, e.g., if the first character is non-printable or begins an illegal escape sequence, NONE is returned. Note, however, that fromString "" returns SOME "".

For more information on the allowed escape sequences, see the entry for CHAR.fromString.
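A minimal sketch of fromString follows. Note that the argument literals are themselves written in SML source, so a backslash that fromString should see must be doubled. The result shown for illegal follows the description above; behavior on malformed input may vary between implementations.

```sml
(* The argument holds the eight characters  a b c \ n d e f ;
   fromString turns the two-character escape into a newline. *)
val decoded = String.fromString "abc\\ndef"    (* SOME "abc\ndef" *)

(* The empty string converts successfully. *)
val empty = String.fromString ""               (* SOME "" *)

(* \q is not an SML escape sequence, so per the description above
   no conversion is possible and NONE is expected. *)
val illegal = String.fromString "\\q"
```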

toString s
returns a string corresponding to s, with non-printable characters replaced by SML escape sequences. This is equivalent to

translate Char.toString s
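As a sketch of toString and its relation to translate and fromString (value names are illustrative):

```sml
val raw = "a\tb\n"                   (* four characters, two non-printable *)

(* Non-printable characters become SML escape sequences, so the
   result consists of the six characters  a \ t b \ n . *)
val shown = String.toString raw
val viaTranslate = String.translate Char.toString raw

(* Double quotes are escaped as well. *)
val quoted = String.toString "say \"hi\""

(* Scanning the printable form recovers the original string. *)
val roundTrip = String.fromString shown
```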

fromCString s
scans the string s as a C source program string, converting C escape sequences into the appropriate characters. The semantics are identical to fromString above, except that C escape sequences are used (cf. ISO C standard ISO/IEC 9899:1990).

For more information on the allowed escape sequences, see the entry for CHAR.fromCString.

toCString s
returns a string corresponding to s, with non-printable characters replaced by C escape sequences. This is equivalent to

translate Char.toCString s
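The C-style conversions behave analogously to toString and fromString. The sketch below deliberately uses only the escapes \t and \n, which are written the same way in SML and C; value names are illustrative.

```sml
val raw = "a\tb\n"

(* Non-printable characters become C escape sequences, so the
   result consists of the six characters  a \ t b \ n . *)
val cForm = String.toCString raw
val viaTranslate = String.translate Char.toCString raw

(* fromCString converts the C escapes back into characters. *)
val parsed = String.fromCString "a\\tb\\n"
```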


See Also

CHAR, SUBSTRING, StringCvt, MultiByte, CharVector, CharArray, WideCharVector


Last Modified October 6, 1997
Comments to John Reppy.
Copyright © 1997 Bell Labs, Lucent Technologies