StreamIO functor
The optional StreamIO functor provides a way to build a stream I/O stack on top of an arbitrary primitive I/O implementation. For example, given an implementation of readers and writers for pairs of integers, one can define streams of pairs of integers.
functor StreamIO ( ... ) : STREAM_IO
    structure PrimIO : PRIM_IO
    structure Vector : MONO_VECTOR
    structure Array : MONO_ARRAY
    sharing type PrimIO.elem = Vector.elem = Array.elem
    sharing type PrimIO.vector = Vector.vector = Array.vector
    sharing type PrimIO.array = Array.array
    val someElem : PrimIO.elem
The Vector and Array structures provide vector and array operations for manipulating the vectors and arrays used in PrimIO and StreamIO. The element someElem is used to initialize buffer arrays; any element will do.
The types instream and outstream in the result of the StreamIO functor must be abstract.
If flushOut finds that it can do only a partial write (i.e., writeVec or a similar function returns a "number of elements written" less than its sz argument), then flushOut must adjust its buffer for the items written and then try again. If the first or any successive write attempt returns zero elements written (or raises an exception), then flushOut raises the IO.Io exception.
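The retry rule above can be sketched as a small loop. This is only an illustration under simplifying assumptions: the function write (buf, i, sz) is a hypothetical stand-in for writeVec that returns the number of elements actually written, the buffer is a plain string, and the exception FlushFailed stands in for IO.Io, which carries more information in a real implementation.

```sml
(* Hypothetical stand-in for IO.Io; the real exception carries more detail. *)
exception FlushFailed

(* flush write (buf, n) pushes the first n characters of buf through
   write, retrying after partial writes.  write (buf, i, sz) is an
   assumed stand-in for writeVec: it returns how many of the sz
   elements starting at index i were actually written. *)
fun flush (write : string * int * int -> int) (buf : string, n : int) : unit =
  let
    fun loop i =
      if i >= n then ()
      else
        let
          (* any exception from the writer is reported as a flush failure *)
          val k = write (buf, i, n - i) handle _ => raise FlushFailed
        in
          if k <= 0 then raise FlushFailed   (* no progress: give up *)
          else loop (i + k)                  (* skip what was written, retry *)
        end
  in
    loop 0
  end

(* A mock writer that accepts at most 3 characters per call. *)
val sink : string list ref = ref []

fun slowWrite (s, i, sz) =
  let
    val k = Int.min (3, sz)
  in
    sink := String.substring (s, i, k) :: !sink;
    k
  end

val () = flush slowWrite ("hello, world", 12)
val written = String.concat (rev (!sink))
```

With the mock writer, the twelve characters go out in four partial writes, and a writer that always returns zero makes flush raise FlushFailed on the first attempt.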
If an exception occurs during any stream I/O operation, then the module must, of course, leave itself in a consistent state, without losing or duplicating data.
In some ML systems, a user interrupt aborts execution and returns control to a top-level prompt, without raising any exception that the current execution can handle. In that situation it may be impossible to avoid both losing and duplicating data. Data (input or output) must never be duplicated, but may be lost. This can be accomplished without stream I/O doing any explicit masking of interrupts or locking. On output, the internal state (recording how much has been written) should be updated before doing the write operation; on input, the read should be done before updating the count of valid characters in the buffer.
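The ordering rule can be illustrated with a deliberately tiny model. Everything here is an assumption made for the sketch: the buffers are plain string refs named pending and inBuf, and the write and read arguments are hypothetical stand-ins for the underlying writer and reader.

```sml
(* Output side: data waiting to be written. *)
val pending : string ref = ref ""

(* Mark the data as gone BEFORE writing it.  If execution is aborted
   during write, the data may be lost, but a later flush can never
   write it a second time. *)
fun flushOnce (write : string -> unit) : unit =
  let
    val data = !pending
  in
    pending := "";     (* state updated first *)
    write data         (* then the actual write *)
  end

(* Input side: characters that are valid for consumers to see. *)
val inBuf : string ref = ref ""

(* Read FIRST, then publish.  If execution is aborted after the read
   but before the update, the fresh data is lost, but nothing already
   in the buffer is counted twice. *)
fun fill (read : unit -> string) : unit =
  let
    val fresh = read ()          (* the read happens first *)
  in
    inBuf := !inBuf ^ fresh      (* then the valid count grows *)
  end
```

Reversing either order would trade loss for duplication: writing before clearing pending could emit the same data twice after an interrupt.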
Implementation note:
Here are some suggestions for efficient performance:
- Operations on the underlying readers and writers (readVec, etc.) are expected to be expensive (involving a system call, with context switch).
- Small input operations can be done from a buffer; the readVec or readVecNB operation of the underlying reader can replenish the buffer when necessary.
- Each reader may provide only a subset of readVec, readVecNB, block, canInput, etc. An augmented reader that provides more operations can be constructed using PrimIO.augmentIn, but it may be more efficient to use the functions directly provided by the reader, instead of relying on the constructed ones. The same applies to augmented writers.
- Keep the position of the beginning of the buffer on a multiple-of-chunkSize boundary, and do read or write operations with a multiple-of-chunkSize number of elements.
- For very large inputAll or inputN operations, it is (somewhat) inefficient to read one chunkSize at a time and then concatenate all the results together. Instead, it is good to try to do the read all in one large system call; that is, readBlock(n). However, in a typical implementation of readVec, this requires pre-allocating a vector of size n. In inputAll(), the size of the vector is not known a priori, and if the argument to inputN is large, the allocation of a much-too-large buffer is wasteful. Therefore, for large input operations, query the size of the reader using endPos, subtract the current position, and try to read that much. But one should also keep things rounded to the nearest chunkSize.
- The use of endPos to try to do (large) read operations of just the right size will be inaccurate on translated readers. But this inaccuracy can be tolerated: if the translation is anything close to 1-1, endPos will still provide a very good hint about the order-of-magnitude size of the file.
- Similar suggestions apply to very large output operations. Small outputs go through a buffer; the buffer is written with writeArr. Very large outputs can be written directly from the argument string using writeVec.
- A lazy functional instream can (should) be implemented as a sequence of immutable (vector) buffers, each with a mutable ref to the next "thing," which is either another buffer, the underlying reader, or an indication that the stream has been truncated.
- The input function should return the largest sequence that is most convenient. Usually this means "the remaining contents of the current buffer."
- To support non-blocking input, use readVecNB if it exists, otherwise do canInput followed (if appropriate) by readVec.
- To support blocking input, use readVec if it exists, otherwise do readVecNB followed (if it would block) by block and then another readVecNB.
- To support lazy functional streams, readArr and readArrNB are not useful. If necessary, readVec should be synthesized from readArr and readVecNB from readArrNB. writeArr should, if necessary, be synthesized from writeVec and vice versa. Similarly for writeArrNB and writeVecNB.
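The suggestion about sizing large reads from endPos and chunkSize amounts to a little arithmetic, sketched here. The sketch assumes positions are plain integers (a real reader uses an abstract position type), and the names roundUp and readHint are hypothetical.

```sml
(* Round n up to a multiple of chunk (assumes chunk > 0). *)
fun roundUp (chunk : int, n : int) : int =
  ((n + chunk - 1) div chunk) * chunk

(* Suggest how many elements to request for a large read.
   endPos is NONE when the reader cannot report its size; in that
   case fall back to reading a single chunk. *)
fun readHint {chunk : int, pos : int, endPos : int option} : int =
  case endPos of
    SOME e => Int.max (chunk, roundUp (chunk, e - pos))
  | NONE => chunk

(* 11808 elements remain, so request three 4096-element chunks. *)
val hint = readHint {chunk = 4096, pos = 8192, endPos = SOME 20000}
```

Because the result is only a hint, an inaccurate endPos (for example on a translated reader) costs at most an extra read or some spare buffer space.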
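The buffer-chain representation suggested for lazy functional instreams might look roughly like the following. This is a minimal sketch, not the representation used by any particular implementation: the type and constructor names are hypothetical, buffers are strings, the underlying reader is reduced to a single function returning the next chunk, and end-of-stream is folded into the Truncated case for brevity.

```sml
(* What follows a buffer: more data, the reader, or nothing. *)
datatype next
  = Buffer of buffer               (* data already read *)
  | Reader of unit -> string       (* underlying reader, not yet consulted *)
  | Truncated                      (* no further data will be produced *)
withtype buffer = {data : string, more : next ref}

(* A functional stream is a position within some buffer. *)
type instream = buffer * int

(* input returns the rest of the current buffer, refilling from the
   reader only when the buffer is exhausted.  The result of a refill
   is recorded in the ref, so every stream value that shares this
   buffer sees the same data. *)
fun input ((b as {data, more}, pos) : instream) : string * instream =
  if pos < size data then
    (String.extract (data, pos, NONE), (b, size data))
  else
    case !more of
      Buffer b' => input (b', 0)
    | Truncated => ("", (b, pos))
    | Reader rd =>
        let
          val v = rd ()
        in
          if v = "" then (more := Truncated; ("", (b, pos)))
          else
            let
              val b' = {data = v, more = ref (Reader rd)}
            in
              more := Buffer b';
              input (b', 0)
            end
        end

(* A mock reader that yields two chunks and then end-of-stream. *)
val chunks = ref ["ab", "cd"]
fun rd () = case !chunks of [] => "" | c :: cs => (chunks := cs; c)

val s0 : instream = ({data = "", more = ref (Reader rd)}, 0)
val (x, s1) = input s0
val (y, s2) = input s1
val (x', _) = input s0   (* re-reading s0 yields the same data as x *)
```

The only mutation is the one-time update of each ref, which is what lets old stream values such as s0 be read again with the same result.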
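The fallback rules for blocking and non-blocking input can be sketched against a simplified reader record. The record type below is an assumption made for illustration: it keeps only four optional operations, uses strings for vectors, and models "would block" as NONE. The exception Unsupported is likewise hypothetical. The blocking fallback loops rather than issuing exactly one more readVecNB, which is slightly more defensive than the rule as stated.

```sml
(* Simplified, hypothetical reader: each operation may be absent. *)
type reader = {
  readVec   : (int -> string) option,          (* blocking read *)
  readVecNB : (int -> string option) option,   (* NONE = would block *)
  block     : (unit -> unit) option,           (* wait until input is ready *)
  canInput  : (unit -> bool) option            (* is input ready now? *)
}

exception Unsupported

(* Blocking input: prefer readVec, else readVecNB + block + readVecNB. *)
fun blockingRead (r : reader) (n : int) : string =
  case r of
    {readVec = SOME rd, ...} => rd n
  | {readVecNB = SOME nb, block = SOME wait, ...} =>
      let
        fun retry () =
          case nb n of
            SOME v => v
          | NONE => (wait (); retry ())
      in
        retry ()
      end
  | _ => raise Unsupported

(* Non-blocking input: prefer readVecNB, else canInput + readVec. *)
fun nonBlockingRead (r : reader) (n : int) : string option =
  case r of
    {readVecNB = SOME nb, ...} => nb n
  | {canInput = SOME ready, readVec = SOME rd, ...} =>
      if ready () then SOME (rd n) else NONE
  | _ => raise Unsupported

(* A mock reader with only readVecNB and block; the first read
   attempt would block. *)
val isReady = ref false
val nbOnly : reader = {
  readVec = NONE,
  readVecNB = SOME (fn n =>
    if !isReady then SOME (String.substring ("data!", 0, Int.min (n, 5)))
    else NONE),
  block = SOME (fn () => isReady := true),
  canInput = NONE
}

val got = blockingRead nbOnly 4
```

As the note on augmented readers says, checking which operations the reader provides directly lets the stream layer avoid going through synthesized versions when a native one is available.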
STREAM_IO
Last Modified May 10, 1996
Comments to John Reppy.
Copyright © 1997 Bell Labs, Lucent Technologies