by Stephen R. Donaldson,
Oxford University Computing Laboratory, UK;
Jonathan M.D. Hill, Sychron Ltd., Oxford, UK;
David B. Skillicorn,
CISC, Queen's University, Kingston, Canada.
Existing low-latency protocols make unrealistically strong assumptions
about reliability. This allows them to achieve impressive
performance, but also prevents this performance from being exploited by
applications, which must then deal with reliability issues in the
application code. We present results from a new protocol that
provides error recovery, and whose performance is close to that of
existing low-latency protocols. We achieve a CPU overhead of 1.5 µs
for packet download and 3.6 µs for upload. Our results show that (a)
executing a protocol in the kernel is not incompatible with high
performance, and (b) complete control over the protocol stack enables
(1) simple forms of flow control to be adopted, (2) proper bracketing
of the unreliable portions of the interconnect, thus minimising buffers
held up for possible recovery, and (3) the sharing of buffer pools.
The result is a protocol which performs well in the context of
parallel computation and the loose coupling of processes in the
workstations of a cluster.
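The buffering discipline in point (2) can be illustrated with a generic sliding-window sender (a hypothetical sketch, not the paper's actual protocol): packet copies are held only while they cross the unreliable segment of the interconnect, and a cumulative acknowledgement releases them immediately for reuse.

```python
# Hypothetical sketch of the buffering idea: copies are kept only for
# packets still in flight across the unreliable segment; a cumulative
# ACK frees them at once. Class and method names are illustrative.

class SenderWindow:
    """Minimal go-back-N style retransmission buffer."""

    def __init__(self):
        self.unacked = {}   # seq -> payload, held for possible recovery
        self.next_seq = 0

    def send(self, payload):
        seq = self.next_seq
        self.unacked[seq] = payload   # buffer only until acknowledged
        self.next_seq += 1
        return seq

    def on_ack(self, ack_seq):
        # cumulative ACK: everything up to ack_seq has crossed the
        # unreliable portion, so its buffers can be recycled now
        for seq in [s for s in self.unacked if s <= ack_seq]:
            del self.unacked[seq]

    def retransmit(self):
        # on timeout, resend everything still unacknowledged, in order
        return [self.unacked[s] for s in sorted(self.unacked)]
```

Releasing buffers on a cumulative ACK is what keeps the pool small: only the unreliable portion of the path, not the whole end-to-end transfer, is bracketed by retained copies.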
by Holger Karl,
Institut für Informatik, Humboldt-Universität zu Berlin, Germany.
Predictable network computing still involves a number of open
questions. One such question is providing a controlled amount of
CPU time to distributed processes. Mechanisms to control the CPU
share given to a single process have been proposed before.
Directly applying this work to distributed programs leads to
unacceptable performance, since the execution of processes on
distributed machines is not coordinated in time. This paper
discusses how coscheduling can be achieved with share-controlling
scheduling servers. The performance impact of scheduling control
is evaluated for BSP-style programs. These experiments show
that synchronization mechanisms are indispensable and that
coscheduling can be achieved for unmodified programs, but also
that a performance overhead has to be paid for the control over CPU shares.
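The effect the abstract describes can be seen in a toy cost model (our illustration, not taken from the paper): a BSP superstep ends at a barrier, so if each node's scheduling server grants its slice at a different phase of the period, the most unfavourably scheduled process determines the superstep's wall-clock time.

```python
import math

# Toy model: each process needs `work` ms of CPU per superstep and only
# runs during a slice of length `slc` starting at `offset` within every
# `period`-ms scheduling round (assumes offset + slc <= period).

def finish(work, offset, slc, period):
    # wall-clock time at which the process has accumulated `work` CPU ms
    full = math.ceil(work / slc)          # slices needed
    last = work - (full - 1) * slc        # CPU used in the final slice
    return (full - 1) * period + offset + last

def superstep(offsets, work=30, slc=10, period=100):
    # the superstep ends at the barrier: the slowest process decides
    return max(finish(work, o, slc, period) for o in offsets)

coscheduled   = superstep([0, 0, 0, 0])     # slices aligned in time
uncoordinated = superstep([0, 25, 50, 75])  # slices drift apart
```

With aligned slices every process reaches the barrier together; with staggered slices the barrier waits for the latest offset, which is exactly the coordination loss that coscheduling removes.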
by Cosimo Anglano, DSTA,
Università del Piemonte Orientale, Alessandria, Italy;
Attilio Giordana, DSTA,
Università del Piemonte Orientale, Alessandria, Italy;
Giuseppe Lo Bello, Data Mining Laboratory, CSELT, Torino, Italy.
The automatic construction of classifiers (programs able to correctly
classify data collected from the real world) is one of the major
problems in pattern recognition and in a wide range of areas related to
Artificial Intelligence, including Data Mining. In this paper we
present G-Net, a distributed algorithm able to infer classifiers
from pre-collected data, and its implementation on PC-based networks
of workstations (PC-NOWs). In order to effectively exploit the
computing power provided by PC-NOWs, G-Net incorporates a set of
dynamic load distribution techniques that allow it to adapt its
behavior to variations in the computing power due to resource
contention. Moreover, it is provided with a fault tolerance scheme
that enables it to continue its computation even if the majority
of the machines become unavailable during its execution.
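The adaptive load distribution sketched above can be illustrated generically (this is not G-Net's actual code): work units are handed out in proportion to each machine's currently observed speed, so nodes slowed by resource contention automatically receive less work.

```python
# Illustrative sketch of speed-proportional work distribution; the
# function name and interface are invented for this example.

def distribute(n_units, speeds):
    """Split n_units across nodes proportionally to measured speeds."""
    total = sum(speeds)
    shares = [int(n_units * s / total) for s in speeds]
    # hand any rounding leftovers to the fastest nodes first
    leftover = n_units - sum(shares)
    for i in sorted(range(len(speeds)), key=lambda i: -speeds[i])[:leftover]:
        shares[i] += 1
    return shares
```

Re-running the distribution with freshly measured speeds each round is what lets such a scheme track variations in available computing power.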
by Umesh Kumar V. Rajasekaran, Malolan Chetlur, Girindra D. Sharma,
and Philip A. Wilsey
Computer Architecture Design Lab., Dept. of ECECS,
University of Cincinnati, Ohio.
With the advent of cheap and powerful hardware for workstations
and networks, a new cluster-based architecture for parallel
processing applications has been envisioned. However, fine-grained
asynchronous applications that communicate frequently are not the
ideal candidates for such architectures because of their high
latency communication costs. Hence, designers of fine-grained
parallel applications on clusters are faced with the problem of
reducing the high communication latency in such architectures.
Depending on what kind of resources are available, the
communication latency can be improved along the following
dimensions: (a) reducing network latency by employing higher
performance network hardware (e.g., Fast Ethernet versus Myrinet);
(b) reducing communication software overhead by developing more
efficient communication libraries (MPICH versus TCPMPL (our TCP/IP
based message passing layer) versus MPI-BIP); (c)
rewriting/restructuring the application code for less frequent
communication; and (d) deploying communication optimizations that
exploit the application's inherent communication characteristics.
This paper discusses our
experiences with building a communication subsystem on a cluster
of workstations for a fine-grained asynchronous application (a
Time Warp synchronized discrete-event simulator). Specifically,
our efforts in reducing the communication latency along three of
the aforementioned dimensions are detailed and discussed. In
addition, performance results from an in-depth empirical evaluation
of the communication subsystem are reported in the paper.
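One common software-level optimization in the spirit of dimension (d) is batching many small simulator messages into fewer network sends, amortizing the per-message overhead. The sketch below is a generic illustration of that idea, not the paper's TCPMPL implementation.

```python
# Generic message aggregation: buffer outgoing messages and hand them
# to the transport in batches. Names here are invented for the sketch.

class Aggregator:
    def __init__(self, transport, max_batch=8):
        self.transport = transport   # callable taking a list of messages
        self.max_batch = max_batch
        self.pending = []

    def send(self, msg):
        self.pending.append(msg)
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self):
        # called when the batch is full, or when the simulator goes idle
        # so that a partial batch does not delay remote progress
        if self.pending:
            self.transport(self.pending)
            self.pending = []
```

The trade-off is latency for overhead: a fuller batch costs fewer sends, but a message may wait until the next flush, which is why flush-on-idle matters for an asynchronous Time Warp simulator.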
by G. Conte +,
M. Mazzeo *,
A. Poggi +,
P. Rossi *,
and M. Vignali +.
+ Dipartimento di Ingegneria dell'Informazione
University of Parma, Italy.
* ENEA, HPCN Project, Bologna, Italy.
In this paper, we present a system called DONOW (Database on
Network of Workstations), which demonstrates the implementation
and maintenance of a low-cost distributed database without
performance penalties. The system has a
three tier architecture based on a client, a service and a
data layer. The client layer allows local and remote users
to interact with the system through a user-friendly Java
interface. The service layer is implemented in C++; it services
user requests and, in particular, manages queries involving
more than one DONOW node. The data layer is
based on a well-known freeware relational database management
system, namely MySQL, on each DONOW node. DONOW is under
experimentation and will be used by Partena, a company
producing advanced machinery for pharmaceutical product
packaging, to manage the information about its
machines, which consists of close to 100000 drawings and
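The service layer's role in multi-node queries can be illustrated with a minimal fan-out-and-merge sketch. All names and interfaces here are invented for the example; the real service layer is written in C++ on top of each node's MySQL instance.

```python
# Hypothetical sketch of a multi-node query in a three-tier design:
# the service layer runs the same query on every node's local DBMS
# and concatenates the partial results for the client layer.

def multi_node_query(nodes, run_local):
    """Run one query on every node and merge the partial results.

    nodes     -- identifiers of the nodes holding relevant data
    run_local -- callable (node) -> list of rows from that node's DBMS
    """
    rows = []
    for node in nodes:
        rows.extend(run_local(node))
    return rows
```

Keeping the fan-out in the middle tier is what lets the client stay a thin interface while each data node remains an ordinary single-site DBMS.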
by Toshiyuki Takahashi,
Francis O'Carroll *,
Hiroshi Tezuka +,
Atsushi Hori +,
Shinji Sumimoto +,
Hiroshi Harada +,
Yutaka Ishikawa +,
and Peter H. Beckman ^
+ Real World Computing Partnership, Japan
* MRI Systems, Inc.
^ Los Alamos National Laboratory, USA.
An MPI library, called MPICH-PM/CLUMP, has been implemented on a
cluster of SMPs. MPICH-PM/CLUMP realizes zero copy message passing
between nodes while using one copy message passing within a node to
achieve high performance communication. To realize one copy message
passing on an SMP, a kernel primitive has been designed which enables
a process to read data of another process. The get protocol using
this primitive was added to MPICH. MPICH-PM/CLUMP has been run
on an SMP cluster consisting of 64 Pentium II dual processors and
Myrinet. It achieves 98 MBytes/sec between nodes and 100 MBytes/sec
within a node.
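The one-copy intra-node path rests on the kernel primitive that lets one process read another's memory. The sketch below models that primitive as a plain function so the get protocol's data flow is visible; it is an illustration of the idea, not the actual kernel interface.

```python
# Sketch of the get protocol: after a rendezvous handshake (elided),
# the receiver pulls the payload straight from the sender's buffer
# into its final destination -- exactly one copy. `kernel_read` is a
# stand-in for the real cross-address-space read primitive.

def kernel_read(src_buf, src_off, length, dst_buf, dst_off):
    # stand-in for the kernel primitive: one copy, source -> destination
    dst_buf[dst_off:dst_off + length] = src_buf[src_off:src_off + length]

def mpi_recv_get(sender_buf, length, recv_buf):
    """Receiver-driven 'get': copy the payload once, directly from the
    sender's buffer into the receiver's buffer."""
    kernel_read(sender_buf, 0, length, recv_buf, 0)
    return recv_buf
```

Because the receiver drives the transfer, no intermediate system buffer is involved; the eager alternative would stage the data once in a shared buffer and once more at the destination.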
firstname.lastname@example.org, Apr. 9, 1999