Parallel and Distributed Systems Lab (PDSL)
Effective and Efficient Communication in Compute Clusters
Over the last ten years, the use of compute clusters has grown from a few specially built Beowulf-style systems (originally using channel bonding of conventional 10 Mbps Ethernet NICs) to a multitude of styles and sizes of clusters deployed in virtually all significant scientific R&D organizations. The widespread use of clusters in ever larger configurations, and their application to ever more demanding computational problems (especially those requiring frequent, small-volume inter-processor communication), raises challenging implementation issues. Foremost among these are supporting efficient communication between processes running on different processors in the absence of high-speed, memory-based IPC, and simplifying the programming of parallel applications in such an environment.
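To illustrate why frequent, small-volume communication is the difficult case, the following is a minimal sketch of a simple latency-plus-bandwidth cost model, in which each message pays a fixed per-message latency plus a size-dependent transfer time. The function names and the latency and bandwidth figures are hypothetical and chosen only for illustration; they are not measurements from any PDSL cluster.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time for one message: fixed latency plus size / bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def total_time(total_bytes, message_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time to move total_bytes in messages of at most message_bytes."""
    full_messages, remainder = divmod(total_bytes, message_bytes)
    elapsed = full_messages * transfer_time(
        message_bytes, latency_s, bandwidth_bytes_per_s
    )
    if remainder:
        # A final, shorter message carries whatever is left over.
        elapsed += transfer_time(remainder, latency_s, bandwidth_bytes_per_s)
    return elapsed


if __name__ == "__main__":
    # Hypothetical interconnect: 50 microseconds latency, 100 MB/s bandwidth.
    latency_s = 50e-6
    bandwidth = 100e6
    payload = 1024 * 1024  # 1 MiB moved in total

    for size in (64, 1024, 65536, payload):
        t = total_time(payload, size, latency_s, bandwidth)
        print(f"{size:>8}-byte messages: {t * 1e3:10.3f} ms")
```

Under this model, sending the same payload as many small messages is dominated by the per-message latency rather than by the bandwidth of the interconnect, which is why low-latency communication matters for applications of this kind.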
Specific sub-areas of interest
Within the PDSL, our interests in this area focus on the design and performance evaluation of cluster communications hardware and software, and of specific algorithms that rely on them, as well as on the use of Distributed Shared Memory (DSM) and related software techniques to reduce the complexity of programming in such environments.
The first work on performance assessment was done by Mr. Anindya Maiti, an interdisciplinary MSc student co-supervised by Dr. Graham and Dr. Robert Derksen (Dept. of Electrical and Computer Engineering). Mr. Maiti designed and evaluated an airfoil modeling program that used PVM and ran on the original version of the PDSL CAB cluster. This work has continued, with the most recent results completed by Mr. Hossein Pourreza as an independent research project separate from his PhD research in pervasive computing. Mr. Pourreza carried out a detailed performance assessment of a variety of MPI applications running over four high-speed interconnects on the previous-generation CAB cluster. (One of the first tasks following the installation of the new cluster will be a re-assessment of his results.)
In the area of simplifying the use of distributed memory machines by abstracting inter-processor communication, a range of work has been done on DSM and object-oriented (OO) parallel and distributed programming in cluster environments. This work includes Dr. Eskicioglu's thesis research on the JiaJia DSM system and work by Dr. Graham and his students, most notably Yahong Sui and Arne Grimstrup, who explored an object-based DSM protocol based on Bershad's EC protocol and multi-versioned objects in DSM, respectively.
Our Publications on Cluster Communications
Related Work on Cluster Communications
For More Information Contact: