Strong commercial pressures and rapid progress in distributed computing mean that developers of advanced collaboratory technologies find themselves in a difficult situation. Commercial interest in distributed computing has led to significant standardization efforts (e.g., the Common Object Request Broker Architecture, CORBA) that cannot be ignored. Yet quality implementations of these standards are still scarce, and those that exist lack features judged essential for DOE applications, such as multicast, high-performance transport, and implementations on parallel computers. Individual projects do not have the resources to produce high-performance implementations of the standards themselves. The result is the adoption of mutually incompatible software solutions across different DOE collaboratory projects, and considerable duplication of effort.
We propose to overcome this problem by a "standards+" approach to the development of a DOE-wide software infrastructure, in which existing standards and commercial or public domain software are evaluated and extended where necessary to meet DOE requirements. We propose that this approach be pursued in a three-year, multi-lab project, with three distinct phases:
We expect that this work will adopt CORBA interfaces and services as a framework within which DOE-specific requirements can be addressed. The use of CORBA has significant advantages, including the existence of well-documented standards, availability of software for immediate development, and interoperability with commercial systems.
A significant outcome of this work is likely to be a high-performance implementation of key CORBA facilities, portable across high-performance platforms of interest to DOE and incorporating specialized features (e.g., multicast and security) required for DOE applications. The development of such a system will represent a major DOE contribution to the distributed computing community and will inevitably also lead to significant research advances.
In developing this "standards+" software infrastructure, we will work closely with the DOE user community on the one hand, and with CORBA and distributed systems developers on the other. In fact, the first activities that we propose for this new initiative are two workshops:
Monitoring tools must be available in the infrastructure so that applications can monitor the performance they receive and determine the best configuration to achieve the desired performance.
Modular design and code reuse require that services and applications be represented as objects with interfaces defined by methods.
The software infrastructure must interoperate with existing CORBA ORB implementations, both because development based on those implementations is already underway and because interoperability enables reuse of commercial CORBA-based software.
The software infrastructure will take time to develop and deploy to the application development community. It would be a mistake to require that the developers wait until the infrastructure is in place.
It is impossible to predict in advance the maximum number of sites or users that might participate in a collaboratory, so the infrastructure must be scalable.
Although we are able to identify the immediate needs of applications with regard to the software infrastructure, there will be needs that arise in the future that cannot be predicted today. The software infrastructure must allow easy incorporation of new services, sites, and equipment.
The collaboratory environment will be a dynamically built collection of resources whose names and locations are not decided in advance. The software infrastructure must provide mechanisms for resource discovery and location, since the collection of available resources will be large, scattered, and dynamically changing. Once a resource is located, its access restrictions must also be enforced.
The infrastructure needs to provide applications with an event service so that each application is not required to build its own.
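As an illustration of what a shared event service spares each application from writing, the following is a minimal in-process publish/subscribe sketch. It is not the CORBA Event Service interface; the class name `EventChannel`, its methods, and the event name `instrument.ready` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


class EventChannel:
    """Hypothetical in-process event channel: consumers register
    callbacks per event type, suppliers publish payloads."""

    def __init__(self):
        # Maps an event type (string) to its list of consumer callbacks.
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, callback):
        """Register a callback to be invoked for every event of this type."""
        self._subscribers[event_type].append(callback)

    def publish(self, event_type, payload):
        """Deliver payload to all current subscribers; return how many were notified."""
        # Copy the list so a callback may subscribe others during delivery.
        callbacks = list(self._subscribers.get(event_type, []))
        for callback in callbacks:
            callback(payload)
        return len(callbacks)


if __name__ == "__main__":
    channel = EventChannel()
    channel.subscribe("instrument.ready", lambda event: print("got", event))
    channel.publish("instrument.ready", {"instrument": "example"})
```

A real service would additionally cross process and site boundaries; the sketch only shows the supplier/consumer decoupling that applications would otherwise each have to build.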
Component/object names should be hierarchical so that searches in the name space for a particular component/object can be accomplished without resorting to an exhaustive search.
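The benefit of hierarchical names can be sketched with a small name tree, in which resolving a name visits one node per path component instead of scanning every registered object. This is an illustrative sketch only; `NameTree`, `bind`, `resolve`, and the example path are hypothetical and do not reproduce any particular naming service API.

```python
class NameNode:
    """One level of the hierarchy: named children plus an optional bound object."""

    def __init__(self):
        self.children = {}
        self.target = None


class NameTree:
    """Hypothetical hierarchical name space using '/'-separated paths."""

    def __init__(self):
        self.root = NameNode()

    @staticmethod
    def _parts(path):
        return [part for part in path.split("/") if part]

    def bind(self, path, target):
        """Associate target with path, creating intermediate levels as needed."""
        node = self.root
        for part in self._parts(path):
            node = node.children.setdefault(part, NameNode())
        node.target = target

    def resolve(self, path):
        """Return the bound object, or None; cost is proportional to path depth."""
        node = self.root
        for part in self._parts(path):
            node = node.children.get(part)
            if node is None:
                return None
        return node.target

    def list_children(self, path):
        """Return the sorted names directly below path (empty if path is unknown)."""
        node = self.root
        for part in self._parts(path):
            node = node.children.get(part)
            if node is None:
                return []
        return sorted(node.children)


if __name__ == "__main__":
    names = NameTree()
    names.bind("site-a/instruments/scope-1", "object-reference-1")
    print(names.resolve("site-a/instruments/scope-1"))
    print(names.list_children("site-a/instruments"))
```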
A single security architecture is needed that allows the security mechanisms of individual sites to interoperate. Integrating security into the infrastructure allows sign-on and authentication to be carried out once per session rather than once per application.
The collaboratory environment, by its very nature, will involve large groups of scientists working together. The applications built to support such collaborations will need to disseminate information to a group, and multicast mechanisms allow this to be done efficiently. The multicast messaging services required range from unreliable, unordered multicast to reliable, ordered multicast.
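To make the difference between the two ends of that range concrete, the sketch below shows one common receiver-side technique for ordered delivery: the sender numbers its messages, and the receiver holds back out-of-order arrivals and reports gaps that a reliable protocol would ask to have retransmitted. It assumes a single sender and per-sender sequence numbers starting at 0; the class and method names are hypothetical, and no network code is included.

```python
class OrderedReceiver:
    """Hypothetical receiver that delivers one sender's messages in sequence order."""

    def __init__(self):
        self._next_seq = 0      # next sequence number eligible for delivery
        self._held = {}         # out-of-order messages waiting for a gap to fill
        self.delivered = []     # payloads handed to the application, in order

    def receive(self, seq, payload):
        """Accept a (possibly duplicated or reordered) message from the network."""
        if seq < self._next_seq or seq in self._held:
            return  # duplicate: already delivered or already held
        self._held[seq] = payload
        # Deliver every message that is now contiguous with what was delivered.
        while self._next_seq in self._held:
            self.delivered.append(self._held.pop(self._next_seq))
            self._next_seq += 1

    def missing(self):
        """Sequence numbers known to be lost or late (candidates for retransmission)."""
        if not self._held:
            return []
        return [s for s in range(self._next_seq, max(self._held)) if s not in self._held]


if __name__ == "__main__":
    receiver = OrderedReceiver()
    for seq, payload in [(1, "b"), (0, "a"), (3, "d")]:
        receiver.receive(seq, payload)
    print(receiver.delivered, receiver.missing())
```

An unreliable, unordered service would simply hand each arriving payload to the application and skip both the hold-back queue and the gap report.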
Synchronous RPC-like mechanisms are useful in many settings but do not provide a rich enough vocabulary to satisfy the needs of all applications. RPC mechanisms are particularly ill-suited to multicast messaging. The software infrastructure needs to provide both synchronous and asynchronous messaging mechanisms.
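The two styles can be contrasted in a few lines. In this sketch a hypothetical `Messenger` wraps a local handler function that stands in for a remote object: `call` blocks like an RPC, while `send` returns immediately with a future that the caller may wait on later. Only the Python standard library is used; none of the names come from CORBA or any existing messaging system.

```python
from concurrent.futures import ThreadPoolExecutor


class Messenger:
    """Hypothetical endpoint offering both synchronous and asynchronous requests."""

    def __init__(self, handler, workers=2):
        self._handler = handler  # stands in for the remote service
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def call(self, request):
        """Synchronous, RPC-like: the caller blocks until the reply is available."""
        return self._handler(request)

    def send(self, request):
        """Asynchronous: return a Future at once; the reply is collected later."""
        return self._pool.submit(self._handler, request)

    def close(self):
        """Wait for outstanding requests and release the worker threads."""
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    messenger = Messenger(lambda request: request * 2)
    print(messenger.call(21))            # blocks for the reply
    pending = messenger.send(4)          # returns immediately
    print(pending.result(timeout=5))     # caller decides when to wait
    messenger.close()
```

The asynchronous form is the one that extends naturally to group communication, since a sender to many receivers cannot reasonably block on a single reply.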
Collaboratory components will run on a wide variety of hardware, network, and operating system platforms; the infrastructure must operate and interoperate on all of them.
Scientific data can be exceedingly large, and the infrastructure will need to support the transport of large messages. One issue here is how the data translation routines are implemented. If every message must be translated into a generic format before being sent, regardless of the receiver's data format, it will be difficult to maintain reasonable performance. Data translation should be performed on an as-needed basis, and applications should be given the means to control the translation when behavior other than the default is desired.
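One way to perform translation only as needed is the scheme often called "receiver makes right": the sender transmits data in its own native representation with a small tag, and the receiver converts only when its representation differs. The sketch below applies this to byte order for arrays of doubles. The one-byte tag format and the function names are assumptions made for the example, not part of any standard wire protocol.

```python
import sys
from array import array

# One-byte tag describing the byte order of the payload (hypothetical format).
NATIVE_TAG = b"L" if sys.byteorder == "little" else b"B"


def encode(values, tag=NATIVE_TAG):
    """Encode doubles in the sender's byte order, prefixed by the tag.

    Passing a non-native tag only simulates a sender of the other byte order.
    """
    data = array("d", values)
    if tag != NATIVE_TAG:
        data.byteswap()
    return tag + data.tobytes()


def decode(message):
    """Decode a message; swap bytes only if the sender's order differs from ours.

    Returns (values, translated) so callers can see whether translation occurred.
    """
    tag, payload = message[:1], message[1:]
    data = array("d")
    data.frombytes(payload)
    translated = tag != NATIVE_TAG
    if translated:
        data.byteswap()  # the only translation cost, paid only when needed
    return list(data), translated


if __name__ == "__main__":
    values, translated = decode(encode([1.5, -2.0, 3.25]))
    print(values, translated)
```

Between machines with the same representation, which is the common case inside a single cluster, the payload is copied and never translated.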
Legacy software needs to be integrated with and accessible through any new infrastructure developed.
Application designers will be developing in a variety of languages and on various architectures. They will need support for compilation of their programs to allow easy use of the software infrastructure mechanisms.
Many applications and interfaces are being developed using Java and WWW pages; access to the software infrastructure from these applications must be provided.
Applications designed to run multithreaded need to be able to use the software infrastructure.
Each site (laboratory) must be able to maintain and control its own piece of the infrastructure.
No existing software system meets all of the requirements listed above. Systems such as the Message Passing Interface address the need for high-performance transport. The Nexus and Globus systems additionally address naming, resource location, and related issues. Parallel C++ systems such as CC++ provide object-oriented design. Legion provides a distributed, object-oriented parallel processing environment. Various CORBA products meet a cross section of the above requirements, but tend to perform poorly and to lack key elements such as multicast.
These considerations motivate the following approach to the definition of software infrastructure.