The JAXB Resource Manager plug-ins, part of the
Parallel Tools Platform (PTP), allow you to launch and monitor
applications on local or remote resources using resource managers
which are configured from an XML file via JAXB (javax.xml.bind)
technology.
There are two main motivations for providing this class of resource managers:
An additional consideration in designing a generically configurable resource manager was to partition the client functionality so as to eliminate the need for special server-side proxies and to scale more successfully in the updating of job and resource information.
To this end, JAXB resource managers now consist of two components: a "control", which governs the configuration, launch and cancellation of individual jobs entirely from the client end, and a "monitor", which displays job status as well as global information about the HPC resource. In most cases, the monitor will be a pre-built type provided by the PTP distribution, implemented using LLview. Since LLview already supports a good number of the standard scheduler types, adding a new resource manager type will normally entail only the specific configuration of its control part. We plan to make available both user-initiated and system-wide deployment of the necessary LLview parts (mostly Perl scripts) for monitoring the target resource. See the User pages for more information.
The following is a guide to creating or modifying a resource manager configuration. Those interested only in using the JAXB resource managers already provided with the PTP distribution should consult the User pages under the relevant scheduler (currently only the PBS resource managers are JAXB-configurable).
The JAXB Resource Manager is model-driven; this means that its functioning and appearance are determined by a set of definitions, usually provided via an XML file. What follows is a detailed explanation of the schema governing the resource manager XML configuration.
The top level of the definition tree consists of three elements: site-, control- and monitor-data. In addition, a resource manager should be given a name which sufficiently distinguishes it from others of a similar type; e.g., pbs-torque-v_2.3.7_abe is specific to an installation on the host abe, ll-v_4.0 suits all installations of LoadLeveler version 4, etc.
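As a rough illustration of the top-level layout just described, the following Python sketch parses a skeleton configuration and lists its top-level elements. The root element name and the use of a "name" attribute are assumptions made for this example only; the three child element names and the resource manager name are taken from the text above.

```python
import xml.etree.ElementTree as ET

# Skeleton configuration. The root element name and the "name" attribute
# are assumptions for illustration; consult the schema for the real forms.
CONFIG = """
<resource-manager-builder name="pbs-torque-v_2.3.7_abe">
  <site-data/>
  <control-data/>
  <monitor-data/>
</resource-manager-builder>
"""

root = ET.fromstring(CONFIG)
rm_name = root.get("name")
top_level = [child.tag for child in root]

print(rm_name)
print(top_level)
```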
The site-data element provides an optional place to set default
remote site information. The connection strings are URIs which are
specific to the PTP RemoteServices definitions. The scheme for these
URIs will usually name the specific remote service (e.g., rse: or
remotetools:; local is simply file:). The host name and port given
here will appear as defaults in the resource manager selection wizard
when you create a new connection.
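To make the role of the scheme, host name and port concrete, here is a minimal Python sketch that splits a connection string into those three parts. The URI shown, including the host name abe.example.org and port 22, is hypothetical, and the exact URI layout expected by the PTP RemoteServices definitions may differ from this generic form.

```python
from urllib.parse import urlsplit

def connection_defaults(uri):
    """Split a connection URI into service scheme, host and port."""
    parts = urlsplit(uri)
    return {"service": parts.scheme, "host": parts.hostname, "port": parts.port}

# Hypothetical connection string; only the "rse" scheme comes from the text.
defaults = connection_defaults("rse://abe.example.org:22")
print(defaults)
```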
The principal section of the schema is devoted to defining the resource manager's control part. The top-level control elements include properties and attributes, files to be staged, job script to be generated (if any), commands specific to the resource manager, and the layout of the Launch Tab.
The resource manager implementation constructs a variable map from the
defined properties and attributes which serves as the resource manager
"environment"; these variables are dereferenced in the configuration
file via ${ptp_rm:name#fieldName}, e.g., ${ptp_rm:queues#value}
(see further below on the fields for properties and attributes); all
properties and attributes defined in the configuration are mapped. The
following hard-coded properties are also added at runtime:
control.user.name
control.address
control.working.dir
executablePath
progArgs
directory
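The dereferencing of ${ptp_rm:name#fieldName} against the variable map can be pictured with the following Python sketch. It is not the PTP implementation; the nested-dictionary representation of the environment, the resolve function and the sample queue value are all assumptions made for illustration.

```python
import re

# Matches ${ptp_rm:name#fieldName} and captures the name and the field.
PATTERN = re.compile(r"\$\{ptp_rm:([^#}]+)#([^}]+)\}")

def resolve(text, env):
    """Replace each ${ptp_rm:name#field} in text with env[name][field]."""
    def lookup(match):
        name, field = match.group(1), match.group(2)
        return str(env[name][field])  # raises KeyError for unknown names
    return PATTERN.sub(lookup, text)

# Hypothetical environment: one attribute named "queues" with a "value" field.
env = {"queues": {"value": "debug"}}

command = resolve("qsub -q ${ptp_rm:queues#value}", env)
print(command)
```

In this sketch a reference to an undefined name or field fails loudly rather than being left in place; how the actual resource manager treats unresolved references is not covered here.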
Commands are externally executed calls, either to a local or remote OS, depending on the connection defined for the resource manager. The start-up- and shut-down-commands are arbitrary commands to be run (in order) at startup or exit. The submit commands are those used to launch jobs. Currently a configuration may have either a batch or an interactive mode, but not both; it may thus have only two submission modes, run and debug, for the given type. In the future we may allow all four to coexist in a single configuration.
get-job-status is a user-initiated (on-demand) request to refresh the status information for a submission. Normal (polled) updates are the responsibility of the monitor. The command nevertheless needs to be implemented in most cases, as it will be called internally just after submission.
The remaining commands are operations which can be applied to running jobs; with the exception of terminate, these concern schedulers (batch systems) and do not apply to resource managers which connect to interactive runtime systems such as OpenMPI or PE.
Note: if the submission type is interactive, the terminate-job command usually does not need to be implemented, as the process termination will be handled internally. However, in some cases (such as PBS -I) which require the interactive job to run as a pseudo-terminal, one may need this command in order to force its termination externally.
The majority of the XML definition is given over to the set-up of the Resource Manager control. One can think of this section as having four subdivisions:
We will look at these each in turn.
A property is any variable necessary for the functioning of the
resource manager. Properties often (but not necessarily) are not
visible. The value for properties can be any primitive type, or lists
or maps of strings. If stdout and stderr from a scheduled job are to
be delivered to the client, the properties
stdout_remote_path and stderr_remote_path should be
included in the resource manager property set.
NOTE: the untyped "value" element on properties is for internal use only; to give a predefined (primitive) value, use the "default" element along with the "type" attribute.
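The pairing of a "default" element with the "type" attribute might be read as in the Python sketch below. The property name nodes, the "name" attribute, and the type names integer, boolean and string are assumptions for this example; the schema defines the actual set of permitted types.

```python
import xml.etree.ElementTree as ET

# Hypothetical type names mapped to converters; the real schema may differ.
CONVERTERS = {
    "integer": int,
    "boolean": lambda s: s.strip().lower() == "true",
    "string": str,
}

def default_value(prop):
    """Return the typed default of a property element, or None if absent."""
    text = prop.findtext("default")
    if text is None:
        return None
    return CONVERTERS[prop.get("type", "string")](text)

# Hypothetical property definition using "default" together with "type".
prop = ET.fromstring(
    '<property name="nodes" type="integer"><default>4</default></property>'
)
value = default_value(prop)
print(value)
```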