[ogsa-bes-wg] Re: [jsdl-wg] Questions and potential changes to JSDL, as seen from HPC Profile point-of-view
andreas.savva at jp.fujitsu.com
Mon Jun 12 02:33:30 CDT 2006
Hi Marvin, Steve,
First off, Marvin, could I ask you to put unresolved issues into the
JSDL post-v1 tracker (as separate artifacts)?
(I'm sure there are many, so I, and hopefully a few others from the
group, can help out if you want. It's difficult to tell what's
agreed on and what's not after wading through long email threads.)
Some other comments in-line below. I've deleted some blocks of text for
clarity and I didn't try to reply to everything...
Marvin Theimer wrote:
> My responses are in-line below.
> *From:* A S McGough [mailto:asm at doc.ic.ac.uk]
> *Sent:* Friday, June 09, 2006 3:03 AM
> *To:* Marvin Theimer
> *Cc:* JSDL Working Group; ogsa-bes-wg at ggf.org; Ed Lassettre; Ming Xu
> *Subject:* Re: [jsdl-wg] Questions and potential changes to JSDL, as
> seen from HPC Profile point-of-view
> Hi Marvin,
> Thanks for the (fairly long) email, you've raised quite a few
> interesting points - which I'll address inline below. First off I'd just
> like to say that the JSDL document is meant to be a language
> specification document, thus a large number of the issues about how JSDL
> should be used and what they have to support are not really in scope for
> that document. However, I do agree with you that such a document needs
> to exist - but for all uses of JSDL not just HPC. I would like to take
> your straw man and use it as the starting point for this document for
> the section on "using JSDL for HPC". Let me know what you think.
> [Marvin] As long as the HPC profile specification has /some/
> document/specification that it can employ to normatively define
> behaviors, I’m happy. Presumably “compliance” with JSDL will be defined to
> mean compliance with this second document that you propose to create?
I agree with Steve's general statement that a number of these issues are
profiling issues. But I would have thought that the "JSDL for HPC" is
what the HPC profile would define through appropriate restrictions to
the JSDL spec.(?)
> 3. How will JSDL’s normative set of enumeration values
> for things like processor architecture and operating system be kept
> up-to-date and relevant? Also, how should things like operating system
> version get specified in a normative manner that will enable
> interoperability among multiple clients and job scheduling services?
> For example, things like Linux and Windows versions are constantly being
> introduced, each with potentially significant differences in
> capabilities that a job might depend on. Without a normative way of
> specifying these constantly evolving version sets it will be difficult,
> if not impossible, to create interoperable job submission clients and
> job scheduling services (including meta-scheduling services where
> multiple schedulers must interoperate with each other).
> Agreed. We don't yet have a way to add to the normative enumerations. I
> think you suggest below to move these into a separate document so that
> they can be updated more easily - this would seem a good idea. As for OS
> versioning I have my ideas though JSDL doesn't have a central plan yet.
> Again input here would be appreciated.
In general I agree with the idea of separate schemas for the values. And
if there is an appropriate schema already defined in CIM, or elsewhere,
we could just point to it. But I think the JSDL-WG shouldn't be in the
business of maintaining such lists. We should reuse what's available.
> 5. If one accepts the need for a variety of extension
> profiles then this raises the question of what should be in the base
> case. For example, it could be argued that data staging – with its
> attendant aspects such as mount points and mount sources – should be
> defined in an extension rather than in the core specification that will
> need to cover a variety of systems beyond just Linux/Unix/Posix.
> Similarly, one might argue that the base case should focus on what’s
> /functionally/ necessary to execute a job correctly and should leave
> things that are “optimization hints”, such as CPU speed and network
> bandwidth specifications, to extension profiles.
> Personally I'd agree with you that file staging should be in an
> extension. Though the view of the group was that most current DRM
> systems which would consume JSDL had file staging as a core element. I
> also agree on the idea of "optimization hints".
I don't see a problem if the HPC Profile decides to have data staging as
part of its extended features. Data staging may be in the main JSDL
namespace but that does not mean that everyone has to implement it. It's
there because we felt that it is common enough that everyone would want to.
> 6. How are concepts such as IndividualCPUSpeed and
> IndividualNetworkBandwidth intended to be defined and used in practice?
> I understand the concept of specifying things like the amount of
> physical memory or disk space that a job will require in order to be
> able to run. However, CPU speed and network bandwidth don’t represent
> functional requirements for a job – meaning that a job will correctly
> run and produce the same results irrespective of the CPU speed and
> network bandwidth available to it. Also, the current definitions seem
> fuzzy: the megahertz number for a CPU does not tell you how fast a given
> compute node will be able to execute various kinds of jobs, given all
> the various hardware factors that can affect the performance of a
> processor (consider the presence/absence of floating point support, the
> memory caching architecture, etc.). Similarly, is network bandwidth
> meant to represent the theoretical maximum of a compute node’s network
> interface card? Is it expected to take into account the performance of
> the switch that the compute node is attached to? Since switch
> performance is partially a function of the pattern of (aggregate)
> traffic going through it, the network bandwidth that a job such as an
> MPI application can expect to receive will depend on the /type/ of
> communications patterns employed by the application. How should this
> aspect of network bandwidth be reflected – if at all – in the network
> bandwidth values that a job requests and that compute nodes advertise?
> As said above we really need to define this in a separate "profile".
Network bandwidth is only meant to represent the theoretical maximum of
the node's NIC. The motivation was to allow at least some simple
statement about what networking capabilities you want the node to have.
Obviously not enough.
> 8. The current specification stipulates that conformant
> implementations must be able to parse all the elements and attributes
> defined in the spec, but doesn’t require that any of them be supplied.
> Thus, a scheduling service that does nothing could claim to be compliant
> as long as it can correctly parse JSDL documents. For interoperability
> purposes, I would argue that the spec should define a minimum set of
> elements that any compliant service must be able to supply. Otherwise
> clients will not be able to make any assumptions about what they can
> specify in a JSDL document and, in particular, client applications that
> programmatically submit job submission requests will not be possible
> since they can’t assume that any valid JSDL document will actually be
> acceptable by any given job submission service.
> Yes - this is true - though as the current document is a description of
> the JSDL "language" this is correct. These issues should all be
> clarified in the profile document.
Yes, I think this is a profiling decision.
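A rough sketch of what such a profiling check could look like, in Python. It assumes the JSDL 1.0 and POSIX extension namespaces; the set of "supported" elements is purely hypothetical and is not a list anyone has agreed on. The helper name unsupported_elements is likewise only for illustration.

```python
import xml.etree.ElementTree as ET

JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"

# Hypothetical profile: elements a conforming service promises to support.
# This particular selection is an assumption made for the example only.
SUPPORTED = {
    (JSDL_NS, "JobDefinition"),
    (JSDL_NS, "JobDescription"),
    (JSDL_NS, "JobIdentification"),
    (JSDL_NS, "JobName"),
    (JSDL_NS, "Application"),
    (POSIX_NS, "POSIXApplication"),
    (POSIX_NS, "Executable"),
    (POSIX_NS, "Argument"),
}


def split_tag(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return "", tag


def unsupported_elements(jsdl_text, supported=SUPPORTED):
    """Return, in document order, the elements the profile does not cover."""
    root = ET.fromstring(jsdl_text)
    found = []
    for elem in root.iter():
        name = split_tag(elem.tag)
        if name not in supported and name not in found:
            found.append(name)
    return found


# A small document that uses data staging, which the profile above omits.
EXAMPLE = f"""\
<jsdl:JobDefinition xmlns:jsdl="{JSDL_NS}" xmlns:posix="{POSIX_NS}">
  <jsdl:JobDescription>
    <jsdl:Application>
      <posix:POSIXApplication>
        <posix:Executable>/bin/hostname</posix:Executable>
      </posix:POSIXApplication>
    </jsdl:Application>
    <jsdl:DataStaging>
      <jsdl:FileName>out.txt</jsdl:FileName>
      <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
    </jsdl:DataStaging>
  </jsdl:JobDescription>
</jsdl:JobDefinition>
"""

if __name__ == "__main__":
    for ns, local in unsupported_elements(EXAMPLE):
        print(f"not covered by profile: {{{ns}}}{local}")
```

With a check like this, a client could tell up front whether a given document stays within what a profile guarantees, while the JSDL spec itself remains a pure language definition.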
> 9. I have a number of questions about data staging:
> 10. Although the notions of working directory and environment variables
> are defined in the posix extension, they are implicitly assumed in the
> data staging section of the core specification. This implies to me that
> either (a) data staging is made an extension or (b) these concepts are
> made a normative, required part of the core specification.
> Hmm - well spotted. Personally as I've said I'd like to see it made into
> an extension. This probably needs some discussion on the list.
I think environment variables do not refer to data staging so could you
give me a text reference? I could be forgetting something. (Both
Environment variables and DataStaging have a linkage to the Filesystem
The PosixApplication working directory has a relationship with
DataStaging in that it may specialize the DataStaging location. But I
don't see this as a reason in itself to make DataStaging an extension.
Having said the above I do agree that, eventually, DataStaging should be
separated out. But I think a few other things have to be done before
that happens. For example we would probably have to have a way to
combine what are likely to be separate jsdl documents describing the
staging and execution stages and define what their dependencies are.
AFAI remember this was the main reason that data staging was not made a
JSDL extension in the first place.
To come to the HPC profile I see no reason (and no problem) why you
cannot simply restrict JSDL 1.0 usage in your base case to not include
staging and define it instead in a profile extension. Or am I missing
something?
> 12. The current definitions of the well-known file systems seem
> imprecise to me. In particular:
> 13. What are the navigation rules associated with each? Can you cd out
> of the subtree that each represents? ROOT almost certainly does not
> allow that. Is there an assumption that one can cd out of HOME or TMP
> or SCRATCH? Hopefully not, since that would make these file systems
> even more Unix/Linux-centric, plus one would now need to specify what
> clients can expect to see when they do so.
> Again not defined here. Though I'd assume we can easily say in the
> profile that you can't cd out of it.
The intention of well-known names was to provide some minimal common
definitions of what could be expected. They are not normative, not
intended to be normative, and in retrospect they are more of a profiling
issue. Perhaps they shouldn't be in the JSDL spec at all.
> 20. Declare concepts such as executable path, command-line
> arguments, environment variables, and working directory to be generic
> and include them in the core JSDL specification rather than the posix
> extension. This may enable the core specification to support things
> like Windows-based jobs (TBD). The goal here is to define a core JSDL
> specification that in-and-of-itself could enable job submission to a
> fairly wide range of execution subsystems, including both the
> Unix/Linux/Posix world and the Windows world.
> Why do these need to be in the core? We had problems before in a
> pre-release version when they were in the core as people who wanted to
> do database submissions (and other things) were trying to map these into
> such elements.
> [*Marvin*] A Windows HPC job is not completely posix-compliant, yet has
> overlap on the above-listed set of concepts (and actually many more).
> So I would argue that we need /something/ that abstracts out the core
> concepts of a traditional HPC job. Given the presence of file data
> staging elements in the core specification – which I would argue are
> meaningless for database submissions – it seems like the above-listed
> elements are at least as generic as the data staging elements.
I think Donal's suggestion for a more generic ExecutableApp, together
with a WindowsApp are interesting enough to look into further.
Especially if the changes to the schema would be compatible with
existing JSDL 1.0 documents.
> Now, as presented above, my straw man proposal looks like suggestions
> for changes that might go into a JSDL-1.1 or JSDL-2.0 specification. In
> the near-term, the HPC profile working group will be exploring what can
> be done with just JSDL-1.0 and restrictions to that specification. The
> restrictions would correspond to disallowing those parts of the JSDL-1.0
> specification that the above proposal advocates moving to extension
> profiles. It will also explore whether a restricted version of the
> posix extension could be used to cover most common Windows cases.
> OK for those who have made it this far - possibly not many. I'm going to
> propose a JSDL call on this in a new email so all can see it.
I've already set up the call as Steve asked.
I think the 1.x branch should stay compatible with 1.0 and not introduce
any radical changes to the underlying schema. Longer term work might
result in a 2.0 with a possibly different structure. But I think we
should give priority to things that can be fixed within the 1.0
structure and can let the HPC profile work progress.
Fujitsu Laboratories Ltd