[drmaa-wg] drmaa_job_ps on finished jobs
peter.troeger at hpi.uni-potsdam.de
Fri May 19 01:55:06 CDT 2006
> I have a question concerning drmaa_job_ps behaviour for jobs that have
> finished (normally or failed). DRMAA 1.0 spec says:
> "drmaa_job_ps DRMAA SHOULD always get the status of job_id from DRM
> system, unless the previous status has been DRMAA_PS_FAILED or
> DRMAA_PS_DONE and the status has been successfully cached. Terminated
> jobs get DRMAA_PS_FAILED status."
> Does that mean that the DRMAA library should cache the job status
> (_PS_FAILED or _PS_DONE) and return it even when the job data was reaped
> by drmaa_wait/drmaa_synchronize? I'm a bit confused because SGE's
> implementation returns DRMAA_ERRNO_INVALID_JOB after drmaa_wait() but
> Condor's library caches that information forever -- unless the lib's log
> files are deleted...
SGE is doing it the right way. The point is that no sentence in the spec
prohibits the job status from remaining available after the wait
operation. We only have the following sentence for the drmaa_wait()
operation itself, and nothing comparable for drmaa_job_ps():
"The routine reaps jobs on a successful call, so any subsequent calls
to drmaa_wait SHOULD fail returning an error DRMAA_ERRNO_INVALID_JOB
meaning that the job has been already reaped."
I would therefore say that Condor currently does not violate the spec,
but interprets it in a very unusual way. There is already a corresponding
note in the TODO section of the Condor DRMAA documentation. I will fix
this for the next version, which will hopefully make it into Condor
6.7.20. Thank you!
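
To make the two readings concrete, here is a minimal toy model in C. It is
only a sketch and does not use the real DRMAA library: the toy_job struct,
the toy_wait and toy_job_ps functions, the TOY_* constants and the
cache_forever flag are all hypothetical names invented for illustration,
and the constant values are not those of drmaa.h. The flag selects between
the behaviour described above for SGE (status gone once the job is reaped)
and for Condor (status cached after reaping).

```c
/* Toy model only: names and values are hypothetical, not taken from drmaa.h. */
enum { TOY_ERRNO_SUCCESS = 0, TOY_ERRNO_INVALID_JOB = 1 };
enum { TOY_PS_RUNNING = 0, TOY_PS_DONE = 1, TOY_PS_FAILED = 2 };

typedef struct {
    int state;          /* last known job state */
    int reaped;         /* set once a wait call has succeeded */
    int cache_forever;  /* 1 = keep answering status queries after reaping */
} toy_job;

/* Models the quoted drmaa_wait rule: the first successful call reaps the
   job, and any subsequent call fails with an "invalid job" error. */
int toy_wait(toy_job *job)
{
    if (job->reaped)
        return TOY_ERRNO_INVALID_JOB;
    if (job->state == TOY_PS_RUNNING)
        job->state = TOY_PS_DONE;   /* pretend the job finished normally */
    job->reaped = 1;
    return TOY_ERRNO_SUCCESS;
}

/* Models a status query under either interpretation. With cache_forever
   unset, a reaped job is unknown; with it set, the cached final state
   is still returned. */
int toy_job_ps(const toy_job *job, int *ps)
{
    if (job->reaped && !job->cache_forever)
        return TOY_ERRNO_INVALID_JOB;
    *ps = job->state;
    return TOY_ERRNO_SUCCESS;
}
```

In this model, a job with cache_forever set to 0 yields
TOY_ERRNO_INVALID_JOB from toy_job_ps once toy_wait has succeeded, while a
job with cache_forever set to 1 still reports TOY_PS_DONE. A portable client
would probably want to query the status before waiting, or treat an "invalid
job" result after a successful wait as expected.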