[gridrel-rg] Meeting of the Reliability and Robustness Research Group at OGF21
hiltunen at research.att.com
Thu Nov 1 14:52:16 CDT 2007
Here are some comments on the report.
1. Grid computing evolved from computing, distributed computing, cluster
computing, cycle scavenging (Condor), and meta/heterogeneous computing.
Each of these grid predecessors developed reliability methods and
techniques that are directly or indirectly still used today in grid
computing. Therefore, I believe the document should point out the
origins of each of the reliability techniques. I have attached a paper
from 1984 that surveys fault tolerance techniques starting from early
I believe a document that points out the origins of the techniques would
not only be fair but also more useful for grid practitioners.
As a side note, we did start a discussion on whether there is anything
specific to grid computing reliability (compared to what has been done
before). Some of the candidates involved scale, heterogeneity, multiple
administrative domains, etc. Such a discussion would be a useful addition
to the paper. Also, it would be interesting to investigate whether the papers
with "grid" in their title actually address the new issues introduced by
2. I would like to have a clearer separation between the different
layers (hardware+OS, grid "middleware", and grid applications) carried
throughout the paper. The general concept of "grid resource" makes it
harder to talk about reliability techniques specifically.
3. There are different types of grid applications (relatively
independent jobs, parallel MPI-type applications, interactive/batch,
maybe "services"). Some of the specific reliability techniques only
apply to one/some of these application types. The presentation could be
clarified by structuring the techniques based on application types. Data
grids already have their own section and I think that is good.
4. I'm not convinced about sections 5 and 6. In particular,
metrics/reliability analysis could be just one of the subsections (since
work has already been done in the area).
Christopher Dabrowski wrote:
> Dear all,
> On October 17, there was a meeting of the Reliability and Robustness
> Group at OGF21. At this meeting we reviewed the draft OGF informational
> document titled *Reliability in Grid Computing Systems*, which is
> intended to be the primary output of the RG. The draft summarizes the
> state of current work on Grid system reliability and describes
> requirements for capabilities needed to ensure high levels of
> reliability in current and future large-scale grid systems. The draft
> is based on work presented at two earlier workshops (GGF16 in Athens
> and OGF19 in Chapel Hill) and includes a substantial amount of
> additional work that many of us have identified as being relevant.
> At this point, the informational document is scheduled for
> finalization by February of next year. A shorter version of the
> document is planned for submission by the end of this year to a
> special issue of a journal publication dedicated to OGF work. Since
> the time frame is short, we request that you provide comments on or a
> review of the posted draft informational document by November 2. (To
> obtain a copy, please see below.)
> At the meeting, different sections of the document were discussed.
> Matti Hiltunen volunteered to propose revisions to the definitions
> section and possibly portions of the introduction. Dominic Battre also
> agreed to provide a relevant reference for the document, which he has
> kindly provided. If anyone else has comments and/or contributions,
> they would be most welcome.
> The draft and a copy of a PowerPoint presentation given at the meeting
> are posted on the RG grid forge web site* or can be obtained upon
> request.
> Chris Dabrowski.
> *Please see <https://forge.gridforum.org/sf/projects/gridrel-rg>. For
> the draft informational document, go to the section labeled
> "documents". For the presentation go to "Meeting
> Christopher Dabrowski
> National Institute of Standards and Technology
> 100 Bureau Drive, Stop 8970
> Gaithersburg, MD 20899-8970
> Phone: +1 301 975-3249
> FAX: +1 301 948-6213
> cdabrowski at nist.gov
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 4719033 bytes
Desc: not available
Url : http://www.ogf.org/pipermail/gridrel-rg/attachments/20071101/8bb56b64/attachment-0001.pdf