Databases and metadata on the Grid: tools and communities. (2/3)
(90 mins)
Roberto Barbera, Antonio Calanducci, Giacinto Donvito, Giuliano Taffoni
The use of databases (DBs) in Grid infrastructures is steadily increasing. New e-Science projects take a broad view of the Grid: their applications require not only traditional computations but also complex data operations that need on-line and off-line access to pre-existing, heterogeneous and independently operated DBs. DBMSs are used to handle both data and metadata. Several scientific and industrial communities need e-infrastructures able to manage databases and metadata in order to develop and deploy their applications. Examples of those communities are the Bioinformatics communities (see for example the BioinfoGRID project), the Astronomical communities (Virtual Observatory and Euro-VO projects), the Earth Science communities, and the Climate Change scientists (see for example the CMCC project, whose Grid infrastructure is based on DBMS).
Several tools and services have been developed for this purpose: (i) the AMGA metadata catalogue, (ii) the Grid Data Source Engine (G-DSE), (iii) the Grid Relational Catalog (GRelC), (iv) the OGSA Data Access and Integration middleware (OGSA-DAI), (v) Spitfire, and (vi) the Mobius project. These tools address this important topic by trying to provide secure, transparent, robust, efficient and dynamic grid-enabled data access services for relational and non-relational data sources.
Given the success of the Database session at OGF23, we believe it is important to propose this session again at OGF25, for the following reasons:
• We plan to involve other EU-funded projects that were not present at OGF23 and need database access through the Grid for their data;
• We received many requests to repeat this workshop at future OGF events;
• There have been new advances in DB access and metadata management in Grid environments. In particular, we believe it is important that developers and users from different fields meet and exchange experiences and solutions.
To achieve its goals, the workshop targets three main audiences:
• First, the community of Grid DB access framework developers.
• Second, Grid users, both current and potential, in various scientific areas.
• Third, people involved in the design and management of scientific and industrial Grid projects, to help them choose the right tool or infrastructure.
Keynote speakers from France, Italy, Spain, United Kingdom, USA etc.
Agenda: First Morning Session (9:00 - 10:30)
09:00 - 09:05 Barbera/Calanducci/Donvito: Welcome
09:05 - 09:30 Mike Jackson, EPCC UK - "The Evolution of OGSA-DAI"
Over six years ago, OGSA-DAI started as a project to provide data
access and integration capabilities to the e-Science community and, in
parallel, drive the development of data access specifications at
GGF/OGF. Over time OGSA-DAI the product deviated from the nascent
specifications but became a de facto standard for distributed data
access and integration. However, in the last 6 months OGSA-DAI has come
full circle with implementations of the OGF DAIS-WG WS-DAI candidate
specifications.
In this talk we shall describe the evolution of OGSA-DAI from a piece
of research code through to a product used in many projects
world-wide. The challenges that have been encountered along the way
shall be described as well as future directions - moving OGSA-DAI from
an open source product to an open source project.
09:30-10:00 John White, CERN Switzerland - "Spitfire"
10:00-10:30 Sunil Ahn, KISTI Korea - "Integration of WS-DAIR Interface in AMGA"
AMGA is a gLite metadata catalogue service designed to offer access to metadata for files stored on the Grid. WS-DAIR is the OGF standard for access to relational databases on the Grid.
WS-DAIR allows AMGA to integrate seamlessly with the OGF standardized Grid Data Access Services.
We present implementation details and a performance study of the AMGA WS-DAIR interface.
We also address some interoperability issues with the OGSA-DAI WS-DAIR implementation.
Second Morning Session (11:00 - 12:30)
11:00-11:30 Pasquale Pagano, ISTI-CNR Italy - "On D4Science Data Management Facilities"
The D4Science infrastructure currently supports multidisciplinary communities by providing them with customised Virtual Research Environments (VREs) for managing compound products. VREs are dynamically aggregated cooperation environments consisting of data, services, applications, and computing and storage capabilities. D4Science builds upon a Grid-based infrastructure and its enabling technology to promote data generation, integration, and enrichment. D4Science also plans to include data curation facilities in the near future. The objective of this talk is threefold: (i) to present the role of the D4Science infrastructure, which acts as mediator and facilitator for very large and multidisciplinary communities; (ii) to elaborate on the D4Science facilities for data management; (iii) to report on the OGF standards promoted by the D4Science infrastructure.
11:30-12:00 Roberto Cossu, ESA Italy - "Searching federated Earth Science Digital Repositories: The GENESI-DR approach"
12:00-12:30 Claudio Vuerli, INAF-OATS Italy - "EuroVO and Grid Interoperation"
Afternoon Session (14:00-15:30)
14:00-14:25 Sebastien Denvil, IPSL France - "METAFOR: Standardising Metadata for Climate Modelling Digital Repositories"
The METAFOR project is developing a Common Information Model (CIM) to describe climate data and the models that produce it in a standard way. By addressing the fragmentation and gaps in existing metadata, the CIM will help identify, access and use climate model data, both in existing and future repositories. In particular, and in close collaboration with its international partners, METAFOR is developing controlled vocabulary and metadata collection procedures in support of the next climate model intercomparison project CMIP5. The development of such standards, which have shown their value in other communities, will over time optimize the way climate data infrastructures are used to store knowledge, thereby adding value to primary research data and information, and providing an essential asset for the numerous stakeholders actively engaged in climate change issues (policy, research, impacts, mitigation, private sector).
14:25-14:50 Alberto Redolfi, Fatebenefratelli Italy - "LORIS IN neuGRID: AN EFFICIENT DATABASE MANAGEMENT SYSTEM FOR HANDLING DATA IN THE NEUROIMAGING COMMUNITY"
The aim of neuGRID is to develop a new user-friendly Grid-based research e-Infrastructure enabling the neuroscience community to carry out the research required for the pressing study of degenerative brain diseases. In neuGRID, the collection and archiving of large amounts of data is paired with computationally intensive data analyses. LORIS is a web-mediated database that allows for storage and management of clinical and imaging data. This talk discusses the software from an end-user perspective.
14:50-15:15 Edwin A. Valentijn, Kapteyn Astronomical Institute Netherlands - "Target/AstroWise: from a compute centric to a datacentric approach"
Target/AstroWise develops and operates datacentric information systems for astronomical, cultural heritage and medical (genome) data sets of 100+ Tbyte. Extreme data lineage is implemented in order to serve users who request data products of any kind. Data products (Targets) are produced on the fly on AstroWise nodes distributed over Europe or on EGEE nodes.
15:15-15:30 Barbera/Calanducci/Donvito: Wrap up, Feedback and Conclusion
Location: Raffaello