Susana Snchez Expsito Instituto de Astrofsica de Andaluca -
CSIC Pablo Martin, Jose Enrique Ruiz, Lourdes Verdes-Montenegro,
Julian Garrido, Ral Sirvent, Antonio Ruz Falc and Rosa Badia Web
services as building Blocks for Science Gateways in Astronomy IWSG
2015, Budapest
Slide 2
Context: Big Data Era The scientific community is facing a data
deluge. In particular in Astronomy ! SKA1 is currently in the
detailed design phase. Construction phase: 2018-2023 SKA2 design
and construction phases: 2018 2030 Paradigmatic Big Data
Instrument: Square Kilometre Array (SKA) Credit: skatelescope.org
(SKA organisation) The aperture arrays in the SKA could produce
more than 100 times the global internet traffic.
Slide 3
Context: Facing the data deluge GRID Super Computing clusters
Clouds grid engine Taverna Visualization tools They are built upon
the scientific applications. They adapt the scientific applications
to the DCIs (using the DCI API and/or tools for making a good use
of them) They are designed aiming to be re- used (e.g. they can be
customized with a specific parameters/inputs) They usually are
workflows, visualization or management data tools WFs expose the
scientific method, facilitating they can be reused Custom-made
scripts Custom-made portals Data deluge, a twofold challenge:
Technological and Scientific Populated with Advanced Tools: This
work: advanced tools (workflows) for analysis tasks of interest for
SKA use cases.
Slide 4
Context: Facing the data deluge in Astronomy Virtual
Observatory Standards Data interoperability Uniform access to
multiple archives The Virtual Observatory (VO) is a network of Web
data services which provides access to astronomical archives Tools
for local visualization and analysis Standards for analysis
services: Avoid moving the data Work in progress www.ivoa.net Big
Data technics for data reduction pipelines Transforming raw data
into science-ready data E.g.: Pipeline to detect transient sources
LOFAR (a SKA pathfinder) Data stored in a MonetDB to accelerate the
data access. A transient source: the Crab pulsar An effort is still
needed To improve the tools for analysing large volumes of
science-ready data To publish these tools in collaborative
environments
Slide 5
Science use case. Analysis of HI datacubes for kinematical
modelling of galaxies RA DEC Frequency Radio interferometers as the
SKA generate data with two spatial coordinate axes and a spectral
axis The astronomers analyse these HI datacubes to produce
kinematical models of galaxies This kind of studies are included in
the SKA use cases Based on GIPSY: Powerful software for analysing
radio interferometric data.
Slide 6
Science use case. Analysis of HI datacubes for kinematical
modelling of galaxies Parameter space exploration (parameter sweep
workflow): The astronomer usually sweeps a parameter range in order
to find the best model ROTCUR derives the kinematical parameters
from the datacube including the rotation curve ELLINT calculate the
radial profile of the galaxy emission GALMOD builds a model from
the kinematical parameters
Slide 7
Framework: AMIGA for GTC, ALMA and SKA pathfinders Amiga4GAS
presented at IWSG 2013, Zurich The here presented work
Collaborators of the SKA SDP consortium in charge of designing the
systems for processing the SKA data and for delivering them to the
astronomers GOAL: Facilitate astronomers launching their workflows
in heterogeneous DCIs
http://amiga.iaa.es/p/263-federated-computing.htm
Slide 8
Two-level workflow system ROTCURGALMODROTCURELLINTGALMODROTCUR
ELLINTGALMODROTCURELLINTGALMOD AMIGA4GAS web services as interface
of one or several GIPSY tasks GIPSY tasks orchestrated by COMPSs
Infrastructure level workflows Connecting AMIGA4GAS services to
build (Taverna) Wfs User level workflows
Slide 9
Infrastructure level workflows: Workflows as Web Services
ArchitectureUser Interface designGranularity Goals: Improve the
service performance & Make a good use of the infrastructures
COMPSs role Improve the User Experience UI design &
Granularity. GIPSY tasks have a large number of input parameters.
Handling all of them in a graphical workflow manager can be
tedious. Setting some of them with a default value would reduce the
amount of required inputs but would constrain the specific
operation that the service will perform. Compromise between
flexibility and usability Granularity levels ranging from: Services
with 1 GIPSY task to Services with 3 GIPSY tasks High granularity
Lower execution time: Task communication inside the web server
COMPSs orchestration capability
Slide 10
User level workflows: Taverna workflows
https://srv-prj-wsamiga.fcsc.es:8444/Amiga4GasServiceLadon/soap11/description
AMIGA4GAS services as blocks that the users can combine to build
their experiments
Slide 11
User level workflows: Taverna workflows The workflows have been
built using Taverna Workbench Astronomy Edition. Astrotaverna
plugin (AstroTaverna - Building workflows with Virtual Observatory
services. Ruiz, JE. Et al A&C 2014) : VO services for accessing
the data Tools for managing VOTables
Slide 12
User level workflows: Taverna workflows The workflows have been
published on myExperiment
Slide 13
User level workflows: Taverna workflows The DataLink (1)
service delivers related complementary metadata, including the
datacubes location. The SIAv2 (1) service queries a particular
archive, searching for datacubes inside a sky region Using VO
services to get the data location Specific formats to define the
data location: In clusters: @: In gLite grids: the Logical File
Name The AMIGA4GAS service is able to discover the computing
infrastructure where finally the GIPSY task will be executed.
Business rules and Federation Systems: A federated access to DCIs
for executing scientific workflows, P. Martin et al. demo presented
at EGI2015 (1) SIAv2 and Datalink protocols are in the state of
Proposed Recommendation in IVOA
Slide 14
Conclusions The astronomical community is paving the way to the
SKA : VO services for accessing astronomical data Innovative
solutions to the data transmission and reduction. But astronomers
need also innovative tools to support them to analyse science-ready
data in collaborative environments. We have developed a set of
advanced tools: the AMIGA4GAS services. They implement analysis
tasks of interest for the SKA use cases focused on galaxies. These
tools implement a two-level workflow system: Infrastructure level:
efficient exploitation of the DCIs (COMPSs) User level: Integration
in Taverna where the astronomers can combine them with VO
tools
WEB SERVICES Federated layer Business Rules Authentication
COMPSs JavaGAT OCCI connector Ibergrid adaptor SGE adaptor GRID
Super Computer / Cluster Clouds Authentication : LDAPs clusters
& Grid certificates & Cloud accounts Business Rules:
Scheduling among DCIs Latency, energy efficiency, time to complete,
probability of completion or costs COMPSs [1] : Job launcher
Infrastructure level workflows [1]
http://www.bsc.es/es/publications/comp-superscalar-bringing-grid-superscalar-and-gcm-togetherhttp://www.bsc.es/es/publications/comp-superscalar-bringing-grid-superscalar-and-gcm-together
Slide 17
Outcomes of the project Set of web services for analysing
interferometric multidimensional data running on heterogeneous DCIs
1) Specific (ready-to-use) outcomes How to schedule among DCIs:
Business rules Statistics of performance of the applications on
different infrastructures How to get astrophysics Software as a
Service running on DCI Installation and deployment issues Web
service design: efficiency and reusability Characterization of
analysis services IVOA International Virtual Observatory Alliance
PDL Parameter Description Language 2) General outcomes