V1: Data distribution, visualisation, and cloud computing
Both space-based experiments and seismology face the challenge of handling steadily growing and increasingly complex data sets. The synergies between the François Arago Centre (FACe) within the APC laboratory and both the Data Centre and the Data Analysis Centre (S-CAPAD) within IPGP, connected through a high-speed network infrastructure, provide us with a unique data-aware environment. It is also instrumental in implementing new and innovative approaches to data integration and analysis, allowing us to fully explore the cornucopia of modern observations.
In the first two years, this project focused on harmonizing the usage of the data centres across the different projects in order to allow an optimal use of the resources. In addition, the different aspects of the computing needs were investigated in view of their processing requirements. The outcome of this work is a work plan specifying which processes are run locally on the computing farm of the FACe, which on the heavy-duty computing environment at CC-IN2P3, and which are best performed using the GRID infrastructure or in the cloud.
At the end of this work task, an efficient way to access the various resources will be provided, along with detailed advice on which resources are best suited for the different tasks faced by the IPGP Observatories, eLISA, LISA-Pathfinder, Euclid, and other possible projects using the IPGP data centres and the FACe.
POSITION       NAME                  LABORATORY   GRADE, EMPLOYER
WP leader      Cécile CAVET          APC          IR2, CNRS/IN2P3
WP co-leader   Volker BECKMANN       IN2P3        IR1, CNRS/IN2P3
WP co-leader   Nikolai SHAPIRO       IPGP         DR, CNRS
WP member      Michèle DETOURNAY     APC          IRHC, CNRS/IN2P3
WP member      Constanza PARDO       IPGP         IR1, CNRS
WP member      Eleonore STUTZMANN    IPGP         PHY, CNAP
WP member      Jean-Marc COLLEY      APC          IR1, CNRS/IN2P3
WP member      Jean-Pierre VILOTTE   IPGP         CNAP
WP member      Alexandre FOURNIER    IPGP         Professor
WP member      Geneviève MOGUILNY    IPGP         IR, CNRS
In terms of building a homogeneous database from highly diverse (in both quality and quantity) data sets from seismological data centres, the team:
- designed and developed the software necessary to make geophysical data available through other data centres;
- provides seismic data through the webservice access points available at several data centres, enabling fast access to the large data archive;
- developed algorithms for the massive analysis of large continuous seismological data sets on different types of computing architectures.
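Webservice access of this kind commonly follows the FDSN webservice conventions, where a waveform time window is requested from a `dataselect` endpoint via URL query parameters. As a minimal sketch using only the Python standard library (the base URL, network, and station codes below are placeholders, not actual holdings of any of the data centres mentioned above):

```python
from urllib.parse import urlencode

# Hypothetical base URL; real FDSN data centres expose the same
# "fdsnws/dataselect/1/query" path under their own domain.
BASE_URL = "https://example-datacentre.org/fdsnws/dataselect/1/query"

def dataselect_url(network, station, location, channel, start, end):
    """Build an FDSN dataselect query URL for one waveform time window."""
    params = {
        "net": network,      # network code, e.g. "G"
        "sta": station,      # station code
        "loc": location,     # location code ("--" for blank)
        "cha": channel,      # channel code, e.g. "BHZ"
        "starttime": start,  # ISO 8601 start of the window
        "endtime": end,      # ISO 8601 end of the window
    }
    return BASE_URL + "?" + urlencode(params)

url = dataselect_url("G", "SSB", "00", "BHZ",
                     "2015-01-01T00:00:00", "2015-01-01T01:00:00")
print(url)
```

Fetching the resulting URL would return the requested window as miniSEED, so clients only download the slices they need instead of whole archive files.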
In the context of investigating cloud environments with respect to other processing options, the main results can be summarized as follows:
- In general (for all types of scientific applications), a local cluster in a “classical” setup performs comparably to a virtual cluster installed in a cloud environment
- However, processing that requires a message-passing system can be an order of magnitude faster on a dedicated cluster, because of the faster inter-processor communication and faster CPU-to-disk transfer
- compared to GRID computing, the cloud is easier to use because no middleware is necessary
- cloud computing enabled the IPGP, Integral, LISA-Pathfinder, LISA, and Euclid teams to provide easy-to-use processing environments to their members. The advantage of having exactly the same (infrastructure-agnostic) processing system, and thus being able to compare results more easily, outweighed the slightly reduced performance compared to a local cluster environment
- federated cloud systems such as France Grilles FG-cloud are the logical next step in order to provide projects with easy access to large computing power without generating large costs.
- container technologies such as Docker make it easy to share code and reach production level across multiple infrastructures (local, grid, cloud, and cluster with the Singularity container solution).
- We have to continue investigating new computing infrastructures. The concentration of knowledge about the best computing architectures has shifted from the scientific to the private sector over the last ~10 years. It is vital that scientific projects remain or get involved in state-of-the-art computing, in order to obtain the highest possible scientific return for the invested budget.
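As an illustration of the container approach described above, a minimal and purely hypothetical Dockerfile packaging a processing code with its dependencies could look as follows; the image name, package list, and entry point are placeholders, not the actual project setup:

```dockerfile
# Hypothetical image for a seismological processing pipeline.
FROM python:3.10-slim

# Install the (placeholder) scientific dependencies of the pipeline.
RUN pip install --no-cache-dir numpy scipy obspy

# Copy the project code into the image and set the working directory.
COPY . /opt/pipeline
WORKDIR /opt/pipeline

# Default command: the (placeholder) processing entry point.
ENTRYPOINT ["python", "process.py"]
```

Singularity can build its own image format directly from such a Docker image (e.g. `singularity build pipeline.sif docker://<image>`), which is one way the same application can reach cluster infrastructures where a Docker daemon is not permitted.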
Therefore, in order to go a step further, we would like to significantly improve the following aspects:
- Interfaces to cloud and distributed computing such as SlipStream® and Mesos
- Offer container images of various applications, with the container solution chosen depending on the use case (services, HTC, HPC)
- Provide a production environment where jobs are containers