V1: Data distribution, visualisation, and cloud computing
Both space-based experiments and seismology face the challenge of treating steadily growing and increasingly complex data sets. The synergy between the François Arago Centre (FACe) within the APC laboratory and both the Data Centre and the Data Analysis Centre (S-CAPAD) within IPGP, connected through a high-speed network infrastructure, provides us with a unique data-aware environment. It is also instrumental in implementing new and innovative approaches to data integration and analysis, in order to fully explore the cornucopia of modern observations.
In the first two years, this project focused on harmonizing the usage of the data centres across the different projects, in order to allow an optimal use of the resources. In addition, the different aspects of the computing needs were investigated in view of their processing requirements. The outcome of this work is a work plan specifying which processes are run locally on the computing farm of the FACe, which on the heavy-duty computing environment at CC-IN2P3, and which are best performed on the GRID infrastructure or in the cloud.
At the end of this work package, an efficient way to access the various resources will be provided, together with detailed advice on which resources are best suited to the different tasks faced by the IPGP Observatories, eLISA, LISA-Pathfinder, Euclid and other possible projects using the IPGP data centres and the FACe.
POSITION | NAME SURNAME | LABORATORY | GRADE, EMPLOYER
WP leader | Cécile CAVET | APC | IR2, CNRS/IN2P3
WP co-leader | Volker BECKMANN | IN2P3 | IR1, CNRS/IN2P3
WP co-leader | Nikolai SHAPIRO | IPGP | DR, CNRS
WP member | Michèle DETOURNAY | APC | IRHC, CNRS/IN2P3
WP member | Constanza PARDO | IPGP | IR1, CNRS
WP member | Eleonore STUTZMANN | IPGP | PHY, CNAP
WP member | Jean-Marc COLLEY | APC | IR1, CNRS/IN2P3
WP member | Jean-Pierre VILOTTE | IPGP | CNAP
WP member | Alexandre FOURNIER | IPGP | Professor
WP member | Geneviève MOGUILNY | IPGP | IR, CNRS
In terms of building a homogeneous database from highly diverse (both in quality and quantity) data sets from seismological data centres, the team:
– designed and developed the necessary software to make geophysical data available through other data centres;
– provided the data through webservice access points, available at several data centres, to retrieve the seismic data, allowing fast access to the large data archive;
– developed algorithms for the massive analysis of large continuous seismological data sets using different types of computing architectures.
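As an illustration of the webservice access pattern mentioned above, the sketch below builds a query URL following the standard FDSN dataselect webservice convention used by seismological data centres. The endpoint, station, and time window are hypothetical placeholders, not the project's actual services.

```python
from urllib.parse import urlencode

def fdsn_dataselect_url(base, net, sta, loc, cha, start, end):
    """Build an FDSNWS dataselect query URL for a continuous-waveform request."""
    params = {
        "net": net, "sta": sta, "loc": loc, "cha": cha,
        "starttime": start, "endtime": end,
    }
    return f"{base}/fdsnws/dataselect/1/query?{urlencode(params)}"

# Hypothetical request: one day of broadband vertical-component data.
url = fdsn_dataselect_url(
    "http://example-datacentre.org",  # placeholder endpoint
    net="G", sta="PAF", loc="00", cha="BHZ",
    start="2018-01-01T00:00:00", end="2018-01-02T00:00:00",
)
print(url)
```

Fetching such a URL returns miniSEED waveform data, so the same small function can drive a bulk download loop over stations and days.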
In the context of the investigation of cloud environments with respect to other processing options, the main results can be summarized as follows:
– In general (for all types of scientific applications), a local cluster in a “classical” setup performs as well as a virtual cluster installed in a cloud environment. However, processing that requires a message-passing system can be up to an order of magnitude faster on a dedicated cluster, because of the faster inter-processor communication and faster CPU-to-disk transfer.
– compared to GRID computing, the cloud is easier to use because no middleware is necessary
– cloud computing enabled the IPGP, Integral, LISA-Pathfinder, LISA, SVOM and Euclid projects to provide easy-to-use processing environments to their teams. The advantage of having exactly the same processing system (infrastructure agnostic), and thus being able to compare results more easily, outweighed the slightly reduced performance compared to a local cluster environment.
– federated cloud systems such as France Grilles FG-cloud are the logical next step in order to provide projects with easy access to large computing power without generating large costs.
– container technologies such as Docker and Singularity, in conjunction with continuous integration tools (GitLab-CI), make it easy to share code and reach production level across multiple infrastructures (local, grid, cloud, and cluster).
– the next step is the management of containers with container orchestrators such as Kubernetes (k8s), which can replace classic job schedulers (Slurm, Grid Engine…) in order to execute container jobs on a batch cluster.
– we have to continue to investigate new computing infrastructures. The concentration of knowledge about the best computing architectures has shifted from the scientific to the private sector over the last ~10 years. It is vital that scientific projects remain or become involved in state-of-the-art computing, in order to get the highest scientific return possible for the invested budget.
– the next paradigm is AI with Machine Learning/Deep Learning, which can be easily implemented with the Python TensorFlow library, which runs on CPUs, GPUs and the new TPUs (Google Tensor Processing Units), as explored in the DecaLog/ComputeOps IN2P3 project.
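To make the orchestrator-as-scheduler idea concrete, a Kubernetes Job manifest can play the role a batch-scheduler submission script would on a classic cluster. The sketch below is a minimal illustrative example; the job name, container image, command, and resource figures are assumptions, not the project's actual configuration.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: seismo-analysis            # hypothetical job name
spec:
  completions: 1
  backoffLimit: 2                  # retry a failed container at most twice
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: registry.example.org/seismo-tools:latest  # placeholder image
        command: ["python", "run_analysis.py"]           # placeholder command
        resources:
          requests:
            cpu: "4"
            memory: 8Gi
```

Submitting the manifest with `kubectl apply -f job.yaml` then plays the role of `sbatch` or `qsub`: the orchestrator queues the container, places it on a node with the requested resources, and handles retries on failure.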
The current year of the LabEx UnivEarthS Valorisation project was dedicated to the dissemination of the results and knowledge of this work package, and to the use of container technologies within the IN2P3 DecaLog project. In October 2018, we participated in the organization of JCAD 2018 (Journées SUCCES + mésocentre). This meeting aims to federate scientific users and infrastructure administrators of the France Grilles community and the connected infrastructures.
In 2018, the ComputeOps project (a DecaLog master project) was accepted by IN2P3 to study containers for high-performance computing. In this context, which is strongly connected with the LabEx WP V1 topics, the project organized the IN2P3 informatics school on containers in production. Composition of containers, continuous integration and deployment of containers, and container orchestrators were explored during the school.
Furthermore, the ComputeOps project has started to provide tools (a container hub, CI recipes), good practices, and tutorials on containers for this specific field. A workshop will be organized in November to present the new version of the chosen container solution and other topics studied in the ComputeOps project.
In September 2017, the FACe and IPGP received a positive answer to a Sesame proposal (regional call) for the MULTI DATA ANALYSIS AND COMPUTING ENVIRONMENT FOR SCIENCE (DANTE) project, which will reinforce the synergy of the two laboratories as computing and service providers. During 2018, we held the first meeting to discuss the organization of the new DANTE scientific instrument. Because of the FACe's move from BioPark to Condorcet, the FACe computing cluster has been temporarily moved to the LPNHE laboratory. During 2019, the cluster will be moved to the IPGP computing room and upgraded. Both platforms, S-CAPAD@IPGP and the FACe cluster, will be part of the CIRRUS platform (USPC COMUE).
Further work has been done on the cloud infrastructure that can be used at the APC. Documentation about the usage of the cloud has been finalized and assembled into a set of practical user documentation. This documentation has been made public through the Wiki pages at the APC and through the Atrium document database. The Docker container technology has been used for space missions such as Euclid, LISA and SVOM. Indeed, several applications (code sharing of the simulator, provision of services such as Jupyter Notebook, a Django web application) have been developed for these specific use cases.
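As an example of the kind of container recipe behind such services, a minimal Dockerfile for a Jupyter Notebook environment might look as follows. The base image, package list, and port are illustrative assumptions, not the recipes actually deployed for the missions above.

```dockerfile
# Minimal notebook image (illustrative; not the project's actual recipe)
FROM python:3.6-slim

# Scientific stack plus the notebook server
RUN pip install --no-cache-dir numpy scipy matplotlib notebook

WORKDIR /work
EXPOSE 8888

# Listen on all interfaces so the container port can be published to the host
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
```

Built with `docker build -t apc-notebook .` (a hypothetical tag) and run with `docker run -p 8888:8888 apc-notebook`, the same image behaves identically on a laptop, a local cluster, or a cloud virtual machine, which is precisely the infrastructure-agnostic property discussed above.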