V1: Data distribution, visualisation, and cloud computing


Making the most out of scientific effort: the valorisation project

Both space-based experiments and seismology face the challenge of handling steadily growing and increasingly complex data sets. The synergies between the François Arago Centre (FACe) at the APC laboratory and both the Data Centre and the Data Analysis Centre (S-CAPAD) at IPGP, connected through a high-speed network infrastructure, provide us with a unique data-aware environment. This environment is also instrumental in implementing new and innovative approaches to data integration and analysis, so that the cornucopia of modern observations can be fully exploited.

During the first two years, this project focused on harmonizing the use of the data centres across the different projects in order to make optimal use of the resources. In addition, the various computing needs were investigated in terms of their processing requirements. The outcome of this work is a work plan specifying which processes are run locally on the FACe computing farm, which are run on the heavy-duty computing environment at CC-IN2P3, and which are best performed on the GRID infrastructure or in the cloud.

At the end of this task, an efficient way to access the various resources will be provided, together with detailed advice on which resources are best suited to the different tasks faced by the IPGP Observatories, eLISA, LISA-Pathfinder, Euclid, and other projects that may use the IPGP data centres and the FACe.

POSITION	NAME SURNAME	LABORATORY NAME	GRADE, EMPLOYER
WP leader	Cécile CAVET	APC	IR2, CNRS/IN2P3
WP co-leader	Volker BECKMANN	IN2P3	IR1, CNRS/IN2P3
WP co-leader	Nikolai SHAPIRO	IPGP	DR, CNRS
WP member	Michèle DETOURNAY	APC	IRHC, CNRS/IN2P3
WP member	Constanza PARDO	IPGP	IR1, CNRS
WP member	Eleonore STUTZMANN	IPGP	PHY, CNAP
WP member	Jean-Marc COLLEY	APC	IR1, CNRS/IN2P3
WP member	Jean-Pierre VILOTTE	IPGP	CNAP
WP member	Alexandre FOURNIER	IPGP	Professor
WP member	Geneviève MOGUILNY	IPGP	IR, CNRS

In terms of building a homogeneous database from seismological data centre data sets that are highly diverse in both quality and quantity, the team:
– designed and developed the software needed to make geophysical data available through other data centres;
– provides data for web-service access, available at several data centres, to retrieve seismic data, allowing fast access to the large data archive;
– developed algorithms for the massive analysis of large continuous seismological data sets using different types of computing architectures.
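As an illustration of the web-service access and large-scale processing described above, the sketch below assumes that the data centres expose the standard FDSN `dataselect` web service (the `/fdsnws/dataselect/1/query` path and the `net`, `sta`, `loc`, `cha`, `starttime`, `endtime` parameters come from the FDSN web service specification). The base URL and the network/station codes are hypothetical placeholders. Splitting a long continuous record into daily requests is one simple way to prepare independent work units for parallel analysis; it is not necessarily how the team's own tools are organized.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

# Hypothetical data centre; the path follows the FDSN dataselect specification.
BASE_URL = "https://datacentre.example.org/fdsnws/dataselect/1/query"


def daily_windows(start, end):
    """Yield consecutive (t0, t1) windows of at most one day covering [start, end)."""
    t0 = start
    while t0 < end:
        t1 = min(t0 + timedelta(days=1), end)
        yield t0, t1
        t0 = t1


def dataselect_url(net, sta, loc, cha, t0, t1, base_url=BASE_URL):
    """Build an FDSN dataselect query URL for one channel and one time window."""
    params = {
        "net": net,
        "sta": sta,
        "loc": loc,
        "cha": cha,
        "starttime": t0.isoformat(),  # e.g. 2018-01-01T00:00:00
        "endtime": t1.isoformat(),
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder network/station/location/channel codes.
    for t0, t1 in daily_windows(datetime(2018, 1, 1), datetime(2018, 1, 3)):
        print(dataselect_url("XX", "STA1", "00", "BHZ", t0, t1))
```

Because each URL describes a self-contained request, the downloads and the subsequent analysis can be distributed over separate workers, whether on a local cluster, the grid, or cloud virtual machines.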

Regarding the investigation of cloud environments compared with other processing options, the main results can be summarized as follows:

– In general (for all types of scientific applications), a local cluster in a “classical” setup performs comparably to a virtual cluster installed on a cloud environment. However, processing that requires a message-passing system can be an order of magnitude faster on a dedicated cluster, thanks to faster inter-processor communication and faster CPU-to-disk transfer.
– compared to GRID computing, the cloud is easier to use because no middleware is necessary
– cloud computing enabled the IPGP, Integral, LISA-Pathfinder, LISA, SVOM, and Euclid teams to provide easy-to-use processing environments to their members. The advantage of having exactly the same processing system (infrastructure-agnostic), and thus being able to compare results more easily, outweighed the slightly reduced performance compared with a local cluster environment.
– federated cloud systems such as France Grilles FG-cloud are the logical next step in order to provide projects with easy access to large computing power without generating large costs.
– container technologies such as Docker and Singularity, combined with continuous integration tools (GitLab-CI), make it easy to share code and to reach production level on multiple infrastructures (local, grid, cloud, and cluster).
– the next step is managing containers with container orchestrators such as Kubernetes (k8s), which can replace classic job schedulers (Slurm, Grid Engine, …) to execute container jobs on batch clusters.
– we have to continue investigating new computing infrastructures. Over the last ~10 years, the concentration of knowledge about the best computing architectures has shifted from the scientific to the private sector. It is vital that scientific projects remain or become involved in state-of-the-art computing, in order to obtain the highest possible scientific return on the invested budget.
– the next paradigm is AI, with Machine Learning/Deep Learning, which can easily be implemented with the Python TensorFlow library; TensorFlow runs on CPUs, GPUs, and the new TPUs (Google Tensor Processing Units), as explored in the DecaLog/ComputeOps IN2P3 project.
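To make the point about orchestrators concrete, the sketch below builds a minimal Kubernetes `batch/v1` Job manifest in Python: the kind of object an orchestrator runs where a Slurm or Grid Engine batch script would otherwise be submitted. The field names (`apiVersion`, `kind`, `spec.template.spec.containers`, `restartPolicy`, `backoffLimit`, `resources.requests`) belong to the Kubernetes Job API; the job name, container image, command, and resource values are hypothetical placeholders, not taken from the projects above.

```python
import json


def container_job_manifest(name, image, command, cpu="1", memory="2Gi", retries=2):
    """Return a minimal Kubernetes batch/v1 Job manifest as a dict."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Number of retries before the Job is marked as failed.
            "backoffLimit": retries,
            "template": {
                "spec": {
                    # A Job's pod template only allows "Never" or "OnFailure" here.
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "command": command,
                            # Plays the role of the CPU/memory request in a batch script.
                            "resources": {"requests": {"cpu": cpu, "memory": memory}},
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical image and command for a containerised simulation run.
    job = container_job_manifest(
        "simulation-run-001",
        "registry.example.org/mission/simulator:1.0",
        ["python", "run_simulation.py", "--seed", "1"],
    )
    print(json.dumps(job, indent=2))
```

Saving the printed JSON to a file and passing it to `kubectl apply -f` would submit the job; in a container workflow like the one described above, the image would typically be the one built and pushed by the CI pipeline.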

The current year of the LabEx UnivEarthS Valorisation project was dedicated to disseminating the results and knowledge of this work package, and to using container technologies within the IN2P3 DecaLog project. In October 2018, we participated in organizing JCAD 2018 (Journées SUCCES + mésocentre). This meeting aims to bring together scientific users and infrastructure administrators of the France Grilles community and the connected infrastructures.

In 2018, the ComputeOps project (a DecaLog master project) was accepted by IN2P3 to study containers for high-performance computing. In this context, which is strongly connected with the topics of LabEx WP V1, the project organized the IN2P3 informatics school on containers in production. Container composition, continuous integration and deployment of containers, and container orchestrators were explored during the school.

Furthermore, the ComputeOps project has started to provide tools (container hub, CI recipes), good practices, and tutorials on containers for this specific field. A workshop will be organized in November to present the new version of the chosen container solution and other topics studied in the ComputeOps project.

In September 2017, the FACe and IPGP received a positive response to a Sesame proposal (regional call) for the MULTI DATA ANALYSIS AND COMPUTING ENVIRONMENT FOR SCIENCE (DANTE) project, which will reinforce the synergy between the two laboratories as computing and service providers. During 2018, we held the first meeting to discuss the organization of the new DANTE scientific instrument. Because the FACe moved from BioPark to Condorcet, the FACe computing cluster was temporarily moved to the LPNHE laboratory. During 2019, the cluster will be moved to, and upgraded in, the IPGP computing room. Both platforms, S-CAPAD@IPGP and the FACe cluster, will be part of the CIRRUS platform (USPC COMUE).

Further work has been done on the cloud infrastructure available at the APC. Documentation on cloud usage has been finalized and compiled into a set of practical user guides, which have been made public through the APC wiki pages and the Atrium document database. The Docker container technology has been used for space missions such as Euclid, LISA, and SVOM. In particular, several applications (code sharing of the simulator, service provision such as Jupyter Notebook, a Django web application) have been developed for this specific use case.
Publications:

2018:

C. Cavet, A. Petiteau, M. Le Jeune,
Prototyping for the Distributed Data Processing Center of LISA
12th International LISA Symposium (2018) In prep.

2017:

C. Cavet, A. Petiteau, M. Le Jeune, E. Plagnol, E. Marin-Martholaz, J-B. Bayle, A proto-Data Processing Center for LISA, 11th International LISA Symposium, Journal of Physics: Conference Series, Volume 840, Issue 1, 012045 (2017): http://iopscience.iop.org/article/10.1088/1742-6596/840/1/012045

P. Amaro-Seoane et al.
LISA mission proposal
arXiv:1702.00786 (2017)

C. Cavet, V. Legoll, J. Pansanel, S. Pop, A. Ramparison, G. Romier, F. Thiebolt, FG-Cloud : un service de cloud computing fédéré pour le calcul scientifique, JRES 2017 (2017)

2016:

M. Poncet, T. Faure, C. Cavet, A. Petiteau, P.-M. Brunet, E. Keryell-Even, S. Gadioux, M. Burgaud
Enabling collaboration between space agencies using private and cloud based clusters
BiDS’16 (2016) http://hal.archives-ouvertes.fr/hal

2015:

M. Airaj, C. Biscarat, C. Cavet, N. Clémentin, S. Geiger, C. Gondrand, V. Hamar, M. Jouvin, V. Legoll, S. Li, C. Loomis, M. Marquillie, G. Mathieu, J. Pansanel, G. Philippon, J.-M. Pierson, M. Puel, G. Romier, F. Thiebolt, A. Tsaregorodtsev
FG-Cloud : Cloud communautaire distribué à vocation scientifique
JRes, Montpellier (2015)
http://hal.in2p3.fr/in2p3-01285123

2014:

Scientific Data Preservation 2014, CNRS publication

2013:

M. Airaj, C. Cavet, V. Hamar, M. Jouvin, C. Loomis, A. Lopez Garcia, G. Mathieu, V. Mendez, J. Pansanel, J.-M. Pierson, M. Puel, F. Thiebolt, A. Tsaregorodtsev,
« Vers une fédération de Cloud académique dans France Grilles »,
Journées SUCCES 2013, Paris : France, hal-00927506 (2013)

Lemarchand A., Tait S., Beauducel F., Bouin M.P., Brenguier F., de Chabalier J. B., Clouard V., Di Muro A., Ferrazzini V., Shapiro N., and the IPGP observatories’ teams,
“Significant breakthroughs in monitoring networks of the volcanological and seismological French observatories”,
American Geophysical Union Fall Meeting, San Francisco, California, 2013

2012:

C. Cavet, M. Le Jeune, F. Dodu, M. Detournay
Utilisation du Cloud StratusLab : tests de performance des clusters virtuels,
Journées scientifiques mésocentres et France Grilles 2012, Paris : France, hal-00766067 (2012).
http://hal.archives-ouvertes.fr/hal-00766067

Bonaime S., Stutzmann E., Maggi A., Vallée M., Pardo C., and the GEOSCOPE group,
« The GEOSCOPE network »,
AGU Fall Meeting, 2012

Stutzmann E., Maggi A., Bonaime S., Pardo C.,
“30th Anniversary of the GEOSCOPE”,
American Geophysical Union Fall Meeting, San Francisco, California, 2012

Communication

1. Seminars
- Seminar: Conteneurs (Docker, Singularity) pour le HPC, Activités et vision pour le domaine HTC / HPC, CNRS headquarters, Paris, 22 February 2017.
  https://indico.in2p3.fr/event/14008/session/2/contribution/6/material/slides/0.pdf
- Workshop organisation: 2nd interdisciplinary workshop on time series analysis, Université Paris Descartes, Paris, 12-13 December 2016.
  https://indico.in2p3.fr/event/13934/
- Webinaire Docker : retour d’expérience, RI3 webinar, 16 June 2016.
  https://indico.in2p3.fr/event/13287/material/slides/1.pdf
- Cloud computing: a new computing infrastructure for scientific applications, Campus Paris Diderot, Paris, 2 December 2013.
  http://www.apc.univ-paris7.fr/~beckmann/common/pres_big_computing_13.pdf
- Cloud computing: a new computing infrastructure for scientific applications, APC laboratory, Paris, 10 December 2012.
2. Orals
- Un cas d’étude en astrophysique, Journée de sensibilisation aux moyens mutualisés d’accès au calcul intensif, INRA, Paris, 11 January 2016.
  http://cascisdi.inra.fr/sites/cascisdi.inra.fr/files/journeeCalcul_11janv2016_cloud_Cavet.pdf
- Présentation du cloud, Journée de sensibilisation aux moyens mutualisés d’accès au calcul intensif, INRA, Paris, 11 January 2016.
  http://cascisdi.inra.fr/sites/cascisdi.inra.fr/files/journeeCalcul_11janv2016_astro_Cavet.pdf
- Review on distributed computing, Workshop distributed computing in astrophysics, FACe, APC, Paris, 10 – 11 December 2015.
  https://indico.in2p3.fr/event/12042/contribution/1/material/slides/0.pdf
- Etude des ondes gravitationnelles : de l’espace au cloud, Journées SUCCES, IPGP, Paris, 5 – 6 November 2015.
  http://succes2015.sciencesconf.org/conference/succes2015/C_cavet.pdf
- Cloud technology for algorithm preservation, Atelier PREDONx, APC laboratory, Paris, 5 – 6 November 2014.
  https://indico.cern.ch/event/338461/session/3/contribution/5/material/slides/0.pdf
- Retour d’expérience en Astrophysique : utilisation du Cloud IaaS pour le traitement de données des missions spatiales, École informatique IN2P3 2014 : Maîtriser le Cloud, Centre Jean Bosco / Centre de calcul de l’IN2P3, Lyon, 1 – 5 July 2014.
  https://indico.in2p3.fr/getFile.py/access?contribId=37&sessionId=22&resId=0&materialId=slides&confId=9852
  https://hal.archives-ouvertes.fr/hal-01132587
- Big Data : utilisation d’un cluster Hadoop, Atelier Big Data du LabEx UnivEarthS, FACe, Paris, 14 January 2014.
  http://www.apc.univ-paris7.fr/~beckmann/common/Cavet_BigData_01_14.pdf
- Tests de SlipStream au LAL et au CC-IN2P3 : vers la fédération du Cloud Computing, Rencontres France-Grilles – LCG-France, CC-IN2P3, Lyon, 26 – 28 November 2013.
  https://indico.in2p3.fr/getFile.py/access?contribId=20&sessionId=3&resId=0&materialId=slides&confId=8867
  https://hal.archives-ouvertes.fr/hal-01132540
- Vers une fédération de Cloud académique dans France Grilles, Journées SUCCES, IPGP, Paris, 13 – 14 November 2013.
  http://succes2013.sciencesconf.org/conference/succes2013/FG_Cloud_20131112.pdf
- Flexible Data Processing Solutions for Space Missions, SCIOPS, ESAC, Madrid, Spain, 10 – 13 September 2013.
  http://www.rssd.esa.int/SYS/CONF2013/include/SCIOPS2013/docs/presentations/20130912-0930-Beckmann_Flexible_SDC_SciOps2013.pdf
- Utilisation de StratusLab dans le cadre des applications astroparticules à l’APC, Rencontre LCG-France, LLR, Palaiseau, 28 May 2013.
  https://indico.in2p3.fr/getFile.py/access?contribId=44&sessionId=5&resId=0&materialId=slides&confId=8140
  https://hal.archives-ouvertes.fr/hal-01132552
- Retour d’expérience d’utilisation d’un Cloud en Astrophysique : le projet BOSS, Journées Clouds pour le Calcul Scientifique, LAL, Orsay, 27 November 2012.
  http://indico2.lal.in2p3.fr/indico/getFile.py/access?contribId=8&sessionId=0&resId=0&materialId=slides&confId=1897
- Utilisation du Cloud Computing de type IaaS (« Infrastructure-as-a-Service ») : tests de clusters virtuels dans le cadre d’applications astroparticules, 8ème Journées Informatique de l’IN2P3 – IRFU, La Londe Les Maures, 22 – 25 October 2012.
  https://indico.in2p3.fr/getFile.py/access?contribId=4&sessionId=13&resId=0&materialId=slides&confId=6514
3. Posters
- Docker for space missions, Journées nationales du Développement Logiciel, Marseille, 4 – 7 July 2017.
  http://devlog.cnrs.fr/_media/jdev2017/poster_jdev2017_dockerspatial_cecile_cavet.pdf?id=jdev2017%3Aposters&cache=cache
- Docker for space missions, EGI Conference and INDIGO Summit 2017, Catania, Italy, 9 – 12 May 2017.
  https://indico.egi.eu/indico/event/3249/contribution/0/material/poster/0.pdf
- A proto-data processing centre, 11th International LISA Symposium, Irchel Campus of University of Zurich, Zurich, Switzerland, 5 – 9 September 2016.
  http://www.physik.uzh.ch/events/lisa2016/uploads/082/poster_lisa_16.pdf
- Hadoop on the Cloud : the SlipStream deployment tool, Journées nationales du Développement Logiciel, INP – ENSEIRB-MATMECA, Bordeaux, 30 June – 3 July 2015.
  http://devlog.cnrs.fr/_media/jdev2015/poster_jdev15_hadooponcloud_cecile_cavet.pdf?id=jdev2015%3Aposters&cache=cache
- Hadoop on the Cloud : the SlipStream deployment tool, EGI Conference, Lisbon, Portugal, 18 – 22 May 2015.
  http://indico.egi.eu/indico/contributionDisplay.py?contribId=0&confId=2443
- Cloud computing for Astroscience applications, LabEx UnivEarthS Autumn School, Villa Finaly, Florence, Italy, 27 – 31 October 2014.
  https://hal.archives-ouvertes.fr/hal-01132523
4. Tutorials
- Interfaces PaaS, Formation Utilisateur FG-Cloud, CC-IN2P3, Lyon, 27 – 29 April 2016.
  https://indico.in2p3.fr/event/12720/session/8/contribution/14/material/slides/0.pdf
- Hadoop hands-on : using MapReduce / Spark on the cloud, Workshop distributed computing in astrophysics, FACe, APC, Paris, 10 – 11 December 2015.
  https://indico.in2p3.fr/event/12042/contribution/7/material/slides/0.pdf
- SlipStream : un outil de déploiement automatique pour le cloud fédéré France Grilles, Demonstration, Journées SUCCES, IPGP, Paris, 5 – 6 November 2015.
  http://webcast.in2p3.fr/videos-demonstration_slipstream
- TP développeurs : introduction et présentation, École informatique IN2P3 2014 : Maîtriser le Cloud, Centre Jean Bosco / Centre de calcul de l’IN2P3, Lyon, 1 – 5 July 2014.
  https://indico.in2p3.fr/contributionDisplay.py?sessionId=23&contribId=9&confId=9852
- Hands-on tutorial on StratusLab Cloud, APC laboratory, Paris, 31 May 2013.
  http://www.apc.univ-paris7.fr/FACe/content/tutoriel-cloud-sur-stratuslab
Other activities

Valorization
- Lettre informatique de l’IN2P3, no. 34, July 2016.
  http://informatique.in2p3.fr/li/spip.php?article440
- France Grilles interview (researcher and engineer)
  http://idgc.in2p3.fr/fr/e-toiles/cecile-cavet-et-antoine-petiteau/
- Lettre informatique de l’IN2P3, no. 32, November 2015.
  http://informatique.in2p3.fr/li/spip.php?article405
- François Arago Centre (FACe) wiki on cloud computing.
  https://www.apc.univ-paris7.fr/FACeWiki/pmwiki.php?n=Cloud.Cloud
- LabEx UnivEarthS (WP V1): Data distribution, visualisation, and cloud computing.
  http://www.univearths.fr/fr/projets-du-labex-univearths/projet-valorisation/v1-diffusion-des-donnees-visualisation-et-nuage-informatique/
Dissemination
- PREDON book: Scientific Data Preservation, synthesis document, 2015.
- PREDON book: Scientific Data Preservation, synthesis document, 2014.
  https://martwiki.in2p3.fr/twiki/pub/PREDON/WebHome/PREDON-VECTO-BD.pdf
  http://hal.in2p3.fr/in2p3-00959072
Projects
- MLDC-webapp: a Django-based web application for the Mock LISA Data Challenge (source code and documentation in the Git repository: https://gitlab.in2p3.fr/elisadpc/elisadpctools)
- Pyraeus: Python tools for reconstructing the photometric redshifts of galaxies (source code in the Git repository: https://gitlab.in2p3.fr/photoz/photoz; online documentation: http://www.apc.univ-paris7.fr/~lejeune/pyraeus/html/index.html).
