17 January 2022
Data Science Environments - What is it and what are the main benefits? – 2nd Podcast with Marcin Sieprawski from CS3MESH4EOSC and Software Mind.
CS3MESH4EOSC is kicking off its 2nd Podcast episode: “Data Science Environment - What is it and what are the main benefits?” This episode focuses on Data Science Environments, a data service that will be integrated into the Science Mesh, the main asset of the CS3MESH4EOSC project.
Data Science Environments
This specific Data Service is all about the integration of data science environments into the federated Science Mesh, in order to facilitate collaborative research and enable cross-federation sharing of computational tools, algorithms and resources.
But what is the main functionality? Data Science Environments are accessed through a web interface at researchers' remote sites, so that researchers can work on algorithms and data processing programs interactively.
The objective is for users to be able to access remote execution environments and replay (and modify) analysis algorithms without having to set up accounts on the remote system in advance.
The functional integration with EFSS (Enterprise File Sync and Share) services covers features such as:
- Interactive features: advance from current JupyterHub to JupyterLab with collaborative notebook editing, explore interactive widgets such as those provided by QuantStack Voila, etc.
- Jupyter-native interfaces for OCM (Open Cloud Mesh) sharing
- Connection to code repositories such as Git-based or CVMFS-based and lightweight runtime environments similar to mybinder.org
- Interface to computational resources (such as BigData Spark, HPC, batch and Grid clusters).
You can listen to the podcast below.
What Is the Science Mesh?
The motivation behind the creation of the Science Mesh was to give researchers, educators, data curators and analysts the ability to control and share data and datasets remotely and across borders, securely and easily. The platform interconnects nodes from different European countries to create a bigger platform where users can combine their data with that of others. The Science Mesh is also attractive to operators from a cost-effectiveness point of view: without it, each site would be responsible for developing its own science-facing capabilities.
The Science Mesh levels this playing field and allows operators to reuse locally a science-facing capability that has been developed elsewhere in the Science Mesh. Another important point is that the Science Mesh is a horizontal infrastructure, that is, it can be used regardless of the scientific domain of the user (e.g. social sciences, earth observation). Its base offering is generic enough to be useful to a broad spectrum of science communities worldwide; specific domain relevance is attained through the development of science-facing plugins. The Science Mesh aims to bring all science data and science identities together. The time of working in “silos”, isolated from each other, should be over.
Science Mesh next steps – Applications integration
CS3MESH4EOSC aims to involve groups beyond the core consortium partners in the co-design and co-development of the service. While still in its development phase, the Science Mesh is integrating third-party services to extend the platform with different kinds of services: Data Science Environments, described above, as well as Open Data Systems, Data Transfer and Collaborative Editing. Follow CS3MESH4EOSC to receive updates about the latest developments.