CS3MESH4EOSC – Making data sharing in Europe child’s play: an interview with the coordinator Jakub Moscicki

 

Interview

Effective exploitation of data is key to scientific and economic progress. It is largely determined by the ability of users to share and collaborate on datasets, and to connect those datasets with other services such as computing services for data analysis or digital repositories for data classification and preservation.

The cloud storage services provided by academic and research institutions have become an indispensable part of the daily workflow, allowing research groups, scientists and engineers to share, transfer and synchronise data in simple but powerful ways. Unfortunately, these services are largely disconnected, deployed in isolation from one another and from other research services. That is despite the fact that new technologies have been developed and integrated in recent years to further increase the value of data.

The EU-funded CS3MESH4EOSC project is connecting locally and individually provided sync and share services, and scaling them up to the European level and beyond. To do this, CS3MESH4EOSC is delivering the Science Mesh service, an interoperable platform to easily sync & share data, deploy applications and software components, and extend functionalities.

The Science Mesh will also serve the European Open Science Cloud (EOSC) with a built-in sustainability model based on on-premises service delivery and existing key technology enablers. It will empower service providers to deliver state-of-the-art services and connect infrastructure, boosting effective scientific collaboration across the entire federation and data sharing according to FAIR principles.

Jakub Moscicki is the CS3MESH4EOSC project coordinator and Deputy Group Leader for Storage at CERN, the European Organization for Nuclear Research, one of the world's largest and most respected centres for scientific research. Jakub explains the added value of the project and how it fits into the EOSC vision, as well as how the Science Mesh will go beyond storage services provided by some of the biggest cloud companies in the world, amongst other topics. The full interview is available below.

CS3MESH4EOSC is the name of the project. Where does it come from and what’s its added value?

Naming things is one of the hardest things in software engineering. And as we are originally a bunch of engineers, we have definitely proven the point with the project name. The project wants to bring the successful bottom-up experience of the Cloud Storage Services for Synchronization and Sharing (CS3) community and offer it as our building block to help shape the future of the European cloud for science (EOSC).

CS3 stands for Cloud Storage Services for Synchronization and Sharing, a community that has formed around a series of workshops touring Europe since 2014 (visit the website https://www.cs3community.org). This is a community of doers: over 70 institutions providing on-premises services in academia, education and research, in close collaboration with the European software industry and directly responding to the needs of nearly half a million scientists, scholars, students and researchers. CS3 goes beyond Europe and includes organisations from the USA, Asia and Australia.

The CS3MESH4EOSC project serves the CS3 community by supporting previously existing initiatives such as Open Cloud Mesh or CS3 Application Programming Interfaces (APIs) as well as incubating new ideas and concepts such as integration of Sync&Share services with data science environments, collaboration and productivity applications, data-research lifecycles, digital repositories and large data transfers.

Your main service is called “Science Mesh”. What is it and what are its main benefits compared to other solutions available on the market? 

Science Mesh is the actual federated infrastructure which gives users of a cloud service in their institution the possibility to securely share and collaborate on data with their peers at other institutions. At present this is only possible in large, centralised, commercial clouds, which, as a prerequisite, require all data to be hosted in or exported to these clouds. This creates problematic lock-in situations for public institutions, but it is quite attractive to end-users, especially because of the integrated user experience commercial clouds can offer.

Science Mesh will allow the best of both worlds: users will not need to leave the familiar interface of their home institutional service in order to collaborate efficiently with users in other institutions. Better still, Science Mesh will be able to give access to functionalities which are unique and may be easily customized to the needs of particular research groups or disciplines. That's why Science Mesh will leverage a fully Open Source development model in close collaboration with the Open Source software industry in Europe and beyond.

The Science Mesh will be fully integrated with the EOSC, while ensuring alignment with FAIR data management principles. Why is this important and how will Science Mesh contribute to the EOSC success?

We'll see the details, but the plan is to bring the experience of a very lively, bottom-up community to add further dynamism to EOSC. The services operated by the CS3 community are already self-sustainable and used by literally hundreds of thousands of users in tens of institutions across the continent. By their very nature, Sync&Share services are about storing, organizing and sharing files, so plugging in a FAIR data management mechanism feels very natural. This mechanism will not only allow the users of Sync&Share services to better classify and describe their data but will also provide a connection to the Digital Repositories, including Open Data (find out more about how CS3MESH4EOSC is contributing to EOSC and FAIR).

One of your goals is to go beyond the general-purpose storage services provided by Google, Dropbox, Amazon and Microsoft, all big non-EU tech companies. How will Science Mesh be able to do that?

We believe in the power of Open Source, and there is a strong realisation in the community that the current model is a dead-end for many independent actors. That's for a variety of reasons: cost at scale, vendor lock-in, lack of open governance, irrecoverable loss of tech skills, questions around data privacy, the threat of leaking intellectual property (IP) and so on. Science Mesh will enable new functionalities to be integrated and will make it very easy to contribute them. We will provide tools, methodology, access to the community of experts, existing software repositories, examples and documented standards to make adding functionality as easy, and as reusable, as possible. An interested party may do this directly with its own effort, or via hired effort, pooling resources from several institutions in cases of common interest. Doing it in collaboration with established software providers will further increase the impact of these solutions and the benefit to the community as a whole.

One of the main challenges of EU projects is to transfer the technology to the industrial market. How do you plan to engage Science Mesh with potential vendors already in its development phase?

Collaboration with the software vendors is the key to sustainability. We’re already using vendor software to implement our services, the vendors are active participants in the CS3 community, and we aim to feed back the code and the APIs and to involve the vendors in the standards-making process. Science Mesh isn’t setting up a new, standalone codebase in isolation. We believe that by integrating Science Mesh the vendors will be able to enrich their offer for education and research institutions and that this will generate additional business for them.

CS3MESH4EOSC celebrates its 1st anniversary in Jan 2021. What were the main achievements of the project during the first 12 months of activity? What lies ahead for 2021?

We have laid very solid technical foundations for the Science Mesh in spite of a difficult start due to lockdown and associated hiring problems. For example, the core of our interoperability platform (IOP) is now deployed at all partner sites and is ready to be connected with the Sync&Share platforms. We have also developed a flexible invitation workflow which may be combined with established identity mechanisms such as eduGAIN, as a basis for secure, trusted and auditable federated sharing for end-users. We have advanced on all application tasks as well, with JupyterLab for Data Science Environments, RO-Crate for FAIR data management support, WOPI-based collaborative editing and so on (find out more about how CS3MESH4EOSC will take advantage of existing key technologies).

In 2021 we need to take the step from the current testbed to production, finalise integration of the Sync&Share platforms in collaboration with the software vendors, and start iterating on the further evolution of the system directly with end-users and with the wider CS3 community. As we engage with new sites beyond the project consortium, we are not forgetting about policy aspects, rules of participation for the future Science Mesh and integration in the wider EOSC ecosystem. We plan to discuss the next steps with the entire community at the Science Mesh Workshop at the CS3 2021 annual conference in January.
