The CS3-based technologies in the ScienceMesh: Interview with Hugo Gonzalez Labrador (CERN)



CS3MESH4EOSC provides a pan-European, natively FAIR and GDPR-compliant data storage and sharing fabric.
Science Mesh is the main asset of the project: an interoperable platform that lets users easily sync and share data, and deploy applications and software components across the full CS3 community to extend the functionality of the service.

Science Mesh enables researchers, educators, data curators and analysts to retain control over their remote or domestic datasets while becoming FAIR-compatible and integrated with the European Open Science Cloud (EOSC) at the same time. This offers researchers the opportunity to assemble an efficient, reliable, collaborative and transparent research toolchain. The target stakeholders will be able to access the service provided by Science Mesh directly through easy-to-use interfaces and discover its different functionalities.

We had a chat with Hugo Gonzalez, Technical Work Package Leader in CS3MESH4EOSC, to better understand his role in the project, the technology the consortium has chosen to work on, and the improvements planned to achieve the best result.

Q: Tell me about yourself. What is your role in the ScienceMesh?

My name is Hugo Gonzalez, and I'm a software engineer in the CERN IT department, where I am responsible for the CERNBox service (a petabyte-scale cloud storage collaboration platform for CERN). My role in ScienceMesh is to lead the technical work package (WP3), together with an international team of around 8 people with whom we develop the software foundation for the European federated mesh.

Q: What are the OCM, REVA and CS3 APIs, known as CS3-based technologies? Can you explain these technologies?

CS3APIs is a set of APIs that anyone can use to build fabric components around the work we do in ScienceMesh. Notably, with CS3APIs you can connect your application to the Reva middleware, and your users will be able to benefit from advanced use cases such as massive data transfers or interactive data analysis with Jupyter notebooks. OCM is the OpenCloudMesh protocol, a set of specifications that enables platforms from different vendors to share data with each other, which in turn allows scientists who store data in EFSS (Enterprise File Sync and Share) platforms to collaborate easily with one another.

Q: Why were these technologies chosen instead of others?

The technologies were chosen deliberately, for two main reasons. The first is that they are open source and community-driven, enabling the participation of anyone willing to contribute to them. The second is that they were already used in production at scale, so we didn't reinvent the wheel: we took the best that exists and made it better.

Q: What is the role they have as a whole and individually in the Science Mesh?

These technologies are the foundation layer of the ScienceMesh federation. This tech stack is bundled into one package we call the IOP (InterOperabilityPlatform), which can be installed by any system administrator running services for their respective communities. The IOP is designed to facilitate deployment in heterogeneous IT infrastructures by abstracting the complexity into a Kubernetes Helm chart, following the similar and successful approach of ScienceBox.

Q: It seems that CS3MESH4EOSC is not just using these open-source technologies in the Science Mesh, but also improving them (raising their TRLs) by turning them into full commercial applications available to general users. What improvements are being implemented, and why are they important?

We put a lot of emphasis on ensuring that the work we perform is usable by any user community. For example, the Invitation Workflow we designed and implemented as part of the project can be used by any user community that wishes to share data in a privacy-respecting way. And in today's world, data privacy is of paramount importance.

Q: How have the owners/developers of these 3 open source technologies embraced the work you are doing with them?

The main developers of these technologies are very much engaged with the goals of the project, and they are actively contributing to it. We have a healthy relationship with them, and we meet often to steer the projects in the right direction, taking all viewpoints into account.

Q: As of today, what specific aspects are you working on?

We are currently working to finalize the ScienceMesh Nextcloud and ownCloud applications. We plan to include them in the respective vendor marketplaces, so other sites can simply deploy the application and become part of the mesh.

Q: What are the next improvements you are planning, and when will they be completed?

We are currently working on simplifying the setup to make it easier for system administrators to deploy the platform. There is also a lot of work being done on the security side these days to ensure that communication across sites in the federation is safe. A successful penetration test has been performed, as well as a security audit. We expect sites to be joining us at full throttle by Q1 2023.