The project will take advantage of existing key technologies in collaboration between the CS3MESH4EOSC consortium, CS3 community, open-source technology projects/communities, and industry players. Specifically, the Science Mesh will use, contribute to, and help drive the following technology projects:

SCIEBO RDS

The sciebo RDS (Research Data Services) has set itself the goal of bringing low-threshold services and tools from research data management and scientific analysis to where the scientists are already handling their data – the university cloud “sciebo”.

Sciebo RDS uses an agile approach to software engineering (Scrum with elements from Kanban and XP) which makes it possible to react quickly to new situations, e.g. software problems, and helps particularly with respect to the handling of unforeseen or changing requirements. In order to achieve this ambitious goal, sciebo RDS will pursue the following strategies:

  • Sciebo RDS integrates external research data services, such as those for creating and administering data management plans.
  • It offers bridge functionality by connecting to external services via programming interfaces and linking them into continuous, partly automated processes and process chains.
  • It adapts and integrates external expert tools, such as editors for special file formats or data types.
  • It offers basic RDM functionality, e.g. for capturing metadata or indexing via taxonomies.

Sciebo is a non-commercial cloud storage by universities for universities, where you can securely store your research, study and teaching data.

InvenioRDM

InvenioRDM is an open source framework for building large-scale digital repositories. Read more about the technical features possible in InvenioRDM through the Invenio Framework.

What are the advantages of using InvenioRDM in my environment?

InvenioRDM is open source and has a modern web architecture and standards that make it easy to deploy, maintain, and use. It is being developed with a wide range of features to streamline good data practice and boost value throughout the research lifecycle.

What is the difference between InvenioRDM and Zenodo?

Zenodo is a repository service hosted by CERN. InvenioRDM is a repository application that anyone can use to run a service similar to Zenodo. Zenodo will run on InvenioRDM by the end of the project period.

What is the difference between Invenio Framework and InvenioRDM?

The Invenio Framework is a toolbox (a code library) which you can use to build a repository application like InvenioRDM, an integrated library system application like InvenioILS, or other applications requiring powerful search and document management capabilities.

Is the InvenioRDM project a formal project?

The project is an active open collaboration like other open-source community projects. While there are no formal legal agreements between the project partners, we have taken steps to ensure that collaborators are welcomed to the project and supported with communication and collaboration workflows.

How do you ensure the success of the project?

The project is designed so that it tolerates a loss of partners except for the two core partners CERN and Northwestern University. The project is further purposely kept short in time to quickly deliver the first product release that all partners can run in production.

How did the project come about?

Zenodo is open-source licensed, and thus several institutions tried to reuse the Zenodo source code, although it was never meant to be installed elsewhere (it is a service, not an application). Other institutions tried to use the Invenio Framework to build an RDM repository from scratch. Several institutions tried to make the same modifications but had no easy way of sharing their changes. All these institutions came together to create a collaborative open source project and grow a sustainable community.

Can I join the project?

Yes. The InvenioRDM project is an open collaboration where anyone can join. The collaboration is characterized by a common goal and strengthened by open information sharing, public discussions, and robust collaboration. To become a project partner, your institution must commit a minimum of 1.5 person-months per year as a contribution to the project. This commitment is made via a letter of support to the project.

How do I join the project?

To join the project, we expect you to commit a minimum of 1.5 person-months of effort to the project on a yearly basis. You can commit effort in a variety of ways: development, documentation, requirements gathering, community building, testing, and more. If interested, contact the project manager Lars Holm Nielsen.

Can I see InvenioRDM in action?

Yes, for instance:

DESCRIBO (RO-CRATE)

Describo is a pair of tools for researchers and support staff to create spec-compliant RO-Crates with:

  • Linked data for high-precision unambiguous metadata
  • Package level “who, what, where” metadata
  • Individual file-level metadata including provenance

Describo exists as:

  • a desktop tool (electron application) for all major platforms: https://uts-eresearch.github.io/describo/
  • an online tool that integrates with Microsoft OneDrive: https://github.com/UTS-eResearch/describo-online

An RO-Crate is a structured archive of all the items that contributed to the research outcome, including their identifiers, provenance, relations and annotations. It is based on schema.org annotations in JSON-LD, and aims to make best-practice in formal metadata description accessible and practical for use in a wide variety of situations.
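
To make the JSON-LD structure concrete, the following is a minimal sketch of the kind of metadata document an RO-Crate carries, built here with plain Python rather than with Describo. The function name `build_ro_crate_metadata` and all dataset values are hypothetical; the `@context` and `conformsTo` identifiers assume version 1.1 of the RO-Crate specification, and a real crate would typically also carry properties such as a licence and a publication date.

```python
import json


def build_ro_crate_metadata(name, description, files):
    """Return a minimal RO-Crate-style metadata document as a dict.

    `files` maps a relative file path to a human-readable name.
    Hypothetical helper for illustration; not part of Describo.
    """
    # One entity per file: individual file-level metadata.
    file_entities = [
        {"@id": path, "@type": "File", "name": label}
        for path, label in files.items()
    ]
    graph = [
        # Metadata descriptor: identifies this JSON-LD file and points
        # at the root dataset it describes.
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        # Root dataset: package-level "who, what, where" metadata.
        {
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "hasPart": [{"@id": entity["@id"]} for entity in file_entities],
        },
    ]
    graph.extend(file_entities)
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": graph,
    }


crate = build_ro_crate_metadata(
    name="Example survey data",  # made-up dataset name
    description="Illustrative crate; all values are invented.",
    files={"data/results.csv": "Survey results"},
)
print(json.dumps(crate, indent=2))
```

Saved as `ro-crate-metadata.json` next to the listed files, a document of this shape is what turns an ordinary folder into a crate.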

OnlyOffice

Run your own office with ONLYOFFICE. Our mission is to bring the most innovative web office apps to everyone.

ONLYOFFICE Docs - the future of document processing

Create, edit, and collaborate on documents anywhere, any time. ONLYOFFICE Docs is a powerful online editor for text documents, spreadsheets, and presentations on the platform you use.

ONLYOFFICE Workspace - the ONLY thing you need to make your business grow

ONLYOFFICE offers a complete productivity suite with document management, project management, CRM, calendar, mail, and corporate network, so you don't need to switch back and forth between multiple applications to perform different tasks. You get a single multi-featured system to organize every step of your work, improving your productivity and optimizing your efforts.

Collabora

Collabora is an Open Source consultancy. Whether writing a line of code or shaping a longer-term strategic software development plan, we'll help you navigate the ever-evolving world of Open Source and develop the best solutions.

Collabora offers a comprehensive range of services to help you in every step of your Open Source projects. From design and implementation, to QA and maintenance, we can support you from start to finish and beyond.

RUCIO

Built on more than a decade of experience, Rucio serves the data needs of modern scientific experiments: large amounts of data, countless numbers of files, heterogeneous storage systems, globally distributed data centres, monitoring and analytics, all coming together in a modular solution to fit your needs.

Extremely scalable

Need to search through billions of files? Need to transfer petabytes of data? Rucio has got you covered. Our largest installation for the ATLAS Experiment is responsible for more than 450 Petabytes of data, stored in a billion files, distributed over 120 data centres globally, and orchestrating an Exabyte of data access and transfer per year.

Policy-driven

Declarative data management allows you to say what you want and lets Rucio figure out the details of how to do it. Manage your data with expressive statements. Three copies of my file on different continents, with one backup on tape? Automatically remove it once its access popularity goes to zero? No problem.
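
The declarative idea can be illustrated with a small toy model: the user states the desired outcome as data, and a planner works out where the copies go. This is only a sketch of the concept, not Rucio's rule engine or client API; `StorageSite`, `ReplicationRule`, `plan_replicas` and the site names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageSite:
    name: str
    continent: str
    is_tape: bool = False


@dataclass(frozen=True)
class ReplicationRule:
    copies: int                 # number of disk replicas wanted
    distinct_continents: bool   # spread disk replicas across continents
    tape_backups: int = 0       # additional copies on tape


def plan_replicas(rule, sites):
    """Pick sites that satisfy the rule, or raise ValueError if impossible."""
    chosen, used_continents = [], set()
    for site in (s for s in sites if not s.is_tape):
        if len(chosen) == rule.copies:
            break
        # Skip a site if its continent already holds a replica.
        if rule.distinct_continents and site.continent in used_continents:
            continue
        chosen.append(site)
        used_continents.add(site.continent)
    if len(chosen) < rule.copies:
        raise ValueError("not enough disk sites to satisfy the rule")
    tapes = [s for s in sites if s.is_tape][: rule.tape_backups]
    if len(tapes) < rule.tape_backups:
        raise ValueError("not enough tape sites to satisfy the rule")
    return chosen + tapes


# Made-up sites; the rule reads like the statement in the text:
# three copies on different continents plus one backup on tape.
sites = [
    StorageSite("SITE-A", "Europe"),
    StorageSite("SITE-B", "Europe"),
    StorageSite("SITE-C", "North America"),
    StorageSite("SITE-D", "Asia"),
    StorageSite("SITE-E-TAPE", "Europe", is_tape=True),
]
rule = ReplicationRule(copies=3, distinct_continents=True, tape_backups=1)
plan = plan_replicas(rule, sites)
print([site.name for site in plan])
# prints ['SITE-A', 'SITE-C', 'SITE-D', 'SITE-E-TAPE']
```

The caller never names a destination; changing the rule, not the code that uses it, changes where the data ends up.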

Insights and analytics

Follow your data evolution over time, so you can keep control. From the popularity of your files, to the storage space and tape accounting of your data centres. Fully integrated with Graphite, ElasticSearch, and Hadoop.

FAIR

Rucio supports the FAIR data principles that promote maximal use of research data!

Smart namespace

Organise your files in datasets and containers, create virtual overlaps, distribute them by scope, or attach important metadata.

Storage support

Rucio connects your existing storage, and you can easily add new and different ones. Even tapes, cloud-based storage, or supercomputers. We want you to have choice and not lock you down to a single solution.

Easy integration

Existing applications and workflow systems can be integrated easily through our open libraries and REST servers. Rucio will not disrupt your experiment's workflows.

Authentication and authorisation

The classic username/password, x509 certificates with proxy support, GSS/Kerberos, SSH public keys, OpenID Connect, and SAML are all supported.

Monitoring

Directly integrated with ElasticSearch and Graphite, so you will never lose track of your data. Follow system performance from a single file to the global overview.

Open source powered

Robust code written in the Python language, unit-tested, PEP-certified. Deploy with pip or containers. It's free as in freedom (Apache v2) and open source!

Consistency

Data loss happens every day, and Rucio is prepared. Smart consistency and recovery mechanisms help you not to lose your data!

Proven track record

Originally built to withstand the requirements of the high-energy physics experiment ATLAS, Rucio is scalable and robust. It also serves smaller communities and makes a scientist's life easier.

File Transfer Service (FTS)

FTS is open source software for reliable, large-scale data transfers. It provides easy-to-use interfaces for submitting transfers: Python CLI, Python Client, WebFTS and Web Monitoring. Checksums and retries are provided per transfer, and its multiprotocol support (WebDAV/HTTPS, GridFTP, xroot, SRM) makes it a flexible tool. It also optimizes parallel transfers to get the most from the network without overloading the storage systems.
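
The per-transfer checksum and retry behaviour can be sketched in a few lines of standard-library Python. This is a local-file illustration of the pattern only, not the FTS implementation or its client API; `file_checksum`, `transfer_with_retries` and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_checksum(path, algorithm="sha256"):
    """Return the hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_with_retries(source, destination, max_attempts=3,
                          copy=shutil.copyfile):
    """Copy source to destination and verify the checksum, retrying on failure.

    Returns the number of attempts used; raises RuntimeError if all fail.
    """
    expected = file_checksum(source)
    for attempt in range(1, max_attempts + 1):
        try:
            copy(source, destination)
            if file_checksum(destination) == expected:
                return attempt
        except OSError:
            pass  # treat an I/O error like a failed attempt and retry
    raise RuntimeError(f"transfer failed after {max_attempts} attempts")


with tempfile.TemporaryDirectory() as workdir:
    src = Path(workdir) / "input.dat"
    dst = Path(workdir) / "output.dat"
    src.write_bytes(b"example payload")
    attempts = transfer_with_retries(src, dst)
    copied_ok = dst.read_bytes() == src.read_bytes()

print(attempts, copied_ok)
# prints: 1 True
```

The `copy` parameter is there so that a different transport (or a deliberately flaky one in a test) can be swapped in without touching the verification logic.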

FTS is used by many research organizations in the High Energy Physics domain and beyond.

Key components

  • WebFTS - Simplifying power: WebFTS is a web interface that provides a file transfer and management solution in order to allow users to invoke reliable, managed data transfers on distributed infrastructures.
  • FTS-REST - Python API: FTS-REST provides a Python API for easy integration with frameworks and a CLI for copying files from one site to another.
  • Real-Time Monitoring - FTS provides monitoring for several profiles: General monitoring (Grafana) for end users, Discovery Data (Kibana) for researchers and Service Specific (ftsmon/Kibana) for service managers.
  • Optimizer - Getting the most from our infrastructure: The optimizer makes it possible to run transfers between any two arbitrary endpoints with good reliability and performance, with zero configuration by default.
  • GFAL2 - Multiprotocol support: GFAL2 is a plugin-based library for file manipulation supporting multiple protocols (WebDAV/HTTPS, GridFTP, xroot, SRM).
  • Support: FTS support is excellent thanks to the FTS team at CERN.

Rclone

Rclone is a command line program to manage files on cloud storage. It is a feature rich alternative to cloud vendors' web storage interfaces. Over 40 cloud storage products support rclone including S3 object stores, business & consumer file storage services, as well as standard transfer protocols.

Rclone really looks after your data. It preserves timestamps and verifies checksums at all times. Transfers over limited bandwidth, over intermittent connections, or subject to quota can be restarted from the last good file transferred. Where possible, rclone employs server-side transfers to minimise local bandwidth use and transfers from one provider to another without using local disk. Virtual backends wrap local and cloud file systems to apply encryption, compression, chunking, hashing and joining. Rclone mounts any local, cloud or virtual filesystem as a disk on Windows, macOS, Linux and FreeBSD, and also serves these over SFTP, HTTP, WebDAV, FTP and DLNA.
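
The combination of hash checking, timestamp preservation and restartability can be illustrated with a small one-way sync over local directories. This is a sketch of the semantics using only the Python standard library, not rclone's code or API; `content_hash`, `sync_one_way` and the use of SHA-256 are assumptions made for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def content_hash(path):
    """Hash a file's content (SHA-256 here; chosen only for the sketch)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_one_way(source, destination):
    """Make `destination` identical to `source`; return (copied, deleted)."""
    source, destination = Path(source), Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    copied, deleted = [], []
    source_files = {
        p.relative_to(source) for p in source.rglob("*") if p.is_file()
    }
    for rel in sorted(source_files):
        target = destination / rel
        # Copy only files that are missing or whose content differs.
        if not target.exists() or content_hash(target) != content_hash(source / rel):
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source / rel, target)  # copy2 keeps timestamps
            copied.append(rel.as_posix())
    # Remove files that exist only in the destination.
    for path in sorted(destination.rglob("*")):
        if path.is_file() and path.relative_to(destination) not in source_files:
            path.unlink()
            deleted.append(path.relative_to(destination).as_posix())
    return copied, deleted


with tempfile.TemporaryDirectory() as workdir:
    src, dst = Path(workdir, "src"), Path(workdir, "dst")
    src.mkdir()
    dst.mkdir()
    (src / "a.txt").write_text("new")
    (dst / "a.txt").write_text("old")
    (dst / "stale.txt").write_text("remove me")
    first = sync_one_way(src, dst)
    second = sync_one_way(src, dst)  # a re-run finds nothing left to do

print(first, second)
# prints: (['a.txt'], ['stale.txt']) ([], [])
```

Because files that already match are skipped, an interrupted run can simply be started again and will pick up where it left off.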

Rclone is mature, open-source software originally inspired by rsync and written in Go. The friendly support community is familiar with varied use cases. Official Ubuntu, Debian, Fedora, Brew and Chocolatey repos include rclone. Rclone is widely used on Linux, Windows and Mac. Third-party developers create innovative backup, restore, GUI and business process solutions using the rclone command line or API.

Rclone does the heavy lifting of communicating with cloud storage.

Rclone helps you:

  • Backup (and encrypt) files to cloud storage
  • Restore (and decrypt) files from cloud storage
  • Mirror cloud data to other cloud services or locally
  • Migrate data to the cloud, or between cloud storage vendors
  • Mount multiple, encrypted, cached or diverse cloud storage as a disk
  • Analyse and account for data held on cloud storage using lsf, lsjson, size, ncdu
  • Union file systems together to present multiple local and/or cloud file systems as one

Features

  • Transfers
    • MD5, SHA1 hashes are checked at all times for file integrity
    • Timestamps are preserved on files
    • Operations can be restarted at any time
    • Transfers can run to and from network locations, e.g. between two different cloud providers
    • Can use multi-threaded downloads to local disk
  • Copy new or changed files to cloud storage
  • Sync (one way) to make a directory identical
  • Move files to cloud storage deleting the local after verification
  • Check hashes and for missing/extra files
  • Mount your cloud storage as a network disk
  • Serve local or remote files over HTTP/WebDAV/FTP/SFTP/DLNA
  • Experimental Web based GUI

CodiMD

CodiMD is a platform for sharing and writing notes in Markdown. CodiMD is the free software version of HackMD, developed and open-sourced by the HackMD team with reduced features (without book mode). CodiMD's features include a Markdown editor, export to PDF, import from Gist, and support for slides and notes.

CodiMD is perfect for open communities, while HackMD emphasizes permission and access controls for commercial use cases. The HackMD team is committed to keeping CodiMD open source. All contributions are welcome!

You can find all documentation here: CodiMD Documentation

Deployment

If you want to spin up an instance and start using it immediately, see Docker deployment. If you want to contribute to the project, start with manual deployment.

Configuration

CodiMD is highly customizable. Learn about all configuration options for networking, security, performance, resources, privilege, privacy, image storage, and authentication in CodiMD Configuration.

Upgrading and Migration

Upgrading CodiMD from a previous version? See this guide.
Migrating from Etherpad? Follow this guide.

Developer

Join our contributor community! Start by deploying CodiMD manually and connecting it to your own database, then learn about the project structure and build your changes with the help of webpack.

Voilà Quantstack

Voilà turns Jupyter notebooks into standalone applications without requiring any modification to the content. You want to share your content with non-technical readers? Just call Voilà with the notebook to turn it into a deployable web application and interactive dashboard that allows you to share your work with others. It is secure and customizable, giving you control over what your readers experience. The simplicity of Voilà comes at a cost: the page load time.

Unlike the usual HTML-converted notebooks, each user connecting to the Voilà tornado application gets a dedicated Jupyter kernel which can execute the callbacks to changes in Jupyter interactive widgets.

  • By default, Voilà disallows execute requests from the front-end, preventing execution of arbitrary code.
  • By default, Voilà runs with the strip_sources option, which strips out the input cells from the rendered notebook.
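
As a rough illustration of "calling Voilà with the notebook", the sketch below assembles the command line from Python and launches it only when explicitly enabled. The helper `voila_command`, the `RUN_SERVER` switch and the notebook name are hypothetical, and the exact flag spellings (`--strip_sources`, `--port`) are assumptions that should be checked against `voila --help` for the installed version.

```python
import shutil
import subprocess


def voila_command(notebook, strip_sources=True, port=None):
    """Build the argument list for serving `notebook` with the voila CLI."""
    # Flag names are assumed from the Voilà CLI; verify with `voila --help`.
    command = ["voila", notebook, f"--strip_sources={strip_sources}"]
    if port is not None:
        command.append(f"--port={port}")
    return command


command = voila_command("dashboard.ipynb")  # hypothetical notebook name
print(" ".join(command))
# prints: voila dashboard.ipynb --strip_sources=True

# Launch the server only when enabled and when voila is installed.
RUN_SERVER = False
if RUN_SERVER and shutil.which("voila"):
    subprocess.run(command, check=True)
```

Passing `strip_sources=False` would keep the input cells visible, which can be useful when the readers are themselves developers.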

QuantStack is one of the main organizations supporting the Jupyter project, an open-source ecosystem of developer tools meant to improve the workflows of scientists and engineers. The QuantStack team is responsible for several major evolutions in the project, such as the JupyterLab visual debugger and collaborative editing. The team comprises seven core contributors and maintainers of the project. We are also behind several popular extensions for data visualization, robotics, and dashboarding.

Voilà has been incorporated as a Jupyter subproject.