Rucio and the File Transfer Service: serving the data needs of modern scientific experiments


In today's world, scientific collaborations rely heavily on data management to achieve their research goals.

The volume of data generated by scientific instruments is growing exponentially, presenting new challenges for scientific collaborations in organising, managing, storing and accessing this data.

These data sets are often distributed across multiple data centres at various locations, belonging to different organisations and administrative domains. The result is a complex data management process that can be time-consuming and resource-intensive.

Rucio and the File Transfer Service (FTS): Two of the On-demand Data Transfer technologies powering the Science Mesh

That is where Rucio and the File Transfer Service (FTS), two of the On-demand Data Transfer technologies of CS3MESH4EOSC, come in. Rucio is an open-source software framework that provides scientific collaborations with the functionality to organise, manage and access data at scale. 

The File Transfer Service (FTS) is open-source software for reliable, large-scale data transfers, which Rucio uses to orchestrate data transfers between sites.

Rucio: The Open-Source Solution for Managing Data at Scale in Scientific Collaborations

Rucio offers a policy-driven data management concept, which enables users to express their needs without having to worry about how to achieve them, while also optimising the system during runtime based on self-instrumentation. The system enforces these policies and automatically checks for corrupted data and recovers it, while providing users with an efficient way to access the data.
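
As a rough illustration of what "policy-driven" means here, the following Python sketch models a policy as a declaration (how many healthy copies of a data set should exist, and on which sites) and derives the work needed to satisfy it. This is a toy model, not Rucio's API: the names ReplicationPolicy and plan_transfers, the site labels and the health map are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical policy: keep `copies` healthy replicas on the allowed sites."""
    dataset: str
    copies: int
    allowed_sites: frozenset


def plan_transfers(policy, current_replicas, healthy):
    """Return the sites that need a new copy so that the policy is satisfied.

    current_replicas: set of sites that currently hold a replica.
    healthy: mapping of site -> bool, False if the replica there is corrupted.
    """
    # Only healthy replicas on allowed sites count towards the policy.
    good = {
        site for site in current_replicas
        if site in policy.allowed_sites and healthy.get(site, False)
    }
    missing = policy.copies - len(good)
    if missing <= 0:
        return []
    # Pick deterministic targets among allowed sites without a good copy.
    candidates = sorted(policy.allowed_sites - good)
    return candidates[:missing]


policy = ReplicationPolicy(
    dataset="example-dataset",
    copies=2,
    allowed_sites=frozenset({"SITE_A", "SITE_B", "SITE_C"}),
)
# SITE_C holds a corrupted replica, so one more healthy copy is needed.
plan = plan_transfers(
    policy,
    current_replicas={"SITE_A", "SITE_C"},
    healthy={"SITE_A": True, "SITE_C": False},
)
print(plan)
```

The user only states the desired outcome (two healthy copies); working out which transfers are required, including replacing a corrupted replica, is left to the system.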

This eliminates the need for manual management of data, allowing researchers to focus on analysing the data instead of managing it.

Rucio shields the user from the complexity of operating and accessing a federated data infrastructure, while allowing organisations to express the requirements and workflows for their data as policies, which Rucio automatically enforces.

With demonstrated usability, performance, scalability, and robustness, Rucio enables scientific collaborations to fully use their distributed heterogeneous storage resources, making data-intensive research more manageable and accessible.

File Transfer Service: Simplifying and optimising large-scale file transfers for research organisations

At its core, FTS is a low-level data management service, responsible for scheduling the reliable bulk transfer of files from one site to another while allowing participating sites to control network resource usage. User interfaces are provided via CLI tools, Python bindings, a REST API and the WebFTS component, which allow end users to submit transfers to the system in the easiest manner.

It also acts as a queueing system, deciding which transfer to schedule next according to user-set labels and priorities. FTS supports multiple transfer protocols, hiding the complexity involved in transfer technology from the submitting client. Similarly important, the system provides a comprehensive monitoring interface, including publishing transfer data in JSON format.
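
The queueing behaviour described above can be pictured with a small Python sketch. It is not FTS code: the TransferQueue class, the numeric priority scale (higher means more urgent) and the example URLs are assumptions made for illustration, and the real scheduler also takes labels and site limits into account.

```python
import heapq
import itertools


class TransferQueue:
    """Toy priority queue: highest priority first, FIFO among equal priorities."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker that preserves submission order

    def submit(self, source, destination, priority=3):
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(
            self._heap, (-priority, next(self._counter), source, destination)
        )

    def next_transfer(self):
        """Return the next (source, destination) pair, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, source, destination = heapq.heappop(self._heap)
        return source, destination


queue = TransferQueue()
queue.submit("https://site-a.example/f1", "https://site-b.example/f1", priority=1)
queue.submit("https://site-a.example/f2", "https://site-b.example/f2", priority=5)
queue.submit("https://site-a.example/f3", "https://site-b.example/f3", priority=5)

order = []
while (job := queue.next_transfer()) is not None:
    order.append(job[0].rsplit("/", 1)[-1])
print(order)
```

The two high-priority files are scheduled before the low-priority one, in the order they were submitted.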

Data management communities benefit from FTS by delegating transfer and tape operations, and are thus shielded from the complexity involved in disk-to-disk and tape technology. Moreover, they get a comprehensive monitoring interface out of the box via the FTS Web Monitoring solution. For more complex monitoring use cases, all FTS transfer data is published as JSON, which can be fed into modern aggregators such as InfluxDB or Elasticsearch.
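
To give a feel for how JSON-formatted transfer data lends itself to aggregation, here is a minimal Python sketch that summarises a few transfer records per destination. The field names (dst_site, state, bytes) and the sample records are invented for this example and do not reflect the actual FTS message schema.

```python
import json
from collections import defaultdict


def summarise(messages):
    """Aggregate JSON transfer records into per-destination counters."""
    totals = defaultdict(lambda: {"ok": 0, "failed": 0, "bytes": 0})
    for raw in messages:
        event = json.loads(raw)
        entry = totals[event["dst_site"]]
        if event["state"] == "FINISHED":
            entry["ok"] += 1
            entry["bytes"] += event["bytes"]
        else:
            entry["failed"] += 1
    return dict(totals)


# Hypothetical records, as they might arrive from a message stream.
sample = [
    '{"dst_site": "SITE_B", "state": "FINISHED", "bytes": 1000}',
    '{"dst_site": "SITE_B", "state": "FAILED", "bytes": 0}',
    '{"dst_site": "SITE_C", "state": "FINISHED", "bytes": 2500}',
]
summary = summarise(sample)
print(summary)
```

In practice, an aggregator such as InfluxDB or Elasticsearch would take the place of the in-memory dictionary.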



In summary, Rucio is an open-source software framework that offers scientific collaborations a solution for managing large volumes of data distributed across multiple data centres, while the FTS project provides a transfer solution for distributing large numbers of files efficiently, optimising network throughput while respecting storage site limits.

Their integration with the CS3MESH4EOSC Science Mesh ensures that data is managed effectively and that users have an efficient way to access it.

Discover the other technologies in the Science Mesh