
FDT (Fast Data Transfer)

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.

FDT (Fast Data Transfer) is an open-source, Java-based, high-performance data transfer application designed to move large datasets over wide area networks at high throughput. It combines parallel I/O, TCP/IP tuning, and protocol multiplexing to accelerate transfers between endpoints operated by research institutions, enterprises, and cloud providers. The tool is commonly employed where bulk scientific, media, or archival data must be moved between geographically dispersed sites.
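A single TCP stream over a long-haul path is limited by how much data can be kept in flight, so bulk-transfer tools size socket buffers to at least the bandwidth-delay product (BDP) of the link. The arithmetic below is a minimal illustration of that tuning rule; the 10 Gbit/s and 100 ms figures are hypothetical examples, not FDT defaults.

    // Bandwidth-delay product: to keep a 10 Gbit/s path busy across a
    // 100 ms round trip, roughly bandwidth * RTT bytes must be in flight,
    // so send/receive buffers are sized to at least that product.
    public class BdpExample {
        public static void main(String[] args) {
            double bandwidthBitsPerSec = 10e9; // 10 Gbit/s (hypothetical link)
            double rttSeconds = 0.100;         // 100 ms round-trip time
            double bdpBytes = bandwidthBitsPerSec / 8 * rttSeconds;
            System.out.printf("In-flight data needed: %.0f MB%n", bdpBytes / 1e6);
            // Prints 125 MB, far above typical OS defaults, which is why
            // bulk-transfer tools tune buffers and use parallel streams.
        }
    }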

Introduction

FDT (Fast Data Transfer) bridges high-performance networking initiatives pioneered by Lawrence Berkeley National Laboratory, CERN, the National Center for Supercomputing Applications, Argonne National Laboratory, and the National Energy Research Scientific Computing Center with production systems operated by Amazon Web Services, Google Cloud Platform, Microsoft Azure, IBM Cloud, and Oracle. It targets environments built on hardware from Cray, Hewlett Packard Enterprise, Dell Technologies, Lenovo, and Supermicro, and it interoperates with storage technologies such as Lustre, GPFS, Ceph, GlusterFS, and ZFS. Deployments often serve research collaborations around the Large Hadron Collider at CERN, the Square Kilometre Array, the Human Genome Project, and the LIGO Scientific Collaboration.

Architecture and Design

The architecture combines parallel streaming, multithreading, and direct I/O to maximize utilization of links provided by carriers such as AT&T, Verizon, NTT Communications, Level 3 Communications, and CenturyLink. Its design draws on transport-layer improvements developed at Internet2, GÉANT, ESnet, GLIF, and open exchange points. Components integrate with orchestration frameworks such as Kubernetes, Apache Mesos, and OpenShift, and with monitoring from Prometheus, Grafana, Nagios, and Zabbix. For credentialed transfers in infrastructures such as XSEDE, PRACE, and NERSC, the system interfaces with identity and access systems including Kerberos, LDAP, OAuth 2.0, and SAML.
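The parallel-streaming approach can be pictured as splitting a file into contiguous byte ranges and pushing each range over its own TCP connection from a thread pool. The sketch below illustrates that general technique in Java; it is not FDT's actual wire protocol, the host, port, stream count, and file name are hypothetical, and a real receiver would reassemble the ranges by offset.

    import java.io.RandomAccessFile;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;
    import java.util.concurrent.*;

    // Sketch: send one file over N parallel TCP streams, one contiguous
    // byte range per stream, using positional (thread-safe) file reads.
    public class ParallelStreams {
        static final String HOST = "receiver.example.org"; // hypothetical
        static final int PORT = 54321, STREAMS = 4;

        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(STREAMS);
            try (FileChannel file =
                     new RandomAccessFile("big.dat", "r").getChannel()) {
                long size = file.size();
                long chunk = (size + STREAMS - 1) / STREAMS; // ceiling division
                CountDownLatch done = new CountDownLatch(STREAMS);
                for (int i = 0; i < STREAMS; i++) {
                    long start = i * chunk;
                    long end = Math.min(start + chunk, size);
                    pool.submit(() -> { sendRange(file, start, end); done.countDown(); });
                }
                done.await(); // keep the file open until every stream finishes
            } finally {
                pool.shutdown();
            }
        }

        static void sendRange(FileChannel file, long start, long end) {
            ByteBuffer buf = ByteBuffer.allocateDirect(4 << 20); // 4 MB chunks
            try (SocketChannel sock =
                     SocketChannel.open(new InetSocketAddress(HOST, PORT))) {
                long pos = start;
                while (pos < end) {
                    buf.clear().limit((int) Math.min(buf.capacity(), end - pos));
                    int n = file.read(buf, pos); // positional read, no shared cursor
                    if (n < 0) break;
                    pos += n;
                    buf.flip();
                    while (buf.hasRemaining()) sock.write(buf);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }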

Performance and Features

FDT supports parallel streams, buffer tuning, and zero-copy I/O patterns inspired by RDMA optimizations and hardware from Mellanox, Intel, Broadcom, and NVIDIA. It can saturate the 10, 40, 100, and emerging 400 Gbit/s links used by networks such as ESnet, Internet2, GÉANT, and JANET (UK); comparable deployments exist at Fermilab, SLAC National Accelerator Laboratory, Brookhaven National Laboratory, and Oak Ridge National Laboratory. Features include checkpoint/restart, pipelining, and integrity verification, strategies analogous to those in rsync, GridFTP, bbcp, and Aspera. Its performance is typically characterized with measurement tools such as iperf3 and perfSONAR, NetFPGA-based testbeds, and SPEC benchmarking suites.
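One zero-copy pattern available to a Java implementation is FileChannel.transferTo, which lets the kernel move file pages directly to a socket (sendfile(2) on Linux) instead of staging them in user-space buffers. A minimal sketch, with a hypothetical destination host and file name:

    import java.io.RandomAccessFile;
    import java.net.InetSocketAddress;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;

    // Zero-copy send: the kernel hands file pages straight to the socket,
    // avoiding the extra copy a read()/write() loop would make.
    public class ZeroCopySend {
        public static void main(String[] args) throws Exception {
            try (FileChannel file =
                     new RandomAccessFile("big.dat", "r").getChannel();
                 SocketChannel sock = SocketChannel.open(
                         new InetSocketAddress("receiver.example.org", 54321))) {
                long pos = 0, size = file.size();
                while (pos < size) { // transferTo may move fewer bytes than asked
                    pos += file.transferTo(pos, size - pos, sock);
                }
            }
        }
    }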

Use Cases and Applications

Typical applications include bulk replication for projects such as the Human Connectome Project, the ENCODE Project, EarthScope, and NOAA and NASA missions; archival transfer between data centers operated by the Internet Archive, the British Library, the Bibliothèque nationale de France, and the Library of Congress; and media distribution at the BBC, Discovery Communications, Netflix, and the Walt Disney Company. Scientific workflows integrate FDT into pipelines at the European Space Agency, NASA, the Max Planck Society, Lawrence Livermore National Laboratory, and Los Alamos National Laboratory. Enterprises adopt it for disaster recovery, data seeding, and large-scale backups alongside products from VMware, NetApp, Pure Storage, and Commvault.

Implementation and Tools

FDT ships as a Java-based server/client application and integrates with workflow managers such as Apache Airflow, Nextflow, Snakemake, and Pegasus. Automation tooling hooks into Ansible, Terraform, Puppet, and Chef. Interoperability extends to transfer managers and gateways such as Globus, Amazon S3, OpenStack Swift, Ceph Object Gateway, and MinIO. Diagnostics rely on tcpdump, Wireshark, dstat, iostat, and sar for I/O and network troubleshooting on equipment from Cisco, Juniper Networks, Arista Networks, and HPE Aruba.
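Automation layers typically drive FDT by launching the jar as a child process. The sketch below shows that pattern from Java; the -c (destination host), -d (destination directory), and -P (parallel streams) flags follow our reading of the FDT documentation and should be verified against the output of java -jar fdt.jar -h, and the host and paths are placeholders.

    import java.io.IOException;

    // Launch an FDT client-mode transfer as a child process, as a workflow
    // task or configuration-management handler typically would. Flag names
    // are recalled from the FDT docs; verify them locally before use.
    public class FdtLauncher {
        public static void main(String[] args)
                throws IOException, InterruptedException {
            ProcessBuilder pb = new ProcessBuilder(
                    "java", "-jar", "fdt.jar",
                    "-c", "receiver.example.org", // remote FDT server (placeholder)
                    "-P", "8",                    // request 8 parallel streams
                    "-d", "/data/incoming",       // remote destination directory
                    "/data/outgoing/big.dat");    // local file to send
            pb.inheritIO(); // stream FDT's progress output to this console
            int exit = pb.start().waitFor();
            if (exit != 0) {
                throw new IOException("FDT exited with code " + exit);
            }
        }
    }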

Security and Reliability

Security and reliability features align with practices from the National Institute of Standards and Technology, ISO/IEC, and the Center for Internet Security, and with certification regimes such as FedRAMP and SOC 2. Transport encryption commonly relies on OpenSSL or GnuTLS and on hardware TLS offload in platforms from Intel, Broadcom, and Cavium. Authentication builds on federated systems such as InCommon, eduGAIN, and Internet2, while integrity checks use hash functions such as SHA-2 and SHA-3 (MD5 is deprecated for this purpose). Reliability mechanisms include retries, checkpointing, and integration with storage replication technologies such as DRBD, ZFS replication, and enterprise solutions from EMC and Hitachi Data Systems.
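End-to-end integrity checking of the kind described above usually reduces to hashing the file on both ends and comparing digests. A minimal sketch using the JDK's MessageDigest with SHA-256 (a SHA-2 family function); the file name is a placeholder:

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.util.HexFormat;

    // Post-transfer integrity check: hash the received file with SHA-256
    // and compare the digest with the one computed at the source.
    public class ChecksumVerify {
        public static void main(String[] args) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            try (InputStream in = Files.newInputStream(Path.of("big.dat"))) {
                byte[] buf = new byte[1 << 20]; // 1 MB read buffer
                for (int n; (n = in.read(buf)) > 0; ) {
                    md.update(buf, 0, n);
                }
            }
            System.out.println("sha256 " + HexFormat.of().formatHex(md.digest()));
            // A mismatch with the sender's digest means the transfer must be
            // retried; MD5 is no longer considered adequate for this check.
        }
    }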

History and Development

Development traces through collaborations among research laboratories and networking consortia, including Lawrence Berkeley National Laboratory, CERN, the National Center for Supercomputing Applications, ESnet, and Internet2. Its evolution parallels advances in high-speed networking associated with Van Jacobson, Sally Floyd, David D. Clark, Paul Baran, and DARPA's networking research. Adoption expanded as instruments such as the Large Hadron Collider, the James Webb Space Telescope, and ALMA, together with facilities such as the Oak Ridge Leadership Computing Facility, drove the need for predictable bulk transfer. The ecosystem continues to evolve alongside the Open Science Grid, the Globus Alliance, and cloud-scale data movement efforts at Amazon Science, Google Research, and Microsoft Research.

Category:Data transfer software