vSphere Fault Tolerance

Name	vSphere Fault Tolerance
Developer	VMware
Released	2009
Latest	vSphere 8.x
Platform	x86-64
License	Proprietary

Contents

Overview
Architecture and Components
Configuration and Requirements
Operation and Failover Behavior
Performance and Limitations
Best Practices and Management
Security and Compatibility

vSphere Fault Tolerance (FT) provides continuous availability for virtual machines by running a redundant copy of each protected VM on a second host. It creates a live secondary copy of a virtual machine running on one VMware ESXi host and keeps that copy on a different host, so the workload survives a host failure. FT is managed through vCenter Server and works together with vSphere High Availability. The feature evolved alongside the broader growth of virtualization and cloud computing, including developments at VMware, Inc. and the rise of public clouds such as Amazon Web Services and Microsoft Azure.

Overview

Fault Tolerance offers lockstep or near-lockstep redundancy by running a primary virtual machine and a live secondary virtual machine concurrently on separate ESXi hosts. It targets critical workloads of the kind run by large financial institutions and industrial firms, where even brief downtime is costly, and it complements cluster-level availability features, including those found in products from Red Hat, Canonical, and SUSE. Because FT protects against host failure rather than against data breaches or faults inside the guest, it is one part of a wider enterprise availability strategy rather than a complete one.

Architecture and Components

The architecture centers on a primary VM and a secondary VM that are kept in step with each other. Early releases did this with the vLockstep engine, which replays the primary's execution deterministically on the secondary; later releases that support multiple virtual CPUs use a checkpoint-based mechanism instead. The pair is managed by vCenter Server, and placement can be handled by the vSphere Distributed Resource Scheduler. Supporting components include a dedicated FT logging network between the two hosts, optional network virtualization through VMware NSX-T Data Center, storage integration with VMware vSAN, and virtual hardware interfaces that depend on processors and devices from ecosystem partners such as Intel, AMD, NVIDIA, and Broadcom. The feature works with storage arrays from Dell EMC, NetApp, Hitachi Vantara, and Hewlett Packard Enterprise and with backup and replication solutions from Veeam, Commvault, and Veritas Technologies.

Configuration and Requirements

Deployment requires compatible versions of ESXi and vCenter Server, together with hardware platforms certified by vendors including Dell Technologies, HPE, Lenovo, and Cisco Systems. Administrators configure redundant networking, confirm that the host CPUs from Intel or AMD appear on the relevant compatibility lists, and verify storage paths, often following general guidance published by bodies such as ISO, NIST, and SNIA. Interoperability should also be verified against related platforms such as VMware Cloud Foundation, container services based on Kubernetes, and hybrid-cloud products from IBM and Oracle Corporation.

Operation and Failover Behavior

During normal operation the primary VM executes the workload while the secondary mirrors its execution without producing externally visible output of its own; inputs and I/O results are forwarded from the primary so that the two stay equivalent. On host failure the secondary takes over immediately, with no boot or reboot required, and a new secondary is then started on another available host to restore protection. This behavior suits mission-critical settings such as central banking and international finance, where interruptions of even a few seconds are unacceptable. Monitoring and alerting platforms such as Splunk, Datadog, PagerDuty, and New Relic can be used to surface FT state changes and support incident response.
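The input forwarding and takeover described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how ESXi implements FT: the class names `ReplicaVM` and `FTPair`, the host names, and the idea of representing guest state as a running total are all hypothetical, and the step that starts a replacement secondary is modeled as a simple state copy.

```python
from dataclasses import dataclass


@dataclass
class ReplicaVM:
    """Toy stand-in for one VM of an FT pair; guest state is a running total."""
    host: str
    state: int = 0

    def apply(self, value: int) -> None:
        # Deterministic step: the same inputs in the same order
        # always produce the same state on every replica.
        self.state += value


@dataclass
class FTPair:
    """Hypothetical model of a primary/secondary pair on two hosts."""
    primary: ReplicaVM
    secondary: ReplicaVM

    def handle_input(self, value: int) -> int:
        # Every input seen by the primary is forwarded to the secondary,
        # so both replicas hold equivalent state at all times.
        self.primary.apply(value)
        self.secondary.apply(value)
        # Only the primary's result is visible to the outside world.
        return self.primary.state

    def fail_host(self, failed_host: str, spare_host: str) -> None:
        # Pick the replica that survived the host failure.
        if failed_host == self.primary.host:
            survivor = self.secondary
        elif failed_host == self.secondary.host:
            survivor = self.primary
        else:
            return  # the failed host carried neither replica
        # The survivor continues as primary without a reboot; a fresh
        # secondary is seeded from its state on a spare host (assumption).
        self.primary = survivor
        self.secondary = ReplicaVM(host=spare_host, state=survivor.state)


pair = FTPair(ReplicaVM("esxi-01"), ReplicaVM("esxi-02"))
for value in (5, 7):
    pair.handle_input(value)

pair.fail_host("esxi-01", spare_host="esxi-03")
result_after_failover = pair.handle_input(1)
```

In this model the state accumulated before the failure is still present afterwards, which is the property that distinguishes FT from a restart-based recovery.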

Performance and Limitations

FT imposes CPU, memory, network, and storage overhead to keep the pair synchronized, because the secondary consumes resources comparable to the primary and every relevant input must cross the network between hosts. This limits how many VMs can be protected compared with application-level clustering of the kind offered by vendors such as SAP SE, Oracle Corporation, and Salesforce. Other limitations include the supported virtual hardware versions, the set of supported guest operating systems (for example Microsoft Windows, Red Hat Enterprise Linux, and SUSE Linux Enterprise Server), and constraints on the latency and bandwidth of the link between hosts. The underlying trade-off between consistency and availability is the same one debated in the design of large distributed systems at companies such as Amazon, Google, Facebook, and Twitter.
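The network overhead can be estimated before enabling protection. The sketch below uses a rule of thumb that appeared in VMware guidance for the older record/replay implementation: logging bandwidth is roughly the VM's disk read rate plus its inbound network rate, with 20 percent headroom. Treat both the formula's applicability and the function name `estimate_ft_logging_mbps` as assumptions; checkpoint-based releases generate different traffic patterns and should be sized from current vendor documentation.

```python
def estimate_ft_logging_mbps(avg_disk_read_mbytes_per_s: float,
                             avg_network_input_mbps: float,
                             headroom: float = 0.2) -> float:
    """Rough FT logging bandwidth estimate in megabits per second.

    Assumed rule of thumb for legacy record/replay FT:
    (disk reads in MB/s * 8 + network input in Mbit/s) * (1 + headroom).
    """
    # Disk reads are measured in megabytes, so convert to megabits.
    disk_read_mbps = avg_disk_read_mbytes_per_s * 8
    return (disk_read_mbps + avg_network_input_mbps) * (1 + headroom)


def fits_on_link(required_mbps: float, link_mbps: float) -> bool:
    """Check whether the estimate fits within the logging link's capacity."""
    return required_mbps <= link_mbps


# Illustrative figures only: 10 MB/s of disk reads, 40 Mbit/s inbound traffic.
estimate = estimate_ft_logging_mbps(10, 40)
fits_gigabit = fits_on_link(estimate, 1000)
```

Summing the estimates for all protected VMs on a host gives a first indication of whether the logging network needs more capacity.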

Best Practices and Management

Recommended practices include placing the primary and secondary VMs in separate physical fault domains, carrying FT traffic on a dedicated network, and maintaining resource pools governed by vSphere DRS and affinity or anti-affinity rules. Administrators often combine FT with backup products from Veeam or Rubrik, since FT does not replace backups, and with change management frameworks such as ITIL and COBIT. In regulated environments, for example those subject to HIPAA or PCI DSS, FT configurations are typically included in the audits carried out by firms such as PwC, Deloitte, KPMG, and Ernst & Young.
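The fault-domain recommendation lends itself to an automated check. The following is a minimal sketch that assumes an inventory has already been exported into plain dictionaries; the function name, host names, and rack labels are hypothetical, and no vSphere API is called.

```python
def ft_pairs_sharing_fault_domain(pairs, host_to_domain):
    """Return the names of FT-protected VMs whose replicas share a fault domain.

    pairs:          VM name -> (primary host, secondary host)
    host_to_domain: host name -> fault domain label (rack, chassis, site)
    """
    violations = []
    for vm_name, (primary_host, secondary_host) in pairs.items():
        # A pair is only as independent as the domains its hosts sit in.
        if host_to_domain[primary_host] == host_to_domain[secondary_host]:
            violations.append(vm_name)
    return sorted(violations)


# Hypothetical inventory: two racks, three hosts.
host_to_domain = {
    "esxi-01": "rack-a",
    "esxi-02": "rack-a",
    "esxi-03": "rack-b",
}
pairs = {
    "db-01": ("esxi-01", "esxi-03"),   # replicas in different racks
    "app-01": ("esxi-01", "esxi-02"),  # different hosts, same rack
}

violations = ft_pairs_sharing_fault_domain(pairs, host_to_domain)
```

Here `app-01` is flagged even though its replicas run on different hosts, because a single rack-level failure would take out both.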

Security and Compatibility

Security considerations include protecting the synchronization channel with encryption and network isolation, in line with guidance from agencies such as NSA, GCHQ, and ENISA, and confirming that SELinux policies and AppArmor profiles inside Linux guests remain compatible where applicable. Interoperability matrices list the guest operating systems and drivers supported by Microsoft, Red Hat, and SUSE and the hardware supported by vendors including Intel and AMD. Updates and patches are coordinated through the lifecycle programs of VMware, Inc., Microsoft, and Red Hat to mitigate vulnerabilities tracked in the CVE list, which is maintained by MITRE.
