
What Is Fault Tolerance and Why It Differs From High Availability

By Jake Fellows

Which is faster, the speed of light or the speed of judgement?

During a user’s discovery phase, if they click on a search result and the page doesn’t load within seconds, they not only navigate away; their trust in the URL erodes.

The user might give the site one more chance, but a failure notice could give them the impression the site (and business) no longer exists.

Without fault tolerance, those site visits are gone. Without high availability, those customers are potentially gone, leading to lost revenue and degraded brand reputation.

Customers carve out specific times to search and make an inquiry or purchase decision. Missing that opportunity can affect the bottom line.

This flows into reputation: consumer trust often starts at the website, and when a potential visitor can’t even access it, they tend to avoid the URL altogether.


Mitigating Low Availability

Hardware failure is the most common contributor to low availability. With a few protections placed in the cloud server infrastructure, the consequences can be mitigated. These include:

  • Redundancy and Reliability
  • High Availability (HA) Hardware Environment
  • Fault Tolerance

Although easily confused, “fault tolerance” and “high availability” are not the same thing. Without a highly available server connection, fault tolerance is moot.

Let’s look at the differences.

A High Availability Hardware Environment

In 2019, Internet World Stats estimated there were more than 4.4 billion internet users, an 83% increase since 2014.

All of this traffic passes through servers housed somewhere and maintained by someone.

With this in mind, three things are universal:

  1. Not all servers are created equal
  2. All servers have a finite lifespan
  3. Sometimes servers just break

The soul-crushing inevitability of these universal truths is happily alleviated through redundancy.

Redundancy is key to server uptime and is achieved by establishing a High Availability Hardware Environment: more than one server, plus a floating IP address and data replication that keep them in sync as a failover mechanism.

Another helpful function supporting redundancy and maintaining HA is a Distributed Replicated Block Device (DRBD®), which mirrors data between the servers and keeps them synchronized.

In addition, a clustered infrastructure of services called “Heartbeat” provides resource monitoring and messaging to facilitate failover and maintain high availability.

[Diagram: failover to maintain high availability]

When a server fails, it triggers the second server to take over, ensuring traffic continues to flow with minimal interruption. Depending on the nature and cause of the failure, the visitor can be directed to the fault-tolerant failsafe.
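
To make that failover mechanism concrete, here is a minimal Python sketch of a heartbeat-style monitor. It is purely illustrative rather than Liquid Web’s actual tooling; the server addresses, health endpoint, and thresholds are all hypothetical.

```python
import time
import urllib.request

PRIMARY = "http://10.0.0.2/health"   # hypothetical primary server
STANDBY = "http://10.0.0.3/health"   # hypothetical standby server
CHECK_INTERVAL = 2                   # seconds between heartbeats
FAILURES_BEFORE_FAILOVER = 3         # tolerate brief network blips

def is_alive(url: str) -> bool:
    """Return True if the server answers its health check in time."""
    try:
        with urllib.request.urlopen(url, timeout=1) as resp:
            return resp.status == 200
    except OSError:
        return False

def monitor() -> None:
    failures = 0
    while True:
        if is_alive(PRIMARY):
            failures = 0
        else:
            failures += 1
            if failures >= FAILURES_BEFORE_FAILOVER:
                # A real cluster would reassign the floating IP here so
                # traffic keeps flowing to the standby with minimal interruption.
                print("Primary down; promoting standby:", STANDBY)
                return
        time.sleep(CHECK_INTERVAL)

if __name__ == "__main__":
    monitor()
```

Requiring several consecutive failures before promoting the standby is the usual design choice: it prevents a single dropped heartbeat from triggering an unnecessary failover.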

What is Fault Tolerance?

So, some of the server pathways have been blocked or corrupted, and little to none of the data is being delivered to the user. Bringing the site back to full functionality could take time.

This is where fault tolerance comes into play.

Fault tolerance is another form of redundancy, enabling visitors to access the system in the event of the failure of one or more components. This is achieved through a Storage Area Network (SAN).

Using fast, low-latency Gigabit Ethernet connected directly to the servers, a SAN is a highly scalable and fault-tolerant central network storage cluster for critical data. Users transfer data sequentially or in parallel without affecting the performance of the host server.

[Diagram: fault tolerance using SAN storage]

Fault Tolerance Through Multiple Server Access

Think of computer storage as an elaborate transit system and data as tiny people trying to get somewhere efficiently without spilling their coffee.

This is done with Multipath I/O (MPIO), a fault-tolerance and performance-enhancement technique.

MPIO defines more than one physical path, connected through buses, controllers, switches, and bridge devices, between the central processing unit (CPU) in a computer system and its mass-storage devices.
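
As a rough illustration of the multipath idea, the toy Python sketch below rotates I/O across two redundant paths and skips one that fails. Real MPIO is implemented in the operating system’s storage stack (for example, Linux dm-multipath); the path names here are invented.

```python
import itertools

# Two redundant physical paths from the host to the same storage device
# (hypothetical names: host bus adapter -> switch -> controller).
PATHS = ["hba1:switchA:ctrl1", "hba2:switchB:ctrl2"]

healthy = {path: True for path in PATHS}
rotation = itertools.cycle(PATHS)

def next_path() -> str:
    """Round-robin across healthy paths, skipping any that have failed."""
    for _ in range(len(PATHS)):
        path = next(rotation)
        if healthy[path]:
            return path
    raise RuntimeError("no healthy paths to storage")

# Normal operation: I/O alternates across both paths (load balancing).
print([next_path() for _ in range(4)])

# A path fails: I/O continues over the survivor (fault tolerance).
healthy["hba1:switchA:ctrl1"] = False
print([next_path() for _ in range(2)])
```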

Like commuting during a cold winter, it’s all about layering.

MPIO layers can leverage the redundant paths to provide performance-enhancing “dynamic load balancing.” Sites have hundreds (sometimes millions) of visitors, each making requests for text, images, videos, etc. as they navigate through the site.

Dynamic load balancing ensures quick access.

Staying with the transit analogy, load balancing is the person in the reflective vest standing at the intersection, directing traffic among the servers.

If one server houses an ultra-popular data asset, it will get overworked and degrade disproportionately, so the data and requests need to be distributed across the web cluster.

The movement of data assets can be ultra-fast and confusing, creating complicated pathways. The MPIO algorithmic load balancing technique maximizes speed and capacity utilization.

If a server falters, the load balancer redirects traffic to the remaining online servers. If a new server is added to the cluster, the load balancer automatically directs requests to it.
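
Here is a minimal sketch of that behavior, assuming hypothetical server names; production load balancers such as HAProxy or Nginx implement it with health checks and far more sophistication.

```python
class LoadBalancer:
    """Toy round-robin load balancer over a dynamic pool of servers."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.index = 0

    def add_server(self, server):
        """New cluster members start receiving requests immediately."""
        self.servers.append(server)

    def remove_server(self, server):
        """A faltering server is dropped; traffic shifts to the rest."""
        self.servers.remove(server)

    def route(self):
        """Spread requests evenly across the servers currently online."""
        if not self.servers:
            raise RuntimeError("no servers online")
        server = self.servers[self.index % len(self.servers)]
        self.index += 1
        return server

lb = LoadBalancer(["web1", "web2", "web3"])
print([lb.route() for _ in range(6)])   # even spread across three servers

lb.remove_server("web2")                # web2 falters
lb.add_server("web4")                   # a new server joins the cluster
print([lb.route() for _ in range(6)])   # traffic redistributes automatically
```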

These layers of redundancy and failsafe protection support fault tolerance.

In simple terms, the redundancy behind fault tolerance ensures that visitors get at least a portion of the web experience: enough information to make a fair judgement and retain a positive brand perception.

Web Hosting Reliability

Without a high availability environment, websites simply don’t load, and fault tolerance won’t save them. Sometimes things break down completely, and a 5xx error message begging the visitor to return later is the only answer.

However, non-catastrophic malware attacks, data corruption, or single server breakdowns are mitigated by Liquid Web’s infrastructure, and these protections can be accessed by choosing the correct web hosting configuration.

To further enhance fault tolerance and reliability, the environmental processing systems include Liebert Precision 22-ton upflow air conditioning units that contain independent compressors and cooling loops. Speaking with a technical professional to work through solutions after a server failure can be costly, but the proper server protection package can sharply reduce hacking-related support interactions.

Reliability in cloud services maintains revenue and reputation.

Professionals who understand how people discover products, services, and information, and how that discovery experience affects brand trust and decision-making, also understand that fault tolerance and high availability are crucial to that process.

Companies that spend valuable time and money to create the face of their business should have all of the mechanisms at hand to ensure a consistent and available customer experience.


About the Author

Jake Fellows

Jake Fellows is an Associate Product Manager for Liquid Web's Managed Hosting products and services. He has over 10 years of experience across several fields of the technology industry, including hosting, healthcare, and IT system architecture. In his time off, he can be found in front of some form of screen, enjoying movies and video games or researching one of his many technical side projects.
