Tuesday, February 12, 2008

Failover Testing

-->Failover is a backup operation that automatically switches to a standby database, server or network if the primary system fails or is temporarily shut down for servicing.

-->It is an important fault-tolerance function of mission-critical systems that rely on constant accessibility. Failover redirects requests from the failed or down system to a backup system that mimics the operations of the primary system, and it does so automatically and transparently to the user.
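
As a rough illustration of that redirection, here is a minimal Python sketch. The `Backend` and `FailoverRouter` classes are hypothetical names made up for this example, not part of any real product, and a real setup would usually do this in a load balancer or cluster manager rather than in application code.

```python
class Backend:
    """A hypothetical server that either answers a request or is down."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unavailable")
        return f"{self.name} handled {request}"


class FailoverRouter:
    """Sends each request to the primary and falls back to the standby."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby

    def send(self, request):
        try:
            return self.primary.handle(request)
        except ConnectionError:
            # The caller never sees the failure; the standby answers instead.
            return self.standby.handle(request)


primary = Backend("primary")
standby = Backend("standby")
router = FailoverRouter(primary, standby)

before = router.send("GET /orders")   # served by the primary
primary.healthy = False               # simulate a crash of the primary
after = router.send("GET /orders")    # served by the standby instead
print(before)
print(after)
```

The caller uses the same `send` call before and after the crash, which is what "transparently to the user" means here.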

Failover testing, a term typically used interchangeably with 'recovery testing', checks how well a system recovers from crashes, hardware failures, or other catastrophic problems.

The following is a brief description that I collected from the online docs available. I hope it gives some insight into the topic.

Objectives of Failover Testing.

The idea behind a failover system is that if the primary component fails, the secondary system should automatically engage. Theoretically, this means you should be able to just pull the plug on the primary system and watch the secondary system instantly take over (Don't Do That!).

-->The primary objective of failover testing is to test the system recovery measures in place for a production architecture.

-->Failover testing gives technology specialists a realistic benchmark of how a mission-critical component will respond when a failure occurs on the network.

When to do Failover Testing.

-->Failover testing is done prior to any performance-specific testing, on an environment that mimics production.

-->Failover testing verifies that the hardware and software components react as designed when a failure occurs.


[Fail-Back Testing - Fail-back testing should also be employed to verify that, once a component is functioning correctly again, it is available to take load and can sustain the influx of activity when it rejoins the infrastructure.]
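
A fail-back check can be sketched in a few lines of Python. The `Pool` class below is a hypothetical round-robin pool written only for this illustration; the node names and the request count of 100 are assumptions. The point is simply to confirm that a recovered node receives traffic again after it rejoins.

```python
from collections import Counter
from itertools import cycle


class Pool:
    """A hypothetical round-robin pool that skips nodes marked as down."""

    def __init__(self, names):
        self.names = list(names)
        self.up = set(self.names)
        self._ring = cycle(self.names)

    def mark_down(self, name):
        self.up.discard(name)

    def mark_up(self, name):
        self.up.add(name)

    def pick(self):
        if not self.up:
            raise RuntimeError("no healthy nodes in the pool")
        # Walk the ring until a healthy node turns up.
        while True:
            name = next(self._ring)
            if name in self.up:
                return name


def serve(pool, requests):
    """Route `requests` requests and count how many each node received."""
    return Counter(pool.pick() for _ in range(requests))


pool = Pool(["web1", "web2"])

pool.mark_down("web1")            # failover: web1 leaves the pool
during_outage = serve(pool, 100)

pool.mark_up("web1")              # fail-back: web1 rejoins the pool
after_recovery = serve(pool, 100)

print(dict(during_outage))
print(dict(after_recovery))
```

In a real test the same comparison would be made against the actual load balancer's statistics, under load, rather than against an in-memory counter.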


Example:


In a web environment, failover testing determines what will happen if multiple web servers are being used under peak anticipated load, and one of them dies.






This is just one of many failover configurations. Some can be quite complex, especially when there are redundant sites as well as redundant equipment and communication lines. In this type of configuration, when one of the application servers goes down, the two web servers that were configured to communicate with it cannot take load from the load balancer, and all of the load must be passed to the remaining two web servers.





When such a failover event occurs, the remaining web servers come under substantial stress, as they need to quickly accommodate the failed-over load, which will probably double the number of HTTP connections as well as application server connections in a very short amount of time. The remaining application server will also be subjected to a severe increase in load and the overheads associated with catering for it.

It is crucial to the design of any meaningful failover test that the failover design is understood, so that the implications of a failover event while under load can be scrutinized.
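
The doubling mentioned above is easy to check with some quick arithmetic. The sketch below assumes the topology described in this example (four web servers, each tied to one of two application servers), an even spread of load by the load balancer, and a made-up peak of 1000 connections; the server names are hypothetical.

```python
# Hypothetical topology matching the description above: four web servers,
# each tied to one of two application servers.
WEB_TO_APP = {"web1": "app1", "web2": "app1", "web3": "app2", "web4": "app2"}
TOTAL_CONNECTIONS = 1000  # assumed peak load, for illustration only


def connections_per_web_server(failed_apps=()):
    """Spread the total load evenly over web servers whose app server is up."""
    usable = [web for web, app in WEB_TO_APP.items() if app not in failed_apps]
    if not usable:
        raise RuntimeError("no web server can take load")
    share = TOTAL_CONNECTIONS / len(usable)
    return {web: share for web in usable}


normal = connections_per_web_server()
degraded = connections_per_web_server(failed_apps={"app1"})
increase = degraded["web3"] / normal["web3"]

print(normal)
print(degraded)
print(increase)
```

Under these assumptions each surviving web server goes from a quarter of the load to half of it, so a failover test plan should size the surviving servers for roughly twice their normal share.
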
Hope this is helpful!

1 comment:

qa.aashish said...

The same happened when I performed load testing on the EP2 project.
The number of HTTP connections had increased at that time, and a technical person had connected the DB wrongly (she had somehow set it up incorrectly; I do not remember exactly how)... so fewer HTTP connections were available, and the load testing I had been doing resulted in an increased number of HTTP connections... At that time everyone was really frightened about the performance of the web app… It's a really good and systematic article. You have put all the things systematically. That's good… thanks a lot, Gargi.