Computing Reviews
Error recovery in asynchronous systems
Campbell R., Randell B. IEEE Transactions on Software Engineering SE-12(9): 811-826, 1986. Type: Article
Date Reviewed: Jul 1 1987

Techniques for structuring forward and backward error recovery in single-process systems are now quite well understood; unfortunately, this is not at all the case for systems built from concurrent processes. This paper describes a general error recovery scheme for such systems. In particular, it proposes a general exception handling scheme for atomic actions involving concurrent interacting processes. According to this scheme, an exception raised by one of the processes of an atomic action implies a change from normal to abnormal activity for all processes of the action, so fault-tolerance measures involve every process in the action. If the abnormal activities succeed in handling the exception, the processes return to normal activity; otherwise, an atomic action failure is signaled.
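The control flow described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `run_atomic_action` and `AtomicActionFailure` are hypothetical, and the processes are run one after another purely to keep the example deterministic, whereas the scheme concerns concurrent processes.

```python
class AtomicActionFailure(Exception):
    """Signalled to the enclosing context when the abnormal activity fails."""


def run_atomic_action(normal, handlers):
    """Run one atomic action and report how it ended.

    normal   -- dict mapping process name to its normal activity (a callable)
    handlers -- dict mapping process name to its abnormal activity,
                a callable taking the raised exception
    Assumption: processes are simulated sequentially; a real system would
    run them concurrently and coordinate the switch between activities.
    """
    raised = None
    for activity in normal.values():
        try:
            activity()
        except Exception as exc:  # first exception raised inside the action
            raised = exc
            break
    if raised is None:
        return "normal"

    # One process raised an exception, so every process of the action
    # switches from normal to abnormal activity.
    for name, handler in handlers.items():
        try:
            handler(raised)
        except Exception as exc:
            # Recovery did not succeed: signal failure of the whole action.
            raise AtomicActionFailure(f"{name} could not recover") from exc
    return "recovered"


log = []


def p1_normal():
    log.append("p1 normal")


def p2_normal():
    raise ValueError("fault detected in p2")


result = run_atomic_action(
    {"p1": p1_normal, "p2": p2_normal},
    {
        "p1": lambda exc: log.append("p1 handles"),
        "p2": lambda exc: log.append("p2 handles"),
    },
)
```

Here the exception raised by `p2` causes both `p1` and `p2` to run their handlers, even though `p1` completed its normal activity without error. If any handler raised in turn, the caller would see `AtomicActionFailure` instead of a result.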

The paper sets out rules for resolving ambiguities in the choice of abnormal activity when components of an atomic action raise exceptions concurrently. Some implementation issues are also discussed.
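The review does not spell out these rules. As one plausible illustration only, assume the exceptions of an action are arranged in a hierarchy and that several exceptions raised together are resolved to the most specific exception covering all of them, whose handler is then run by every process. The class names and the `resolve` helper below are hypothetical.

```python
class ActionException(Exception):
    """Root of the (hypothetical) exception hierarchy for one atomic action."""


class DeviceFault(ActionException):
    pass


class SensorFault(DeviceFault):
    pass


class ActuatorFault(DeviceFault):
    pass


class TimingFault(ActionException):
    pass


def resolve(raised):
    """Return the most specific class that covers every raised exception.

    Walks the ancestors of the first exception's class, from most to least
    specific, and stops at the first one that all the others also inherit from.
    """
    first, *rest = [type(exc) for exc in raised]
    for candidate in first.__mro__:
        if all(issubclass(other, candidate) for other in rest):
            return candidate


same_branch = resolve([SensorFault(), ActuatorFault()])
different_branches = resolve([SensorFault(), TimingFault()])
```

Under this assumption, a sensor fault and an actuator fault raised together resolve to `DeviceFault`, while a sensor fault and a timing fault resolve to the root `ActionException`, so the choice of handler no longer depends on which exception happened to be noticed first.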

This paper provides an in-depth study of error recovery in single- and multiple-process systems and should be of interest to distributed system designers. However, more experience is clearly needed to fully evaluate the applicability of the proposed techniques to the structuring of real distributed systems. Complementary studies that precisely characterize the semantics of the proposed structure and, possibly, provide a methodological approach to designing fault-tolerant software with the proposed scheme would be welcome. Such research is, of course, outside the scope of this paper, but it merits further effort.

Reviewer: J.P. Banatre
Review #: CR111029
Error Handling And Recovery (D.2.5 ... )
Concurrency (D.4.1 ... )
Fault-Tolerance (D.4.5 ... )
Synchronization (D.4.1 ... )
