Computing Reviews, the leading online review service for computing literature.

Error recovery in asynchronous systems
Campbell R., Randell B. IEEE Transactions on Software Engineering SE-12(9): 811-826, 1986. Type: Article

Date Reviewed: Jul 1 1987

Techniques for structuring forward and backward error recovery in single-process systems are now quite well understood; unfortunately, this is not at all the case for systems composed of concurrent processes. This paper describes a general error recovery scheme for such systems. In particular, it proposes a general exception handling scheme for atomic actions involving concurrent interacting processes. Under this scheme, an exception raised by one of the processes of an atomic action implies a change from normal to abnormal activity for all processes of the action, so fault-tolerance measures involve every process taking part in the action. If the abnormal activities succeed in handling the exception, the processes return to normal activity; otherwise, an atomic action failure is signaled. Rules are defined to resolve ambiguities in the choice of abnormal activities when components of an atomic action raise several exceptions concurrently. Some implementation issues are also discussed.

This paper provides an in-depth study of error recovery in single- and multiple-process systems and should be of interest to distributed system designers. However, more experience is clearly needed to fully evaluate the applicability of the proposed techniques to the structuring of real distributed systems. Complementary studies that precisely characterize the semantics of the proposed structure and, possibly, provide a methodological approach to the design of fault-tolerant software using the proposed scheme would be welcome. Such research is, of course, outside the scope of this paper, but it merits further effort.
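The control flow summarized in the review can be illustrated with a small sequential sketch. It is not the authors' implementation. The names `resolve`, `run_atomic_action`, and `AtomicActionFailure` are hypothetical, and the resolution rule used here (pick the nearest common ancestor in a tree of exceptions) is an assumption made for illustration, since the review only says that rules exist to resolve ambiguities.

```python
class AtomicActionFailure(Exception):
    """Signaled to the enclosing context when abnormal activity fails."""


def ancestors(exc, parent):
    """Return exc followed by its ancestors in the (assumed) exception tree."""
    chain = [exc]
    while exc in parent:
        exc = parent[exc]
        chain.append(exc)
    return chain


def resolve(raised, parent):
    """Assumed rule: choose the nearest common ancestor of all raised exceptions."""
    raised = list(raised)
    candidates = ancestors(raised[0], parent)
    for exc in raised[1:]:
        others = set(ancestors(exc, parent))
        candidates = [c for c in candidates if c in others]
    if not candidates:
        raise ValueError("raised exceptions share no common ancestor")
    return candidates[0]


def run_atomic_action(processes, parent):
    """processes maps a name to (normal, handlers).

    normal() returns None on success or the name of the exception it raises.
    handlers maps an exception name to a callable returning True on success.
    """
    # Normal activity: collect whatever exceptions the participants raise.
    raised = [e for e in (normal() for normal, _ in processes.values())
              if e is not None]
    if not raised:
        return "normal"

    # One exception is chosen, and every process switches to abnormal activity.
    exc = resolve(raised, parent)
    for name, (_, handlers) in processes.items():
        handler = handlers.get(exc)
        if handler is None or not handler():
            raise AtomicActionFailure(f"{name} could not handle {exc}")
    return f"recovered from {exc}"


# Hypothetical exception tree: child -> parent.
PARENT = {
    "sensor_fault": "device_fault",
    "link_fault": "device_fault",
    "device_fault": "universal",
}
```

For example, if one process raises `sensor_fault` and another raises `link_fault`, the sketch selects `device_fault` and runs the `device_fault` handler in every participating process, including those that raised nothing. The action fails as a whole if any one of them lacks a handler or its handler reports failure.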

Reviewer: J.P. Banatre	Review #: CR111029

Error Handling And Recovery (D.2.5 ... )

Concurrency (D.4.1 ... )

Fault-Tolerance (D.4.5 ... )

Synchronization (D.4.1 ... )
