REG NASA-LLIS-0772--2000 Lessons Learned: Fault Protection

Best Practices Entry

Best Practice Info:
• Committee Approval Date: 2000-04-11
• Center Point of Contact: JPL
• Submitted by: Wilson Harkins

Subject: Fault Protection

Practice: Fault protection is the use of cooperative design of flight and ground elements (including hardware, software, procedures, etc.) to detect and respond to perceived spacecraft faults. Its purpose is to eliminate single point failures or their effects and to ensure spacecraft system integrity under anomalous conditions.

Abstract: Preferred Practice for Design from NASA Technical Memorandum 4322A, NASA Reliability Preferred Practices for Design and Test.

Provided by IHS. Not for Resale. No reproduction or networking permitted without license from IHS.

Fault protection design maximizes the probability of spacecraft mission success by avoiding possible single failure points through the use of autonomous, short-term compensation for failed hardware.

Implementation Method:

Except during critical event periods, the primary purpose of an autonomous fault protection system is to place the spacecraft in a safe, commandable state that can be maintained for a reasonable period (typically two weeks) following a fault. During critical periods, the primary purpose of the fault protection system is to ensure the completion of the critical event. Figure 1 shows a simplified block diagram of the following three general types of fault protection:
• Subsystems alone,
• Subsystem to system, and
• System to ground control.

[Figure 1: Three general types of fault protection]

Fault Protection Allocations. All on-board, post-lift-off, autonomous fault protection is designated as either "subsystem internal" or "system" fault protection. Fault protection engineering elements that have been allocated fault protection responsibility must provide the requirements and design for the associated detections, monitors, responses, and diagnostic data in compliance with project functional requirements. Where science instruments include
fault protection in their design, designers must still ensure compliance with the spacecraft project fault protection requirements if either of the following conditions applies:
• The fault protection internal to the instrument depends on non-standard services from another subsystem (or another instrument), or
• Internal failures have an impact external to the instrument (e.g., a change in power state or momentum, or in support to other instruments).

Spacecraft Safing. Spacecraft "safing" is a general-purpose safe-state response that is initiated by both system and subsystem internal fault protection. The purpose of this response is to provide the following:
• A safe state for the hardware,
• An uplink, and
• A downlink (with some exceptions for specific failure conditions).
To achieve these goals, the normal stored sequence is terminated and non-essential spacecraft loads are powered off.

Undervoltage Response. Most spacecraft designs include an undervoltage response, which is designed to protect the spacecraft in the event of a short or a bus overload. The hardware senses when the bus voltage drops below an established value for a specified time. If these criteria are met, the power system sheds all non-essential loads from the bus and signals the undervoltage condition to the Command Subsystem, which initiates the undervoltage recovery response. Critical spacecraft memories are maintained throughout the undervoltage condition.

Functional Implementation Requirements. Fault protection is typically allocated to the on-board elements of the system in accordance with the following principles:
• Spacecraft versus ground control. Autonomous fault protection is included on board the spacecraft only if a response by Mission Operations is neither feasible nor practical, or if action is required within
  two weeks of detecting the failure. Otherwise, ground control is responsible for fault recovery. In both cases, ground control is responsible for failure diagnosis and, if necessary, for returning the spacecraft to nominal operations after the fault.
• Protection against sabotage and operator errors. To simplify the development of fault protection, autonomous fault protection is not required to protect against sabotage or operator errors, although such protection is not prohibited. There is limited spacecraft protection against these failures (e.g., information system data integrity checks and some software checks).
• Protection against spacecraft hardware and software design errors. To simplify the development of fault protection, autonomous fault protection is usually not required to protect against spacecraft design errors, although such protection is not prohibited if practical. The practice of fault protection typically provides some limited spacecraft protection against design errors (e.g., thermal fault responses).

The autonomous fault protection function is responsible for all on-board fault detections and corrections except those routinely required to ensure spacecraft data integrity (e.g., EDAC, Reed-Solomon encoding, checksums). Data error detections and corrections may, however, be used for fault protection purposes. The spacecraft information system typically has the primary responsibility for ensuring data integrity.

Fault Protection Design Requirements. Fault detection, monitoring, and response, for both system and subsystem internal fault protection, are managed and coordinated in accordance with the following general rules:
• Enables/disables for responses. Where applicable, fault responses should have two enable/disable mechanisms (or the functional equivalent):
  1. an enable/disable by stored sequence, and
  2. an enable/disable by ground control or by fault protection algorithms.
• Enables/disables for monitor activation of any response.
  If a response can be initiated by more than one monitor, those monitors should include an enable/disable mechanism or the functional equivalent.
• Enable/disable state specification. Each enable/disable is specified by a single parameter unique to each fault protection algorithm.
• Enable/disable strategy - general. As a goal, fault protection monitors and responses should be designed to remain enabled for the entire mission. This reduces the risk of incorrect fault protection states.
• Enable/disable strategy - critical events. For critical events, enable/disable strategies may be used to minimize or prevent the effects of an erroneous fault indication.
• Response initiation. Fault responses are initiated if and only if spacecraft performance is unacceptable or there is a significant risk to the mission or to subsystem safety.
• Parameter modifications. All fault protection parameters that may reasonably be expected to change as a function of mission mode, type of activity, fault history, or operational experience should be alterable by ground control without requiring flight software modification.
• Software modifications. To the extent possible, monitor and response algorithms should be stored in programmable RAM.
• Configuration compatibility. On-board fault protection should be designed to respond to a fault in any possible spacecraft configuration (i.e., fault protection should be able to accommodate all possible combinations).
• Independence from instruments. Engineering fault protection should not depend on science instruments or their data.
• Multiple faults. At a minimum, fault protection should be designed with the assumption that only one fault occurs at a time, and that a subsequent fault will occur no earlier than the response completion time for the first fault. As a goal, fault protection should be capable of recovering from multiple successive or coincident faults, provided that the faults and associated fault algorithms are independent.
• Propagation of failures. Autonomous fault protection assumes that the spacecraft hardware design ensures that a single failure in a subsystem (including instruments) cannot propagate to its redundant unit or to another subsystem, or prevent switching to its redundant unit. This can be verified by performing a failure modes, effects, and criticality analysis (FMECA) or a fault tree analysis (FTA).

Typical Fault Detection Design Requirements. Hardware and software detection sources have two criteria:
• Direct detection. Detection mechanisms should be as direct as possible (i.e., a direct measurement is preferred over a calculated or derived measurement).
• Detection coverage. Detection mechanisms should only be required to detect a failure to the level at which that failure can be isolated or corrected.

Design Requirements for Fault Monitors. Software monitors used by system and subsystem internal fault protection have the following features:
• Monitor thresholds. Where possible, thresholds should use reasonableness checks, detection filtering (to exclude certain faults from a previously established fault database), or redundant detections.
• Threshold modifications. Monitor threshold
  values should be alterable by ground or sequence command, or by fault protection responses as appropriate. As a goal, monitors should be designed to detect and disregard failed sensors.
• Redundant detection. For detections where an inadvertent trip would result in a severe response (e.g., downlink loss, irreversible hardware swaps, large use of expendables, critical sequence cancellation), and where a sensor anomaly could cause an inadvertent trip, independent physically or functionally redundant detections are employed such that simultaneous detections are necessary for response initiation.
• Fault response tolerance. Monitors are designed to be tolerant of off-nominal conditions following a reconfiguration resulting from a fault protection response (e.g., thresholds might be relaxed as part of a response).

Design Requirements for Fault Responses. System and subsystem internal fault response concerns include:
• Fault response primary responsibility. Following an anomaly, fault responses should ensure spacecraft commandability and the maintenance of a safe state for at least two weeks. This requirement is superseded only by a requirement to complete a critical event.
• Fault protection priorities. Fault responses are designed with the following priorities:
  1. Protect critical spacecraft functionality,
  2. Protect spacecraft performance and consumables,
  3. Minimize disruptions to normal sequence operations, and
  4. Simplify ground recovery response, including providing for downlink telemetry.
• Multiple levels of response. Where possible, response design includes multiple levels of response, with the response actions executed in order of increasing severity.
• Real-time ground responses. Autonomous fault protection is designed so as not to require real-time ground responses for recovery from known faults.
• False alarm tolerance. Unintended entry into a fault protection response in the absence of a fault must not present a hazard to the spacecraft or mission. For critical event periods, however, this requirement is relaxed and is considered a goal.
• Use of redundant (and spare) units. Redundant or spare units may be used by autonomous fault protection responses if a satisfactory alternative design is not available.
• Unpowered redundant units. The transition
  of a unit from "off" to "prime" must not require ground commands in order to support spacecraft fault protection and mission-critical functions.
• Component warm-up times. Fault responses should take into account component warm-up times and similar delay requirements.

Data Handling Requirements. The following data handling tasks are performed:
• Recording engineering data. The combination of sequences and fault responses should ensure the recording of engineering data before, during, and after the execution of any fault response. Some exceptions are made for recorder and command subsystem failures.
• Storage and preservation of diagnostic data. Fault protection is designed to include the storage of diagnostic data (see Telemetry and Diagnostic Data on page 7) and to ensure that these data are not overwritten as the result of a response action. This requirement applies only if the writing of diagnostic data is not affected by the original fault.
• Protection of critical science and engineering data. Fault responses must not destroy "critical data" stored on board the spacecraft. "Critical science and engineering data" must be defined by project policy.

Requirements Interactions with Stored Sequences. The following interactions with on-board sequences may be necessary:
• Response design for critical events. Fault response is designed to ensure the completion of the critical event as and when required, with spacecraft safety having lower priority until the critical events are completed. Orbit insertion is an example of a critical event.
• Response design for non-critical events. Unless required to execute a critical event, fault responses should stop any on-board sequence(s) only if the sequence(s) compromises the integrity of the fault response, or if the fault response compromises the integrity of the sequence.
• Reactivation of stored sequences. After completion of a response that terminates a non-critical stored sequence, fault responses should not autonomously resume the terminated sequence.

Safing Requirements. "Safing" is defined as a general-purpose fault response that results in the cancellation of non-critical sequences, the possible suspension of critical sequences, and a general reconfiguration of spacecraft components. Safing responses typically include the following general features:
• Uplink communications. The safing response provides for a spacecraft state and attitude that ensures uplink commandability in the long term.
• Downlink communications. The safing response provides for a spacecraft state and attitude that ensures continuous engineering telemetry with positive link margin.
• Environmental constraints. The safing response meets boresight and radiator environmental constraints.
• Safing priorities. When uplink, downlink, and hardware safing requirements are in conflict, the following priorities apply:
  1. Provide a safe state for the hardware,
  2.
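
The undervoltage response and the enable/disable rules above can be illustrated together. The text describes a hardware mechanism; the following Python is only a software sketch of the same logic, under assumptions that are not from the source: bus voltage arrives as discrete samples, the threshold (24.0 V) and persistence (three consecutive low samples) are placeholder values, and the names `Load`, `UndervoltageMonitor`, and `shed_nonessential` are hypothetical. The monitor trips only when both enables (stored sequence, and ground or fault protection) are set and the voltage stays below the threshold for the required number of samples; the response then powers off every non-essential load.

```python
from dataclasses import dataclass, field


@dataclass
class Load:
    """A switchable spacecraft load (hypothetical model)."""
    name: str
    essential: bool
    powered: bool = True


@dataclass
class UndervoltageMonitor:
    """Persistence-filtered undervoltage monitor with two enables.

    All parameter values are illustrative placeholders; per the
    parameter-modification rule they would be ground-alterable settings.
    """
    threshold_v: float = 24.0    # hypothetical trip threshold, volts
    persistence: int = 3         # consecutive low samples needed to trip
    seq_enabled: bool = True     # enable/disable by stored sequence
    ground_enabled: bool = True  # enable/disable by ground or fault protection
    _low_count: int = field(default=0, init=False)

    def enabled(self) -> bool:
        # The response may be initiated only if both mechanisms allow it.
        return self.seq_enabled and self.ground_enabled

    def sample(self, bus_voltage: float) -> bool:
        """Process one voltage sample; return True when the monitor trips."""
        if bus_voltage < self.threshold_v:
            self._low_count += 1
        else:
            self._low_count = 0  # any in-limits sample resets the count
        return self.enabled() and self._low_count >= self.persistence


def shed_nonessential(loads: list[Load]) -> list[str]:
    """Power off every powered, non-essential load; return the names shed."""
    shed = []
    for load in loads:
        if not load.essential and load.powered:
            load.powered = False
            shed.append(load.name)
    return shed


if __name__ == "__main__":
    loads = [Load("command", True), Load("heater_2", False), Load("camera", False)]
    monitor = UndervoltageMonitor()
    for volts in [27.5, 23.1, 23.0, 22.8]:
        if monitor.sample(volts):
            print("undervoltage trip; shed:", shed_nonessential(loads))
            break
```

Run as a script, the example trips on the fourth sample and sheds `heater_2` and `camera` while leaving the essential `command` load powered. Resetting the count on any in-limits sample is one simple way to model "below an established value for a specified time"; a real design might use a time-based filter instead.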

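The redundant-detection rule and the rule that response actions execute in order of increasing severity can be sketched together. This is a hypothetical Python illustration, not a flight design: `redundant_trip`, `TieredResponse`, and the level names (`retry`, `reset_unit`, `swap_to_backup`, `safing`) are invented for the example, and each level's action is assumed to report whether it cleared the fault.

```python
from typing import Callable, Sequence


def redundant_trip(detections: Sequence[bool]) -> bool:
    """Permit a severe response only when all independent detections agree.

    At least two detections are required, so one anomalous sensor cannot
    initiate the response by itself.
    """
    return len(detections) >= 2 and all(detections)


class TieredResponse:
    """Execute response levels in order of increasing severity.

    Each level is a (name, action) pair whose action returns True once the
    fault is cleared; escalation stops at the first level that succeeds.
    """

    def __init__(self, levels: list[tuple[str, Callable[[], bool]]]) -> None:
        self.levels = levels

    def run(self) -> list[str]:
        executed: list[str] = []
        for name, action in self.levels:
            executed.append(name)
            if action():
                break
        return executed


if __name__ == "__main__":
    # Simulated fault that only a switch to the backup unit clears.
    cleared_by = "swap_to_backup"
    levels = [
        ("retry", lambda: cleared_by == "retry"),
        ("reset_unit", lambda: cleared_by == "reset_unit"),
        ("swap_to_backup", lambda: cleared_by == "swap_to_backup"),
        ("safing", lambda: True),
    ]
    if redundant_trip([True, True]):
        print(TieredResponse(levels).run())
```

With the simulated fault, the script prints `['retry', 'reset_unit', 'swap_to_backup']`: the milder levels run first and safing is never reached. Gating the whole escalation on `redundant_trip` reflects the rule that severe responses, such as irreversible hardware swaps, need simultaneous independent detections.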
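
Two allocation rules in the text reduce to simple decisions: whether recovery is handled on board or by ground control, and what a fault response does to an active stored sequence. The Python sketch below encodes them under stated assumptions: "within two weeks" is read as at most 14 days, and `allocate_response`, `SequenceState`, `SeqAction`, and `sequence_action` are hypothetical names. It covers only the decision itself, not diagnosis or reconfiguration, which the text assigns to ground control.

```python
from dataclasses import dataclass
from enum import Enum

# Assumed reading of "within two weeks"; two weeks is the typical
# figure from the text, not a fixed requirement.
ONBOARD_WINDOW_DAYS = 14


def allocate_response(ground_feasible: bool, days_until_action: float) -> str:
    """Decide whether fault recovery is allocated on board or to ground."""
    if not ground_feasible or days_until_action <= ONBOARD_WINDOW_DAYS:
        return "onboard"
    return "ground"


class SeqAction(Enum):
    CONTINUE = "continue"
    TERMINATE = "terminate"


@dataclass
class SequenceState:
    critical: bool                  # e.g. orbit insertion
    seq_compromises_response: bool  # sequence would undermine the response
    response_compromises_seq: bool  # response would undermine the sequence


def sequence_action(s: SequenceState) -> SeqAction:
    """Choose what a fault response does to the active stored sequence."""
    if s.critical:
        # Completing the critical event outranks safety until it is done.
        return SeqAction.CONTINUE
    if s.seq_compromises_response or s.response_compromises_seq:
        # A terminated non-critical sequence is not resumed autonomously;
        # this sketch leaves reactivation to ground control.
        return SeqAction.TERMINATE
    return SeqAction.CONTINUE


if __name__ == "__main__":
    print(allocate_response(ground_feasible=True, days_until_action=30))  # ground
    state = SequenceState(critical=False, seq_compromises_response=False,
                          response_compromises_seq=True)
    print(sequence_action(state).value)  # terminate
```

Keeping these rules as small pure functions makes them easy to review against the written requirements; a real implementation would draw the window and the compromise flags from ground-alterable parameters and monitor outputs.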