Lessons Learned Entry: 1805

Lesson Info:
• Lesson Number: 1805
• Lesson Date: 2007-9-4
• Submitting Organization: JPL
• Submitted by: David Oberhettinger
• POC Name: Dorothy C. Perkins
• POC Email: Dorothy.C.Perkins@nasa.gov
• POC Phone: 301-286-8936

Subject: Mars Global Surveyor (MGS) Spacecraft Loss of Contact

Abstract: Contact was lost with the Mars Global Surveyor (MGS) spacecraft in November 2006 during its 4th extended mission. A routine memory load command sent to an incorrect address 5 months earlier corrupted positioning parameters, and their subsequent activation placed MGS in an attitude that fatally overheated a battery and depleted spacecraft power. The report by the independent MGS Operations Review Board listed 10 key recommendations to strengthen operational procedures and processes, correct spacecraft design weaknesses, and assure that economies implemented late in the course of long-lived missions do not impose excessive risks.

Description of Driving Event: Contact was lost with the Mars Global Surveyor (MGS) spacecraft in November 2006, ten years into its mission to map the surface of Mars and study the atmosphere and interior of the planet. At the beginning of a prescheduled, routine contact, the spacecraft reported alarms indicating that one solar array drive had temporarily been stuck and that the spacecraft had automatically switched to the redundant drive controller (Reference (1)). At the next scheduled contact 2 hours later, the normal spacecraft signal was not detected by the Deep Space Network (DSN), and all subsequent attempts to command the spacecraft and reestablish communication were unsuccessful.

The mission loss was attributed to a High Gain Antenna (HGA) positioning command sent by the spacecraft operations team five months earlier that, in the process of updating several parameters, created a bad memory load (Reference (2)). The command was mistakenly written to the wrong memory address in the spacecraft's onboard computers, corrupting two independent parameters and introducing two separate faults.

Provided by IHS. Not for Resale. No reproduction or networking permitted without license from IHS.

The first parameter error caused a fault in which one solar array was driven against its hard stop; system fault protection incorrectly interpreted the indication as a stuck solar array gimbal and placed MGS into Contingency Mode. In Contingency Mode, the spacecraft alternates between the Sun-Comm-Power Mode (SCPM) control state (commanding an attitude consistent with thermal control of the spacecraft bus and communications with Earth), and the less conventional Sun-Stuck-Gimbal (SSG) control state (favoring a spacecraft orientation to the sun optimized for battery charging, even at the risk of violating thermal limits). Because the spacecraft attitude directly exposed one of the spacecraft batteries to the sun during each SSG period (Figure 1), the temperature of the exposed battery continued to rise. The onboard power management software interpreted the battery overheating as an overcharge condition and kept reducing the charge rate. Since the remaining battery could not support the full electrical load, and the attitude during each SCPM period eclipsed one solar panel, both batteries became critically depleted.
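As a rough illustration of how a single misaddressed memory load can corrupt two unrelated parameters, the following Python sketch models onboard memory as a small table. The table layout, parameter names, and values are hypothetical; they are not taken from the MGS flight software or from the review board report.

```python
# Toy model of a raw memory load. All names, offsets, and values are
# hypothetical and chosen only to illustrate the failure mode.
PARAM_TABLE = ["target_1", "target_2", "unrelated_1", "unrelated_2"]


def memory_load(memory, start, words):
    """Return a copy of memory with words written from index start onward."""
    updated = list(memory)
    for offset, word in enumerate(words):
        updated[start + offset] = word
    return updated


def changed_parameters(before, after):
    """List the names of parameters whose stored value differs."""
    return [
        name
        for name, old, new in zip(PARAM_TABLE, before, after)
        if old != new
    ]


baseline = [10, 20, 30, 40]
new_values = [11, 21]  # intended update for target_1 and target_2

intended = memory_load(baseline, 0, new_values)      # correct start address
misaddressed = memory_load(baseline, 2, new_values)  # wrong start address

print(changed_parameters(baseline, intended))      # ['target_1', 'target_2']
print(changed_parameters(baseline, misaddressed))  # ['unrelated_1', 'unrelated_2']
```

In this sketch the misaddressed load leaves the intended parameters unchanged and silently overwrites two others, which is the general pattern the event description attributes to the bad memory load.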
The second parameter error caused a fault that induced the HGA to point away from the Earth, disrupting downlink communications such that ground controllers remained unaware of the need to correct the mission-critical thermal and power situation. After 5 to 6 Mars orbits in Contingency Mode, the spacecraft batteries became completely discharged, disabling attitude control and the spacecraft subsystems.

Figure 1 is a color diagram of MGS that depicts the orientation of the spacecraft relative to the sun when in the SSG control state. The diagram shows that both the solar panels and the spacecraft battery are exposed to solar radiation when in this attitude.

Figure 1. MGS in Sun-Stuck-Gimbal (SSG) control state

MGS was an overall mission success, having completed its primary mission in January 2001. MGS had a solid record of accomplishments (e.g., use of aerobraking, global mapping, science results, lander mission support), and operated longer than any other spacecraft sent to Mars.

References:
(1) “MGS SAM Gimbal and BUS Swap,” Incident Surprise Anomaly (ISA) No. Z89435, NASA/Caltech Jet Propulsion Laboratory, November 7, 2006.
(2) “Report on the Loss of the Mars Global Surveyor,” Mars Global Surveyor Operations Review Board, NASA Goddard Space Flight Center, July 2, 2007.

Lesson(s) Learned:
1. Operational Procedures and Processes. An independent review board (Reference (2)) determined that the MGS mission team followed existing operating procedures and processes, but the rules were inadequate to detect the errors. More thorough operating procedures and processes would have avoided many of the factors contributing to this anomaly.
2. Spacecraft Design Weaknesses. The onboard fault protection was inadequate to diagnose and correct the faults that were the likely cause of the mission loss. The spacecraft mistakenly determined that a solar array was stuck and, based on this information, initiated an attitude that was thermally unsafe for an essential battery. In addition, telemetry did not provide the ground with sufficient data on the cause of the initial fault.
3. Lifetime Management Considerations. MGS had entered its fourth extended mission phase just prior to the anomaly. As is common during extensions to already long-lived missions, the MGS budget and staff had been reduced to economize on mission operations. While no direct evidence attributes the anomaly to these reductions, the review board judged that such reductions can inherently increase risk. Periodic reviews should have been performed to assure that spacecraft control parameters were appropriate to the current state of the spacecraft, and the risks associated with normal personnel turnover over time should also have been assessed. While the training methodology for some operations positions was excellent, the board noted that it was not uniformly applied.

Recommendation(s): The key recommendations in Reference (2) state:

Operational Procedures and Processes
1. Ground alarm limits should be set equal to or within expected flight software and hardware performance limits.
2. Projects should be required to conduct, prior to upload, a thorough review of all proposed flight software patches, non-routine parameter and data/table modifications, and any change affecting fault protection or Contingency/Safe Mode.
3. Predefined commands should always be used in preference to general memory loads for parameter updates.
4. A thorough flight software configuration management (CM) process, including regular readout and validation of the flight computer memory and maintenance of a high-fidelity memory image in the ground testbed, must be in place and rigorously followed.
5. Ground operations should have in place a procedure to quickly acquire stored telemetry and/or memory readout data upon discovery of a spacecraft fault.

Spacecraft Design Weaknesses
6. Operation at an allowable hardware limit should not be interpreted by autonomous fault protection as a fault.
7. All Contingency and Safe Mode attitudes need to be evaluated and designed to provide for thermal safety. Where the design cannot provide for thermal safety along with other health and safety concerns, contingency procedures should be developed as risk mitigation.
8. As a part of a fault response, key data should be captured and returned autonomously to the ground at the next scheduled downlink, without the ground having to send a command. Key data includes fault detection “high water marks,” data concerning the subsystem in which the anomaly occurred, etc.

Lifetime Management Considerations
9. Each extended mission should be formally and independently reviewed, across the board, including assessment of whether system-level fault protection parameters are appropriate given the state of the remaining complement of hardware.
10. The operational processes need to be routinely updated to accommodate the changing personnel and operational environment.

Evidence of Recurrence Control Effectiveness: JPL has referenced this lesson learned as additional rationale and guidance supporting specific paragraphs in the Jet Propulsion Laboratory standard “Flight Project Practices, Rev. 6,” JPL DocID 58032, March 6, 2006. These paragraphs are:
• Paragraph 5.7.8 (“Management Practices: Spares, Testbeds, and Models”)
• Paragraph 5.12 (“Management Practices: Project Staffing and Destaffing”)
• Paragraph 5.16 (“Management Practices: Reviews”)
• Paragraph 6.3.2 (“Mission Operations: Process and Procedure”)
• Paragraph 6.3.3 (“Mission Operations: Flight Team”)
• Paragraph 6.3.4 (“Mission Operations: Spacecraft Health and Safety, and Performance Analysis”)
• Paragraph 6.11 (“Engineering Practices: Software Development”)
• Paragraph 6.15.2.3 (“Engineering Practices: Configuration Management”)
• Paragraph 6.18.10 (“Engineering Practices: Mission Operations System Development”)

In addition, JPL has referenced this lesson learned in support of specific paragraphs in the JPL standard “Design, Verification/Validation and Operations Principles for Flight Systems (Design Principles),” JPL Document D-17868, Rev. 3, December 11, 2006. These paragraphs are:
• Paragraph 3.1.3 (“Mission Design: Protection of Critical Data”)
• Paragraph 4.1.3.2 (“Flight System Design: Design Robustness Protection Against Operator Errors”)
• Paragraph 4.3.3.4 (“Power/Pyrotechnics Design: Power Generation Recovery from Loss of Power”)
• Paragraph 4.3.3.5 (“Power/Pyrotechnics Design: Power Generation Energy Storage”)
• Paragraph 4.3.3.6 (“Power/Pyrotechnics Design: Power Generation Secondary (Rechargeable) Batteries”)
• Paragraph 4.4.1.1 (“Information System Design: Redundant Handling of Critical Data”)
• Paragraph 4.4.4.5 (“Information System Design: Commanding and Sequencing Commanding Modes”)
• Paragraph 4.4.6.4 (“Information System Design: Telemetry Visibility Visibility of Spacecraft State”)
• Paragraph 4.8.1.1 (“System Thermal Design: Design Tailored to Specific Application”)
• Paragraph 4.8.2.5 (“System Thermal Design: Design Margin under Anomalous Conditions”)
• Paragraph 4.9.2.2 (“System Fault Protection Design: Fault Protection Response Flight System Safing”)
• Paragraph 4.9.3.1 (“System Fault Protection Design: Flight-Ground Interface In-Flight Commandability”)
• Paragraph 4.9.3.2 (“System Fault Protection Design: Flight-Ground Interface Visibility into Fault Protection State”)
• Paragraph 4.9.3.5 (“System Fault Protection Design: Flight-Ground Interface Fault Protection Margin Monitoring”)
• Paragraph 4.9.3.7 (“System Fault Protection Design: Flight-Ground Interface Visibility into Fault Response Activity”)

Documents Related to Lesson: N/A

Mission Directorate(s):
• Space Operations
• Science
• Exploration Systems

Additional Key Phrase(s):
• Program Management.Configuration and data management
• Missions and Systems Requirements Definition.
• Missions and Systems Requirements Definition.Mission concepts and life-cycle planning
• Missions and Systems Requirements Definition.Requirements critical to costing and cost credibility
• Missions and Systems Requirements Definition.Review boards
• Systems Engineering and Analysis.
• Systems Engineering and Analysis.Long term sustainability and maintenance planning
• Engineering Design (Phase C/D).
• Engineering Design (Phase C/D).Power
• Engineering Design (Phase C/D).Software Engineering
• Engineering Design (Phase C/D).Spacecraft and Spacecraft Instruments
• Mission Operations and Ground Support Systems.
• Mission Operations and Ground Support Systems.Mission control Planning
• Mission Operations and Ground Support Systems.Mission operations systems
• Safety and Mission Assurance.
• Safety and Mission Assurance.Review systems and boards
• Additional Categories.Communication Systems
• Additional Categories.Flight Equipment
• Additional Categories.Flight Operations
• Additional Categories.Ground Equipment
• Additional Categories.Ground Operations
• Additional Categories.Hardware
• Additional Categories.Payloads
• Additional Categories.Software
• Additional Categories.Spacecraft

Additional Info:
• Project: Mars Global Surveyor

Approval Info:
• Approval Date: 2007-10-30
• Approval Name: ghenderson
• Approval Organization: HQ
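
Recommendations 1 and 4 above lend themselves to simple automated ground checks. The Python sketch below shows one possible form: comparing a flight memory readout against the expected ground image, and confirming that a ground alarm band lies within the flight limits. The function names, addresses, and numeric values are assumptions made for illustration; the source does not describe any specific ground tool.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical ground-side checks; names and values are illustrative only.


def diff_memory_image(
    expected: Dict[int, int], readout: Dict[int, int]
) -> List[Tuple[int, int, Optional[int]]]:
    """Return (address, expected_word, readout_word) for every mismatch.

    An address missing from the readout is reported with None as its word.
    Results are ordered by address.
    """
    mismatches = []
    for address in sorted(expected):
        actual = readout.get(address)
        if actual != expected[address]:
            mismatches.append((address, expected[address], actual))
    return mismatches


def alarm_within_flight_limits(
    alarm: Tuple[float, float], flight: Tuple[float, float]
) -> bool:
    """True if the ground alarm band is equal to or inside the flight band."""
    alarm_low, alarm_high = alarm
    flight_low, flight_high = flight
    return flight_low <= alarm_low <= alarm_high <= flight_high


# Expected image maintained on the ground versus a readout from flight memory.
expected_image = {0x1000: 0x11, 0x1004: 0x22, 0x1008: 0x33}
readout_image = {0x1000: 0x11, 0x1004: 0x99, 0x1008: 0x33}

for address, want, got in diff_memory_image(expected_image, readout_image):
    print(f"mismatch at {address:#06x}: expected {want:#04x}, read {got:#04x}")

print(alarm_within_flight_limits((-5.0, 5.0), (-10.0, 10.0)))   # True
print(alarm_within_flight_limits((-12.0, 5.0), (-10.0, 10.0)))  # False
```

A check of this kind only detects a discrepancy after the fact; it complements, rather than replaces, the pre-upload review called for in Recommendation 2.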