Lessons Learned Entry: 1835

Lesson Info:
• Lesson Number: 1835
• Lesson Date: 2007-10-23
• Submitting Organization: JPL
• Submitted by: David Oberhettinger
• POC Name: Bette Siegel
• POC Email: bette.siegel@nasa.gov
• POC Phone: 202-358-2245

Subject: Capture of Apollo Lunar Module Reliability Lessons Learned: Reliability Engineering

Abstract: A July 2007 workshop attended by the former Grumman Corporation's Apollo Lunar Module Reliability and Maintainability (R&M) Team and Constellation Program personnel traced the success of the Apollo Lunar Module (LM) to reliability features that were locked into an early stage of LM design. The LM prime contractor (Grumman Corporation) and NASA shared responsibility for system reliability, provided for early Reliability Engineering involvement in evaluating design alternatives at the cross-Apollo system level, placed great emphasis on identification and elimination of critical single point failures, provided for extensive design redundancy, conducted parallel development of alternative technologies, tested critical hardware beyond qualification levels to the point of failure, and performed rigorous root cause analysis of failures. The Grumman retirees also recommended active management of design margins, providing the lunar lander with generous instrumentation and telemetry capabilities, and furnishing a strong Lander advocate during CEV design.

Description of Driving Event: As part of the Constellation Program's review of human spaceflight lessons learned, NASA hosted a July 20, 2007 panel discussion with a group of engineers who were members of the Apollo Lunar Module Reliability and Maintainability (R&M) Team. The team members are retired employees of Grumman Corporation, the prime contractor for the Lunar Module (LM). One set of lessons learned that was discussed focused on the Apollo approach to reliability engineering (Reference (1)):

Provided by IHS. Not for resale. No reproduction or networking permitted without license from IHS.

The Apollo approach of shared NASA/contractor responsibility for achieving LM reliability (Reference (2)) strengthened efforts to incorporate reliability features into the design. As indicated by Figure 1, reliability was infused into the design relatively early in the project life cycle, with part of the achieved reliability captured by design requirements by the release date of the NASA
Request for Proposal (RFP). Because NASA issued a brief RFP that stated only functional requirements, and the Grumman program plan (Reference (3)) accepted by NASA committed only to these high-level requirements, Grumman retained substantial freedom to make LM design tradeoffs. Had NASA allowed discipline experts to impose detailed design requirements in the RFP without a full understanding of system-level impacts, some requirements might have detracted from mission success and crew safety.

Figure 1. Apollo LM reliability growth (approximation performed in 2007 for use in Reference (1))

The Systems Reliability Group at Grumman placed heavy emphasis on assuring their early involvement in evaluating design alternatives, such as allocating mass to fuel vs. to payload and allocating functions to hardware vs. software. The LM system had only 10,000 lines of code, and the panel discussion suggested that functional requirements implemented in software instead of hardware might decrease weight at the cost of reliability. Decisions on the system configuration were heavily influenced by early weight vs. reliability trade studies that made effective use of flight simulation and used math models to compare configurations. In retrospect, if NASA and their contractors had made these trades at the integrated system level (Reference (4)), they could have obtained the best reliability increase per pound added to the Apollo booster/Command Module/LM system. For example, if a few extra pounds for an additional battery had been added to the Apollo LM, it might have provided the power needed to make the LM a more comfortable lifeboat during the Apollo 13 return trip.

Redundancy was employed extensively in the effort to minimize the number of potential single point failures. Apollo LM designers
in cross-functional, reliability-oriented teams sought to provide extensive redundancy by dissimilar means. For example, the secondary abort system employed hardware and software that were different from those used in the primary guidance system. Component redundancy (e.g., use of dual valve regulators) was employed where feasible. Parallel technology development, such as the simultaneous development of both fuel cells and batteries as alternative power sources, was also used to mitigate the system reliability risk.

Because the test program design was based on a lunar environment that was unknown and a mission profile that was then uncertain, it evolved over time. Developmental flight hardware was stressed to failure, well beyond the environmental uncertainty factor of 1.5 used to set the qualification test levels. This provided an additional environmental margin that accommodated design changes later in the LM project. In contrast, International Space Station (ISS) developmental hardware was not tested beyond qualification levels, and during ISS operations certain necessary flight orientations exceeded these design limits (Reference (5)). The actual Apollo LM flight hardware was tested to flight environmental levels with continuous operation to screen design and workmanship failures. All failures were subjected to very rigorous root cause analysis, and corrective action plans were
approved at the NASA level.

Although reliability prediction per MIL-HDBK-217 was then in common use by aerospace engineers, the methodology provided very limited benefits to the LM project. The designers were driven by the need to eliminate single-point failures (SPFs) that could impede mission success or harm the crew. Hence, changes were required for designs that contained SPFs even if a calculation predicted a low probability of failure. Failure Mode and Effects Analysis (FMEA) performed at the system and functional levels was very effective in identifying failure modes (including failures of other contractors' interfacing hardware) and evaluating design modifications. In hindsight, though, performance of FMEAs at an even lower tier (i.e., the subcontractor level) would have revealed problems (like solder balls floating in switches) earlier. But careful analysis would not have sufficed without tenacity by the Grumman Systems Reliability Group in forging the necessary design modifications to ensure mission success and crew safety.

The Apollo 13 near-disaster revealed the importance of obtaining operations data in real time to support safety-related decision making. For example, the triggering of a CO2 alarm at the warning system engineer's console in Houston alerted the mission to the need to take extraordinary measures to save the crew. Mission success requires mission operations staff to expect the unexpected. This requires the generous allocation of flight instrumentation
and telemetry resources to the system for timely identification of problems during mission operations. In addition, real-time simulations conducted on the ground to evaluate proposed corrective actions proved their worth.

References:
1. Gerry Sandler (Ret.), Presentation on Apollo/Lunar Module Reliability, Apollo Lunar Module Reliability and Maintainability Team, Apollo Lunar Lander Team Lessons Learned Workshop, July 20, 2007.
2. “Capture of Apollo Lunar Module Reliability Lessons Learned: Program/Engineering Management,” Lessons Learned No. 1806, NASA Engineering Network, September 25, 2007.
3. “LEM Program Plan,” Grumman Aircraft Engineering Corporation, Report No. LPL 13-1A, July 1, 1964.
4. David Oberhettinger telephone conversations with Gerry Sandler, Apollo Lunar Module Reliability and
Maintainability Team, October 4-5, 2007.
5. Dr. Bette Siegel, “Lessons Learned from the Apollo Lunar Module Reliability Team Meeting,” NASA Exploration Systems Mission Directorate, July 20, 2007.

Lesson(s) Learned: The Constellation lunar lander program faces challenges similar to those faced by the Apollo program 45 years ago in terms of achieving reliability and mitigating crew safety-critical and mission-critical risks.

Recommendation(s):
1. Lock system reliability into the early design so that the test program is relied upon for screening and verification. Emphasize identification and elimination of safety-critical and mission-critical single point failures, independent of numerical estimates of the failure probability. Assure that a strong “systems reliability group” is involved in evaluating alternative designs very early in the project life cycle. This early involvement should extend down to the subcontractor level, with subcontractors providing inputs to functional and system level reliability analyses. Assure that system requirements are assessed for their reliability impact prior to release of the system-level RFP.
2. Evaluate design alternatives and conduct trade studies at the system level to obtain an optimal overall design. For example, it may be necessary to overcome organizational barriers between projects within the Constellation program to assess whether extra mass should be allocated to the Crew Exploration Vehicle (CEV) launch vehicle, the CEV, or the lunar
lander.
3. Provide a primary and a redundant backup where feasible, preferably by dissimilar means, for safety-critical and mission-critical systems. Trade functionality between the lander, the capsule, and the service module to get the optimal integrated solution. For subsystems with a low technology readiness level (TRL) early in project development, mitigate the risk to reliability by funding the parallel development of alternatives.
4. To accommodate future Constellation lunar lander system design changes and unanticipated flight configurations, test critical hardware beyond its qualification test levels until it fails. This gives a more accurate picture of true margin, which is important for a new, complex system with many interfaces. This characterization can be used to address unknown problems and future tradeoffs. Consider incurring the additional cost of testing the actual flight hardware
to environmental acceptance test levels. Establish success criteria for every test and every flight. Consider testing the Constellation lunar lander systems for the range of environments consistent with different lunar landing locations. Assure that contractors employ rigor in root cause analysis of failures, and provide NASA oversight over the evaluation and implementation of corrective actions.
5. Actively manage performance margins so that the design margin can be allocated optimally. For example, provide a wealth of flight instrumentation and telemetry resources to assist ground controllers in timely identification of problems during mission operations and to support Constellation onboard maintenance.
6. To achieve lunar lander reliability under the Constellation program, provide a strong Lander advocate during the design of the CEV.

Evidence of Recurrence Control Effectiveness: JPL has referenced this lesson learned as additional rationale and guidance supporting Paragraph 5.4.5 (“Management Practices: Project Organization, Roles and Responsibilities, Internal Communications, and Decision-Making”), Paragraph 6.4.3 (“Engineering Practices: System Engineering”), Paragraph 6.8.3 (“Engineering Practices: Flight System Fault Tolerance/Redundancy”), Paragraph 6.13.1.2 (“Engineering Practices: Design and Verification for Environmental Compatibility”), Paragraph 6.13.9.1 (“Engineering Practices: Design and Verification for Environmental Compatibility - Environmental Qualification and Flight Acceptance Testing”), Paragraph 7.2 (“Reliability Engineering”), and Paragraph 7.6.8 (“Safety and Mission Assurance Practices: Problem Reporting”) in the Jet Propulsion Laboratory standard Flight Project Practices, Rev. 6, JPL DocID 58032, March 6, 2006.

In addition, JPL has referenced it in support of Paragraph 4.1.3.1 (“Flight System Design: Design Robustness - Single Failure Tolerance”), Paragraph 4.1.3.2 (“Flight System Design: Design Robustness - Protection Against Operator Errors”), Paragraph 4.1.4.1 (“Flight System Design: Design Margins During Development - Design Margin Sizing”), Paragraph 4.3.3.7 (“Power/Pyrotechnics Design: Power Generation - Primary (Non-Rechargeable) Batteries”), Paragraph 4.9.3.6 (“System Fault Protection Design: Flight-Ground Interface - Fault Reconstruction
Data”), Paragraph 4.12.1.1 (“Flight Electronics Hardware System Design: General - Design Partition”), Paragraph 8.3.7.1 (“System Assembly, Integration and Test: System Environmental Verification - Environmental Exposure Fault Isolation”), and Paragraph 9.5.1 (“Flight System Flight Operations Design: Operating Margins - Operating Margins for Real Time Operations”) in the JPL standard Design, Verification/Validation and Operations Principles for Flight Systems (Design Principles), JPL Document D-17868, Rev. 3, December 11, 2006.

Documents Related to Lesson:
• NASA-STD-8729.1, “Planning, Developing and
Managing an Effective Reliability & Maintainability Program”
• MIL-STD-1543, “Reliability Program Requirements for Space and Launch Vehicles”
• MIL-STD-2070, “Procedures for Performing a Failure Mode, Effects & Criticality Analysis for Aeronautical Equipment”
• MIL-HDBK-217, “Reliability Prediction of Electronic Equipment”

Mission Directorate(s):
• Exploration Systems

Additional Key Phrase(s):
• Program Management.
• Program Management.Contractor relationships
• Program Management.Cross Agency coordination
• Missions and Systems Requirements Definition.
• Missions and Systems Requirements Definition.Level 0/1 Requirements
• Missions and Systems Requirements Definition.Mission concepts and life-cycle planning
• Missions and Systems Requirements Definition.Vehicle concepts
• Engineering Design (Phase C/D).
• Engineering Design (Phase C/D).Crew Survival Systems
• Engineering Design (Phase C/D).Environmental Control and Life Support Systems
• Engineering Design (Phase C/D).Human Health/Flight Medicine
• Engineering Design (Phase C/D).Lander Systems
• Engineering Design (Phase C/D).Power
• Engineering Design (Phase C/D).