Designation: E2891-13

Standard Guide for Multivariate Data Analysis in Pharmaceutical Development and Manufacturing Applications¹

This standard is issued under the fixed designation E2891; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.

¹ This guide is under the jurisdiction of ASTM Committee E55 on Manufacture of Pharmaceutical Products and is the direct responsibility of Subcommittee E55.01 on PAT System Management, Implementation and Practice. Current edition approved Nov. 1, 2013. Published November 2013. DOI: 10.1520/E2891-13.

Copyright ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States

1. Scope

1.1 This guide covers the applications of multivariate data analysis (MVDA) to support pharmaceutical development and manufacturing activities. MVDA is one of the key enablers for process understanding and decision making in pharmaceutical development, and for the release of intermediate and final products.

1.2 The scope of this guide is to provide general guidelines on the application of MVDA in the pharmaceutical industry. While MVDA refers to typical empirical data analysis, the scope is limited to providing high-level guidance and is not intended to provide application-specific data analysis procedures. This guide provides considerations on the following aspects:

1.2.1 Use of a risk-based approach (understanding the objective requirements and assessing the fit-for-use status),
1.2.2 Considerations on the data collection and diagnostics used for MVDA (including data preprocessing and outliers),
1.2.3 Considerations on the different types of data analysis and model validation,
1.2.4 Qualified and competent personnel, and
1.2.5 Life-cycle management of MVDA.

1.3 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory limitations prior to use.

2. Referenced Documents

2.1 ASTM Standards:²
C1174 Practice for Prediction of the Long-Term Behavior of Materials, Including Waste Forms, Used in Engineered Barrier Systems (EBS) for Geological Disposal of High-Level Radioactive Waste
E178 Practice for Dealing With Outlying Observations
E1355 Guide for Evaluating the Predictive Capability of Deterministic Fire Models
E1655 Practices for Infrared Multivariate Quantitative Analysis
E1790 Practice for Near Infrared Qualitative Analysis
E2363 Terminology Relating to Process Analytical Technology in the Pharmaceutical Industry
E2474 Practice for Pharmaceutical Process Design Utilizing Process Analytical Technology
E2476 Guide for Risk Assessment and Risk Control as it Impacts the Design, Development, and Operation of PAT Processes for Pharmaceutical Manufacture
E2617 Practice for Validation of Empirically Derived Multivariate Calibrations

2.2 ICH Standards:³
ICH-Endorsed Guide for ICH Q8/Q9/Q10 Implementation ICH Quality Implementation Working Group Points to Consider (R2)
ICH Q2(R1) Validation of Analytical Procedures: Text and Methodology

² For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard's Document Summary page on the ASTM website.
³ Available from International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH), ICH Secretariat, c/o IFPMA, 15 ch. Louis-Dunant, P.O. Box 195, 1211 Geneva 20, Switzerland, http://www.ich.org.

3. Terminology

3.1 Definitions: Common term definitions can be found in Terminology E2363 for pharmaceutical applications, and some terms can be found in other standards and are cited when they are mentioned.

4. Significance and Use

4.1 A significant amount of data is being generated during pharmaceutical development and manufacturing activities. The interpretation of such data is becoming increasingly difficult. Individual examination of the univariate process variables is relevant but can be significantly complemented by multivariate data analysis (MVDA). Such methodology has been shown to be particularly efficient at handling large amounts of data from multiple sources, summarizing complex information into meaningful low-dimensional graphical representations, and identifying intricate correlations between multivariate datasets while taking into account variable interactions. The output from MVDA will generate useful information that can be used to enhance process understanding, decision making in process development, process monitoring and control (including product release), product life-cycle management, and continual improvement.

4.2 MVDA is a widely used tool in various industries including the pharmaceutical industry. To generate a valid outcome, MVDA should contain the following components:
4.2.1 A predefined objective based on a risk and scientific hypothesis specific to the application,
4.2.2 Relevant data,
4.2.3 Appropriate data analysis techniques, including considerations on validation,
4.2.4 Appropriately trained staff, and
4.2.5 Life-cycle management.

4.3 This guide can be used to support data analysis activities associated with pharmaceutical development and manufacturing, process performance and product quality monitoring in manufacturing, as well as for troubleshooting and investigation events. Technical details in data analysis can be found in scientific literature, and standard practices in data analysis are already available (such as Practices E1655 and E1790 for spectroscopic applications, Practice E2617 for model validation, and Practice E2474 for utilizing process analytical technology).

5. Concepts of MVDA Model and MVDA Method

5.1 When implementing MVDA, it is important to understand the distinction between a multivariate model and a multivariate method. This is especially true as an MVDA application reaches the validation stage.

5.2 MVDA Model:

5.2.1 As defined in Practice C1174, a model is a simplified representation of a system or phenomenon with multiple variables based on a set of hypotheses (assumptions, data, simplifications, or idealizations, or a combination thereof) that describe the system or explain the phenomenon, often expressed mathematically. In the context of this guide, the term MVDA model is to be taken in a broad sense covering, for example, multivariate regression as well as latent variable-based techniques such as, but not limited to, Principal Component Analysis (PCA) and Partial Least Squares (PLS) Regression. These models often relate observational data to a known property or set of properties from a process. The mathematical relationship is established for a sufficient number of cases, preferably derived from experimental designs. The model can then be applied to a similar set of observational data in order to predict the targeted property or properties.

5.2.2 MVDA is not limited to such multivariate calibrations and predictions, and considerations similar to the ones described in this guide are applicable to direct and indirect calibration, as well as to PCA-based approaches used, for example, for exploratory data analysis.

5.3 MVDA Method:

5.3.1 The MVDA method uses the output from the MVDA model to define the targeted and predefined process characteristic of interest. The MVDA model is one component of the broader concept that is an MVDA method. Such a method should typically be characterized by the collection of data, the input data to the calculation, the data analysis, and some potential transformation from the MVDA model output to generate the predefined MVDA method characteristic of interest. (See Fig. 1.)

FIG. 1 Relationship Between an MVDA Method and an MVDA Model

5.3.2 Note that an MVDA method can incorporate multiple MVDA models (for example, across multiple unit operations, from multiple pieces of equipment, etc.) that can be running in parallel or feeding sequentially into one another to provide the predefined MVDA method output. The validation of the MVDA model and the validation of the MVDA method are two different activities. Section 9 of this guide provides an overview of the MVDA model validation. The validation of an MVDA method should follow the same overarching principles as for any method validation, such as the ones described in ICH Q2(R1).

5.4 Two-Phase Nature of MVDA:

5.4.1 Data analysis usually, but not always, has two phases. In predictive analysis, the first phase is the creation of a model from acquired data with a corresponding known property, and the second phase is the application of the model to newly acquired data to predict a value of the property. The first-phase analysis is usually called a multivariate calibration for a regression process or training for a learning process. In practice, the emphasis is usually on the model building phase: how to design cases properly, how to process the data to build a model, and how to test the model to see whether the model is fit for use. The model prediction phase, however, should be emphasized equally. A valid model does not always generate a valid result; it will generate a valid result only if the input data is valid too. It is important to screen the data and monitor the prediction diagnostics when using the model for prediction. Such diagnostics are often referred to as residual and score space diagnostics or inner/outer model diagnostics.

5.4.2 In tracking and trending analysis, the first phase is to establish data analysis parameters, trending limits, or a criterion for the end point of trajectory tracking. A model may be created in the first-phase trending analysis. The second phase is predicting the new values based on the established parameter set (including a possible model) and assessing the trajectory based on the established criteria.

6. Risk-Based Approach for MVDA

6.1 A risk-based approach requires consideration of two aspects: the risk associated with the use of MVDA for a specific objective and the justifications and rationales during the data analysis to ensure the model is fit for use. Aspects of general risk assessment and control are described in Guide E2476, and more specific model considerations are discussed in ICH-Endorsed Guide for ICH Q8/Q9/Q10 Implementation.

6.2 The risk level is considered high when the data analysis is an integral part of the control strategy, is used directly for the product or intermediate product release, or is used to directly control the process. The risk is considered low when the output of the data analysis does not have significant impact on the product quality or the assessment of the product quality.
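
For high-risk uses such as those named in 6.2, the prediction-phase screening described in 5.4.1 is what keeps a valid model from returning an invalid result on invalid input. The following is a minimal sketch of that two-phase idea, not a procedure prescribed by this guide. It assumes a PCA model, synthetic data, and empirical 99th-percentile limits; the function names (`fit_pca`, `diagnostics`, `within_limits`) and the choice of Hotelling's T² and the Q residual as the score-space and residual-space diagnostics are illustrative assumptions.

```python
import numpy as np

def fit_pca(X, n_components):
    """Phase 1 (calibration): center X and keep the leading principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions of the centered data.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T                      # (n_variables, n_components)
    score_var = s[:n_components] ** 2 / (X.shape[0] - 1)
    return {"mean": mean, "loadings": loadings, "score_var": score_var}

def diagnostics(model, X):
    """Return (T2, Q) per row: distance inside the model plane and distance from it."""
    Xc = X - model["mean"]
    scores = Xc @ model["loadings"]
    t2 = np.sum(scores ** 2 / model["score_var"], axis=1)   # score-space diagnostic
    residual = Xc - scores @ model["loadings"].T
    q = np.sum(residual ** 2, axis=1)                        # residual-space diagnostic
    return t2, q

def within_limits(model, X, t2_limit, q_limit):
    """Phase 2 screening: True where an observation passes both diagnostics."""
    t2, q = diagnostics(model, X)
    return (t2 <= t2_limit) & (q <= q_limit)

# Synthetic calibration set (assumption): 6 variables driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X_cal = latent @ mixing + 0.05 * rng.normal(size=(200, 6))

model = fit_pca(X_cal, n_components=2)
t2_cal, q_cal = diagnostics(model, X_cal)
# Assumption: limits taken as empirical 99th percentiles of the calibration set.
t2_limit = np.percentile(t2_cal, 99)
q_limit = np.percentile(q_cal, 99)

# New observation that follows the calibration correlation structure.
x_typical = np.array([[0.5, -0.5]]) @ mixing
# Same observation shifted by a unit vector orthogonal to the model plane,
# so its scores are unchanged but its residual is large.
off_plane = rng.normal(size=6)
off_plane -= model["loadings"] @ (model["loadings"].T @ off_plane)
off_plane /= np.linalg.norm(off_plane)
x_atypical = x_typical + off_plane

typical_ok = bool(within_limits(model, x_typical, t2_limit, q_limit)[0])
atypical_ok = bool(within_limits(model, x_atypical, t2_limit, q_limit)[0])
print(typical_ok, atypical_ok)
```

In this sketch the shifted observation has the same scores as the typical one, so T² alone would not flag it; only the residual diagnostic does. That is one reason both kinds of diagnostics are monitored together during prediction.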
6.3 In assessment of fitness for use of data analysis, three aspects should be considered:

6.3.1 Criteria for Acceptable Data Analysis: Criteria for the data analysis are defined by user requirements and project objectives.
6.3.2 Data Source: Appropriate and relevant data should be collected and used in MVDA.
6.3.3 Data Analysis Practice (Technique and Procedure): In data analysis practice, numerous options are available and different options may generate similar results, all of which may be deemed fit for use. The data analysis process is an iterative approach; in case of an unsatisfactory result, a different data analysis technique may be used or it may be necessary to obtain additional data and/or data of higher quality.

7. Data Collection and Diagnostics

7.1 Relevant data properly representing all factors impacting the MVDA objective should be used for data analysis. Data gathered from various sources should be screened for errors, appropriate data preprocessing should be used, and data should be screened for outliers. All processing of data, exclusion of outliers, selection of samples or variables, or both, and other analysis parameters need to be justified and documented.

7.2 Data Source:

7.2.1 Depending on the MVDA-defined objective, the data could come from designed experiments (DOE) or from routine development and manufacturing processes, or both. Data originating from a DOE on input/process parameters has inherent variation (special cause variability), while data obtained from routine operations may reflect smaller variation within the acceptable operational ranges, tighter than ranges studied during process development (common cause variability). The data collected from a routine process may be used for trending, process monitoring, identification of atypical behavior but rarely for predictive analysis. A predictive model built from the data that has small variation will typically have a very small range limited by the combination of specification, constrained incoming material variation and routine process parameter variation (operating ranges). Model diagnostics should be used to ensure the model is predicting a meaningful result. Often, intentionally induced variations, preferably following a DOE, are created so that the data with a larger variation range is used as part of the training set to build the model.

7.2.2 Data can be continuous, discrete, or categorical and from multiple sources. The most common sources are input/raw material properties, process parameters, in situ/PAT data and intermediate/finished product properties. Data should be gathered with acceptable quality (free of any obvious human or machine errors but properly representing a typical noise level likely to be present in such data), with appropriate significant figures. Outlier detection is strongly recommended (see 7.4).

7.2.3 Data review is highly recommended and should be aligned with the risk level identified for the MVDA activity. Appropriate documentation of the data review activity should be available as part of the model development and model maintenance activities.

7.3 Data Preprocessing:

7.3.1 Data preprocessing (or pretreatment) is a critical step in the implementation of any MVDA application. The approach chosen in the preprocessing of the data may have a significant impact on the output of the multivariate analysis and should be considered carefully. The preprocessing of the data should aim at reshaping the data structure to enhance the key features targeted by the MVDA objectives. Appropriate data preprocessing depends on the nature of the data, the MVDA technique used, and the purpose of the data analysis. Multiple preprocessing steps, or chained preprocessing, can sometimes be applied to achieve the desired objective but should be considered carefully, particularly as the order chosen for the individual preprocessing steps is likely to have significant impact on the data analysis outcome. It may take several iterative cycles to optimize the preprocessing steps to ensure the necessary yet sufficient level of preprocessing is applied to the data set to enable the MVDA model objectives to be achieved.

7.3.2 Even though preprocessing can reduce or eliminate some unwant