ANSI INCITS TR-11-1992
(formerly ANSI X3/TR-11-1992)

Information Processing Systems
Technical Report

Information Resource Dictionary System (IRDS)
Support for Naming Convention Verification (NCV)

X3's Technical Report Series

This Technical Report is one in a series produced by the American National Standards Committee, X3, Information Processing Systems. The Secretariat for X3 is held by the Computer and Business Equipment Manufacturers Association (CBEMA), 1250 Eye Street NW, Suite 200, Washington, DC 20005. As a by-product of the standards development process and the resources of knowledge devoted to
it, X3 from time to time produces Technical Reports. Such Technical Reports are not standards, nor are they intended to be used as such. X3 Technical Reports are produced in some cases to disseminate the technical and logical concepts reflected in standards already published or under development. In other cases, they derive from studies in areas where it is found premature to develop a standard due to a still-changing technology, or inappropriate to develop a rigorous standard due to the existence of a number of viable options, the choice of which depends on the user's particular requirements. These Technical Reports, thus, provide guidelines, the use of which can result in greater consistency and coherence of information processing systems.

When the draft Technical Report is completed, the Technical Committee approval process is the same as for a draft standard. Processing by X3 is also similar to that for a draft standard.

Published by
American National Standards Institute
11 West 42nd Street, New York, New York 10036

Copyright © 1992 by American National Standards Institute
All rights reserved.

No part of this publication may be reproduced in any form, in an electronic retrieval system or otherwise, without prior written permission of the publisher.

Printed in the United States of America

APSI C992/60

Foreword

At the time it approved this Technical Report, the Technical Committee X3H4 on Information Resource Dictionary System had the following members:

Anthony J. Winkler, Chair
Bruce Bargmeyer
John Bestwick
Roger Burkhart
Chi Chen
Twyla Courtot
Edd Cutway
Richard Desmond
Alan Goldfine
Beverly Hacker
Mark Jones
Douglas Mann
Dana Marks
Sandra Perez
Mike Reynolds
Cliff Sundberg
Manoo Urs
Mel Bing (Alt.)
Jim Fulton (Alt.)
Bob Hodges (Alt.)
Julia McCreary (Alt.)
Judith Newton (Alt.)
Burt Parker (Alt.)
Woody Pidcock (Alt.)
Jim Pipher (Alt.)
Mohan Prabandham (Alt.)
Gary Rokey (Alt.)
Anthony Sarris (Alt.)
John Sowa (Alt.)

Task Group X3H4.4, IRDS Administration, had the following members:

Judith Newton, NIST, Chair
Twyla Courtot, MITRE Corp., Vice-Chair

... file or record names were checked for adherence to certain format or syntax. Because of space limitations and simplistic name validation mechanisms, the need for sophisticated naming conventions was minimal. Systems were built by developers as stand-alone applications with little or no direct access by end users.

Technological advances have moved users closer to development. The 80-column card has given way to tapes and disks with more space for data representation. Hardware architectures and capabilities have improved performance, throughput, and storage capability. Distribution of applications and data
across multiple homogeneous or heterogeneous platforms and locations, coupled with these enhanced capabilities, has made familiar problems worse and created new ones in identifying and locating data. Now the same data structure is used in local and distributed systems. New constraints are placed on identifying redundant data and ensuring that semantic properties are preserved. Organizations have increased in size, and the amount of data being processed has increased astronomically. To support this automation and distribution there is now a requirement to understand and integrate systems within and across organizations. The need for standard names becomes apparent, and naming convention methodologies have been developed to assist in unambiguously identifying data.

Within organizations today there is increased communication and use of automated data. Organizations now manage information as a corporate resource just like the application systems that process the data. The need for data sharing and management has been institutionally recognized. Data sharing occurs both horizontally and vertically within an organization and extends outside the organization as well. The importance and
problem of communicating and understanding data with the semantics intact has received much attention. Data naming and name verification can assist significantly with the preservation of semantic data integrity. To ...

2. Sets, and membership in sets, of the name or any of its components (lexical, semantic, and syntactical); and,
3. Constraints on the size of the name or any of its components (lexical).

As indicated, these categories of naming characteristics concern lexical, semantic, and syntactical rules. Each of the categories may contain one or more specific rule types. Specific rule types associated with each category are provided below. Each rule type may have one or more converse rules. A rule and its converse are grouped together in the list below for clarity. Converse rules should be treated as independent rule types. The categories and rule types associated with them are not mutually exclusive. When a naming convention is developed, the various rules are specified in some combination that is not logically contradictory.

Category 1: Positioning

Type 1a: A word or symbol is required to be placed in a specific relative or absolute position, order, or sequence within the name, e.g., "a keyword or symbol must always appear in the 1st position of the name", "the object or noun of the phrase must always exist and must be positioned immediately preceding the end-of-phrase delimiter", or "the component must begin with an alphabetic character or national special character, e.g., '$', and be immediately followed by an alphabetic character".

Type 1b: A word or symbol is not required to be placed in a specific relative or absolute position, order, or sequence within the name.

Type 1c: A space or designated symbols may be required to separate or delimit specified components of a name, e.g., "an asterisk will precede the keyword, hyphens will separate words of a term, all other words and designated symbol sets will be separated by an underline".

Type 1d: No separators or delimiters are specified; the default is a space between all words or designated components.

Type 1e: The relationship of the name to one or more other entities in the metadata structure may require incorporation of some form of parent entity or configuration/version identification, e.g., record code included as a component of each of its contained field names.

Category 2: Sets

Type 2a: A name, word, or symbol is required to be a member of a specific, designated set(s), e.g., must match keyword list, or must be a specified connector symbol.

Type 2b: A name, word, or symbol is required not to be a member of a specific, designated set(s), e.g., "stop" list, "dirty word" list, or "unique name" list.

Type 2c: A name, word, or symbol is not required to be a member of a specific, designated set(s).

Type 2d: Set membership is controlled, e.g., a keyword set is established for selected name components and changes to the set must be approved by an individual specified by name or position.

Type 2e: Set membership is not controlled.

Type 2f: A component of a name may be restricted to words of a specific form or part of speech, e.g., no plural nouns, or no articles.

Type 2g: A component of a name may be any word of a language.

Type 2h: A component may be a member of one and only one component type set.

Type 2i: A component may be a member of more than one component type set.

Category 3: Size

Type 3a: The name, word, or symbol set contains a specified minimum, constant, or maximum number of symbols, e.g., "a name may be no more than 30 characters", or "a mnemonic name or code must always be 6 characters long".

Type 3b: The name, word, or symbol set contains no specified minimum, constant, or maximum number of symbols.

Type 3c: A shortened form of one or more word components of a name may be required to be substituted for the original word(s). These may be in the form of an abbreviation, contraction, truncation, or acronym. Rules or algorithms may be specified for the purpose of name and component shortening.

Type 3d: There is no requirement for shortening a name or word component.

A discussion of name verification and validation as related to these categories and rules is provided in Annex C.

4.0 NCVR Features

Features required to be provided and supported by the NCVM are listed below. These features were derived from the analysis done to identify naming convention paradigms, task group experience in naming, naming verification support currently available in name verification software, and interaction with users. User features and requirements are documented more fully in Annex F. For verification, these features have been mapped to the requirements listed in clause 1.3. Some features support more than one requirement, and some requirements are supported by more
than one feature. The requirements that relate to each feature are provided in the parentheses at the end of each numbered feature.

4.1 Required Features

1. Specific naming conventions and name verification rules shall be external to the NCVM. The NCVM shall support the definition and maintenance of naming convention rules and the verification of IRDS names according to the rules defined by an organization. The NCVM shall not be limited to a predefined set of naming convention or name verification rules. (Requirement 2).

2. The NCVM shall be able to detect and reject inconsistencies between the naming rules defined to it. (Requirement 14).

3. The NCVM shall verify correct names and identify nonstandard names, abbreviations, terms, etc. either for names existing in the IRD or for name(s) entered by a user. (Requirement 1).

4. The NCVM shall assist in the generation of allowable names from a given or proposed name based on a set of consistent rules. (Requirements 4, 8, 15). Standard names, including access and descriptive names, shall be suggested by the NCVM when a proposed standard name is determined to be incorrect according to the rules described to the NCVM for standard names. Alternate names shall be generated from standard names where the alternate name is based on a set of naming rules different from those used to verify the standard name. Alternate names can include programming names (e.g., COBOL names) as well as other more user-oriented names.

5. The NCVM shall analyze names based on content of components and relative and absolute format arrangements. It shall also analyze the semantic content of connectors. The NCVM shall associate word types within a given context. (Requirements 5, 6, 7, 8, 9). Since naming convention paradigms separate names into parts and assign meaning to those parts, the NCVM shall support the verification of names based on components and words arranged in a particular order for various contexts. This is the lowest level of semantic analysis necessary to verify relationships between name components. The NCVM shall identify components by both absolute and relative position (context) in a name.

6. The NCVM shall provide thesaurus capability to support name generation and semantic identification. (Requirements 12, 13, and 15).

7. The NCVM shall support synonym identification of name components. This shall include non-exact matches that incorporate identical terms presented in different order. (Requirements 12, 13). The NCVM shall identify and maintain synonyms and near-synonyms used in components and words in the given name structure.

8. The NCVM shall allow for different rules for different object types and life cycle phases. Thus, it shall support various methods of naming across different data types. (Requirement 4).

9. The NCVM shall support rule maintenance. (Requirement 3). The NCVM shall support adding, changing, and deleting of rules according to the security and permissions established by the user organization.

11. The NCVM shall provide the capability to enter names directly to the dictionary and check for duplicates from the dictionary. (Requirement 1).

12. The NCVM shall prohibit entry of duplicate names within an object type. (Requirement 1).

13. The NCVM shall provide for lexical, semantic, and syntactic checks. (Requirement 5).

14. The NCVM shall have the capability to automatically insert abbreviations or acronyms, or automatically insert fully spelled out words or terms from abbreviations or acronyms. (Requirement 11).

15. The NCVM shall have the capability to tailor display formats for
metadata relevant to the organization and support the application of the naming convention. (Requirement 10).

4.2 Possible Additional Capabilities

Capabilities that should be considered for the NCVM to support, but that it is not required to provide, are those that support additional user interface, browse, data thesaurus, security, and name analysis. The capabilities identified are given below, grouped by the capability area:

User Interface:
• Provide interactive and batch modes for NCVM functions
• Provide context sensitive help
• Provide user-installation default setting for name verification on or off

Browse:
• Provide a browse capability for all IRDS objects, including names, rules, thesaurus-terms, thesaurus-categories, word types

Thesaurus:
• Search, retrieve, and report names, rules, and word types using thesaurus terms and thesaurus categories
• Provide a thesaurus facility for all IRDS objects

Security:
• Invoke verification based on user/installation settings or defaults for access security
• Provide controlled access to names, rules, words, word data, and thesaurus

Name Generation:
• Provide the capability to generate names from a given definition expressed in some formalism, predefined language, or structured or unstructured text in batch or interactive mode

Name Analysis:
• Provide the capability to analyze a selected set of names for synonyms and report findings

5.0 Recommendations

This report does not present any mechanism for implementing the NCVM. Implementation issues need investigation and analysis, and will be discussed as the NCVM is further developed.

It is the consensus of Task Group X3H4.4, Naming Convention Verification, that the NCVM is an essential tool for maintaining the integrity and coherence of the IRD. A standard for it should be developed. The Task Group recommends that an effort be established to address design specifications for an NCVM standard.

Annex A
IRDS Names and Naming Rules

ANS X3.138-1988 defines three kinds of names for entities in an IRD: access names, descriptive names, and alternate names. Of the three, access names