BRITISH STANDARD BS 7237:1990
IEC 559:1989

Specification for
Binary floating point arithmetic for microprocessor systems

This British Standard, having been prepared under the direction of the Information Systems Technology Standards Policy Committee, was published under the authority of the
Board of BSI and comes into effect on 31 July 1990

© BSI 07-1999

The following BSI references relate to the work on this standard:
Committee reference IST/6
Draft for comment 89/65509 DC

ISBN 0 580 18114 6

Committees responsible for this British Standard

The preparation of this British Standard was entrusted by the Information Systems Technology Standards Policy Committee (IST/-) to Technical Committee IST/6, upon which the following bodies were represented:

Association for Payment Clearing Services
British Computer Society
British Telecommunications plc
Business Equipment and Information Technology Association
Department of Trade and Industry (Information Technology Division)
Department of Trade and Industry (National Physical Laboratory)
Electricity Supply Industry in England and Wales
Electronic Engineering Association
HM Treasury (Central Computer and Telecommunications Agency)
Information Technology Users Standards Association
Inter-Universities Computing Committee
Joint Network Team
LAMSAC
National Computing Centre Ltd.
National Health Service
OFTEL (Office of Telecommunications)
Post Office
Telecommunication Engineering and Manufacturing Association
Telecommunications Managers Association
User Standards Forum for Information Technology (Institute of Data Processing Management)

Amendments issued since publication

Amd. No.    Date    Comments

Contents

Committees responsible
National foreword
1 Scope
1.1 Implementation objectives
1.2 Inclusions
1.3 Exclusions
2 Definitions
3 Formats
3.1 Sets of values
3.2 Basic formats
3.3 Extended formats
3.4 Combinations of formats
4 Rounding
4.1 Round to nearest
4.2 Directed roundings
4.3 Rounding precision
5 Operations
5.1 Arithmetic
5.2 Square root
5.3 Floating-point conversions
5.4 Conversions between floating point and integer
5.5 Round floating-point number to integral value
5.6 Binary ↔ decimal conversion
5.7 Comparison
6 Infinity, NaNs and signed zero
6.1 Infinity arithmetic
6.2 Operations with NaNs
6.3 The sign bit
7 Exceptions
7.1 Invalid operations
7.2 Division by zero
7.3 Overflow
7.4 Underflow
7.5 Inexact
8 Traps
8.1 Trap handler
8.2 Precedence
Appendix A Recommended functions and predicates
Figure 1 Single format
Figure 2 Double format
Table 1 Summary of format parameters
Table 2 Decimal conversion ranges
Table 3 Correctly rounded decimal conversion range
Table 4 Predicates and relations

National foreword

This British Standard, prepared under the direction of the Information Systems Technology Standards Policy Committee, is identical with IEC
559:1989 “Binary floating-point arithmetic for microprocessor systems”, published by the International Electrotechnical Commission (IEC).

A British Standard does not purport to include all the necessary provisions of a contract. Users of British Standards are responsible for their correct application.

Compliance with a British Standard does not of itself confer immunity from legal obligations.

Summary of pages

This document comprises a front cover, an inside front cover, pages i and ii, pages 1 to 12 and a back cover.

This standard has been updated (see copyright date) and may have had amendments incorporated. This will be indicated in the amendment table on the inside front cover.

1 Scope

1.1 Implementation objectives

It is intended that an implementation of a floating-point system conforming to this standard can be realized entirely in software, entirely in hardware, or in any combination of software and hardware. It is the environment that the programmer or user of the system sees that conforms or fails to conform to this standard. Hardware components that require software support to conform shall not be said to conform apart from such software.

1.2 Inclusions

This standard specifies:
1) basic and extended floating-point number formats;
2) add, subtract, multiply, divide, square root, remainder and compare operations;
3) conversions between integer and floating-point numbers;
4) conversions between different floating-point formats;
5) conversions between basic format floating-point numbers and decimal strings, and
6) floating-point exceptions and their handling, including non-numbers (NaNs).

1.3 Exclusions

This standard does not specify:
1) formats of decimal strings and integers;
2) interpretation of the sign and significand fields of NaNs, or
3) binary ↔ decimal conversions to and from extended formats.

2 Definitions

biased exponent
the sum of the exponent and a constant (bias) chosen to make the biased exponent's range non-negative

binary floating-point number
a bit-string characterized by three components: a sign, a signed exponent,
and a significand. Its numerical value, if any, is the signed product of its significand and two raised to the power of its exponent. In this standard a bit-string is not always distinguished from a number it may represent

denormalized number
a nonzero floating-point number, the exponent of which has a reserved value, usually the format's minimum, and the explicit or implicit leading significand bit of which is zero

destination
the location for the result of a binary or unary operation. The destination may be either explicitly designated by the user or implicitly supplied by the system (e.g. intermediate results in sub-expressions or arguments for procedures). Some languages place the results of intermediate calculations in destinations beyond the user's control. Nonetheless, this standard defines the result of an operation in terms of that destination's format as well as the operands' values

exponent
the component of a binary floating-point number that normally signifies the integer power to which two is raised in determining the value of the represented number. Occasionally the exponent is called the signed or unbiased exponent

fraction
the field of the significand that lies to the right of its implied binary point

mode
a variable that a user may set, sense, save and restore, to control the execution of subsequent arithmetic operations. The default mode is the mode that a program can assume to be in effect unless an explicitly contrary statement is included either in the program or in its specification

The following modes shall be implemented:
1) rounding, to control the direction of rounding errors, and in certain implementations
2) rounding precision, to shorten the precision of results.
The implementor may, at his option, implement the following modes:
3) traps disabled/enabled, to handle exceptions.

NaN
not a number; a symbolic entity encoded in floating-point format. There are two types of NaNs (see 6.2). Signalling NaNs signal the invalid operation exception (see 7.1) whenever they appear as operands. Quiet NaNs propagate through almost
every arithmetic operation without signalling exceptions

result
the bit-string (usually representing a number) that is delivered to the destination

significand
the component of a binary floating-point number which consists of an explicit or implicit leading bit to the left of its implied binary point and a fraction field to the right

shall
the word “shall” signifies that which is obligatory in any conforming implementation

should
the word “should” signifies that which is strongly recommended as being in keeping with the intent of the standard, although architectural or other constraints beyond the scope of this standard may, on occasion, render the recommendations impractical

status flag
a variable that may take two states, set and clear. A user may clear a flag, copy it, or restore it to a previous state. When set, a status flag may contain additional system-dependent information, possibly inaccessible to some users. The operations of this standard may, as a side-effect, set some of the following flags: inexact result, underflow, overflow, divide by zero and invalid operation

user
any person, hardware, or program not itself specified by this standard, having access to and controlling those operations of the programming environment specified in this standard

3 Formats

This standard defines four floating-point formats in two groups, basic and extended, each having two widths, single and double. The standard levels of implementation are distinguished by the combinations of formats supported.

3.1 Sets of values

This sub-clause concerns only the numerical values representable within a format, not the encodings which are the subject of the following sub-clauses. The only values representable in a chosen format are those specified via the following three integer parameters:
p = number of significant bits (precision),
$E_{max}$ = maximum exponent, and
$E_{min}$ = minimum exponent.

Each format's parameters are displayed in Table 1. Within each format just the following entities shall be provided:

Numbers of the form $(-1)^s \, 2^E \, (b_0 . b_1 b_2 \ldots b_{p-1})$, where:
s is 0 or 1;
E is any integer between $E_{min}$ and $E_{max}$ inclusive, and
each $b_i$ is 0 or 1.

Two infinities, +∞ and −∞;
at least one signalling NaN, and
at least one quiet NaN.

Table 1 Summary of format parameters

Parameter                Single    Single Extended    Double    Double Extended
p                        24        ≥ 32               53        ≥ 64
E_max                    +127      ≥ +1 023           +1 023    ≥ +16 383
E_min                    −126      ≤ −1 022           −1 022    ≤ −16 382
Exponent bias            +127      Unspecified        +1 023    Unspecified
Exponent width (bits)    8         ≥ 11               11        ≥ 15
Format width (bits)      32        ≥ 43               64        ≥ 79

The foregoing description enumerates some values redundantly, for example: $2^0 (1.0) = 2^1 (0.1) = 2^2 (0.01) = \ldots$. However, the encodings of such nonzero values may be redundant only in extended formats (see 3.3). The nonzero values of the form $\pm 2^{E_{min}} (0 . b_1 b_2 \ldots b_{p-1})$ are called denormalized. Reserved exponents may be used to encode NaNs, ±∞, ±0, and denormalized numbers. For any variable that has the value zero, the sign bit s provides an extra bit of information. Although all formats have distinct representations for +0 and −0, the signs are significant in some circumstances, like division by zero, and not in others. In this standard 0 and ∞ are written without a sign when the sign does not matter.

3.2 Basic formats

Numbers in the single and double formats are composed of three fields: a 1-bit sign s, a biased exponent e = E + bias, and a fraction $f = . b_1 b_2 \ldots b_{p-1}$. The range of the unbiased exponent E shall include every integer between two values $E_{min}$ and $E_{max}$ inclusive, and also two other reserved values: $E_{min} - 1$ to encode ±0 and denormalized numbers, and $E_{max} + 1$ to encode ±∞ and NaNs. The foregoing parameters appear in Table 1. Each nonzero numerical value has just one encoding. The fields are interpreted as follows:

3.2.1 Single

A 32-bit single format number X is divided as shown in Figure 1. The value v of X is inferred from its constituent fields thus:
1) If e = 255 and f ≠ 0, then v is a NaN regardless of s
2) If e = 255 and f = 0, then $v = (-1)^s \infty$
3) If 0 < e < 255, then $v = (-1)^s \, 2^{e-127} (1.f)$
4) If e = 0 and f ≠ 0, then $v = (-1)^s \, 2^{-126} (0.f)$ (denormalized numbers)
5) If e = 0 and f = 0, then $v = (-1)^s \, 0$ (zero)

3.2.2 Double

A 64-bit double format number X is divided as shown in Figure 2. The value v of X is inferred from its constituent fields thus:
1) If e = 2 047 and f ≠ 0, then v is a NaN regardless of s
2) If e = 2 047 and f = 0, then $v = (-1)^s \infty$
3) If 0 < e < 2 047, then $v = (-1)^s \, 2^{e-1\,023} (1.f)$
4) If e = 0 and f ≠ 0, then $v = (-1)^s \, 2^{-1\,022} (0.f)$ (denormalized numbers)
5) If e = 0 and f = 0, then $v = (-1)^s \, 0$ (zero)

3.3 Extended formats

The single extended and double extended formats encode in an implementation-dependent way the sets of values in 3.1 subject to the constraints of Table
1. This standard allows an implementation to encode some values redundantly, provided that redundancy is transparent to the user in the following sense: an implementation shall either encode every nonzero value uniquely or not distinguish redundant encodings of nonzero values. An implementation may also reserve some bit strings for purposes beyond the scope of this standard; when such a reserved bit string occurs as an operand the result is not specified by this standard.

An implementation of this standard is not required to provide (and the user should not assume) that single extended formats have greater range than double extended formats.

3.4 Combinations of formats

All implementations conforming to this standard shall support the single format. Implementations should support the extended format corresponding to the widest basic format supported, and need not support any other extended format 1).

Figure 1 Single format

Figure 2 Double format

1) Only if upward compatibility and speed are important issues should a system supporting the double extended format also support the single extended format.

4 Rounding

Rounding takes a number regarded as infinitely precise and, if necessary, modifies it to fit in the destination's format while signalling the inexact exception (see 7.5). Except for binary ↔ decimal conversion (the weaker conditions of which are specified in 5.6), every operation specified in clause 5 shall be performed as if it first produced an
intermediate result correct to infinite precision and with unbounded range, and then rounded that result according to one of the modes in this clause.

The rounding modes affect all arithmetic operations except comparison and remainder. The rounding modes may affect the signs of zero sums (see 6.3), and do affect the threshold beyond which overflow (see 7.3) and underflow (see 7.4) may be signalled.

4.1 Round to nearest

An implementation of this standard shall provide round to nearest as the default rounding mode. In this mode, the representable value nearest to the infinitely precise result shall be delivered; if the two nearest representable values are equally near, the one with its least significant bit equal to zero shall be delivered. However, an infinitely precise result with magnitude at least $2^{E_{max}} (2 - 2^{-p})$ shall round to ∞ with no change in sign; here $E_{max}$ and p are determined by
the destination format (clause 3) unless overridden by a rounding precision mode (see 4.3).

4.2 Directed roundings

An implementation shall also provide three user-selectable directed rounding modes: round toward +∞, round toward −∞, and round toward 0.

When rounding toward +∞, the result shall be the format's value (possibly +∞) closest to and no less than the infinitely precise result. When rounding toward −∞, the result shall be the format's value (possibly −∞) closest to and no greater t