Automating Transformations fromFloating Point to Fixed Point.ppt

资源描述

1、Automating Transformations from Floating Point to Fixed Point for Implementing Digital Signal Processing Algorithms,Prof. Brian L. Evans Embedded Signal Processing Laboratory Dept. of Electrical and Computer Engineering The University of Texas at Austin,July 4, 2006,Based on work by PhD student Kyun

2、gtae Han (now at Intel Research Labs),2,Outline,Introduction Background Optimize fixed-point wordlengths Reduce power consumption in arithmetic Automate transformations of systems Conclusion,3,Implementing Digital Signal Processing Algorithms,Introduction,Code Conversion,Wordlength Optimization,Floa

3、ting-Point Program,Fixed Point (Uniform Wordlength),Fixed Point (Optimized Wordlength),Floating- Point Processor,Fixed- Point Processor,Fixed- Point ASIC,$,$,$,Price,Power*,Hardware,Digital Signal Processing Algorithms,* Power consumption,H,L,H,L,ASIC: Application Specific Integrated Circuit,4,Trans

4、formations to Fixed Point,Advantages Lower hardware complexity Lower power consumption Faster speed in processing Disadvantages Introduces distortion due to quantization error Search for optimum wordlengths by trial & error is time-consuming Research goals Automate transformations to fixed point Con

5、trol distortion vs. complexity tradeoffs,Code Conversion,Wordlength Optimization,Floating-Point Program,Fixed Point (Optimized Wordlength),Transformation,Introduction,5,Outline,Introduction Background Optimize fixed-point wordlengths Reduce power consumption in arithmetic Automate transformations of

6、 systems Conclusion,6,Fixed-Point Data Format,Integer wordlength (IWL) Number of bits assigned to integer representation Includes sign bit Fractional wordlength (FWL) Number of bits assigned to fraction Wordlength: WL = IWL + FWL,SystemC format www.systemc.org, = 3.14159(10) Floating Point3.140625(1

7、0) = 011.001001(2) WL=9; IWL=3; FWL=6 3.141479492(10) = 011.00100100001110(2) WL=16; IWL=3; FWL=13,Background,7,Feasible region,Distortion vs. Complexity Tradeoffs,Different wordlengths have different application distortion and implementation complexity tradeoffs,Background,Minimize implementation c

8、ostMinimize application distortion,Implementation complexity c(w),Application distortion d(w),Optimal tradeoff curve,Vector of wordlengths:,8,Wordlength Optimization,Background,Multiple objective optimization,Single objective optimization,Proposed work fixes integer wordlengths and searches for frac

9、tional wordlengths,9,Genetic Algorithm,Evolutionary algorithm Inspired by Holland 1975 Mimic processes of plant and animal evolution Find optimum of a complex function,Greg Rohling, Ph.D Defense, Georgia Tech, 2004,Background,10,Pareto Optimality,Pareto optimality: “best that could be achieved witho

10、ut disadvantaging at least one group” Schick, 1970 Pareto optimal set is set of nondominated solutions E is dominated by C as all objectives for C are less than corresponding objectives for E Solutions A, B, C, D are nondominated (not dominated by any solution) Pareto front is boundary (tradeoff cur

11、ve) that connects Pareto optimal set solutions,Objective 2,Objective 1,Pareto Front,F,E,G,H,I,D,C,B,A,Background,11,Outline,Introduction Background Optimize fixed-point wordlengths Reduce power consumption in arithmetic Automate transformations of systems Conclusion,12,Search for Optimum Wordlength,

12、Exhaustive search impractical for many variables Gradient-based search (single objective) Utilizes gradient information to determine next candidates Complexity measure (CM) Sung & Kum, 1995 Distortion measure (DM) Han et al., 2001 Complexity-and-distortion measure (CDM) Han & Evans, 2004 Guided rand

13、om search Genetic algorithm for single objective Leban & Tasic, 2000 Multiple objective genetic algorithm Han, Olson & Evans, 2006,Optimize Fixed-Point Wordlengths,Next,Next,13,Complexity-and-Distortion Measure,Weighted combination of measuresSingle objective function Gradient-based search Initializ

14、ation Iterative greedy search based on complexity and distortion gradient information,Optimize Fixed-Point Wordlengths,14,Case Study I: Filter Design,Infinite impulse response (IIR) filter Complexity measure: Area model of field-programmable gate array (FPGA) Constantinides, Cheung & Luk 2003 Distor

15、tion measure: Root mean square (RMS) error Seven fixed-point variables (indicated by slashes),Optimize Fixed-Point Wordlengths,15,Case Study I: Gradient-Based Search,CDM could lead to lower complexity and lower number of simulations compared to DM and CM,* Maximum distortion measured by root mean sq

16、uare (RMS) error is 0.1 * 167 = 268,435,456 (8.5 years, if 1 second per 1 simulation),Optimize Fixed-Point Wordlengths,16,Case Study I: Genetic Algorithm,100th Generation,250th Generation,500th Generation,Search Pareto optimal set (nondominated) Handles multiple objectives: Error and Area,* Populati

17、on for one generation: 90,Pareto Front,LUT: Lookup table,9,000 simulations,22,500 simulations,45,000 simulations,Optimize Fixed-Point Wordlengths,17,Case Study I: Comparison,Gradient-based search (GS) results vs. GA results,GS methods can get stuck in a local minimumGS methods reduce running time (C

18、DM: 145 simulations),* Required RMSmax for gradient-based search are Dmax 0.12, 0.1, 0.08,500th Generation (45000 simulations),50th Generation (4500 simulations),Optimize Fixed-Point Wordlengths,18,Case Study II: Communication System,Simple binary phase shift keying (BPSK) system Complexity measure:

19、 Area model of field-programmable gate array (FPGA) Constantinides, Cheung, and Luk 2003 Distortion measure: Bit error rate (BER) Four fixed-point variables (indicated by slashes),Integration & Dump,Optimize Fixed-Point Wordlengths,Decision,AWGN,Source Data (1 or -1),Carrier,BER,19,Case Study II: Gr

20、adient-Based Search,CDM could lead to lower complexity and lower number of simulations compared to DM and CM,* Maximum distortion measured by bit error rate (BER) error is 0.1,Optimize Fixed-Point Wordlengths,20,Case Study II: Genetic Algorithm,Search Pareto optimal set Handles multiple objectives,5

21、0th Generation,100th Generation,200th Generation,* Population for one generation: 90,Pareto Front,LUT: Lookup table,4,500 simulations,9,000 simulations,18,000 simulations,Optimize Fixed-Point Wordlengths,Error (Bit Error Rate),Error (Bit Error Rate),Error (Bit Error Rate),For Comparison,Preliminary

22、results,21,Comparison of Proposed Methods,Optimize Fixed-Point Wordlengths,22,Outline,Introduction Background Optimize fixed-point wordlengths Reduce power consumption in arithmetic Automate transformations of systems Conclusion,23,Lower Power Consumption in DSP,Minimize power dissipation due to lim

23、ited battery power and cooling system Multipliers often a major source of dynamic power consumption in typical DSP applications Multi-precision multiplier select smaller multipliers (8, 16 or 24 bits) to reduce power consumption Wordlength reduction to select any word size Han, Evans & Swartzlander

24、2004 In general, what reductions in power are possible in software when hardware has fixed wordlengths?,Reduce Power Consumption in Arithmetic,Next,24,Wordlength Reduction in Multiplication,Input data wordlength reduction Smaller bits enough to represent, e.g. x 9 Truncation Signed right shift Move

25、toward the least significant bit (LSB) Signed bit extended for arithmetic right shift,Sign bit,Reduce Power Consumption in Arithmetic,25,Power consumption Switching power consumption Static power consumption Switching power consumption Switching activity parameter, Reduce by wordlength reduction,Rel

26、ationship between reduced wordlength and switching parameter in power consumption?,Power Reduction via Wordlength Reduction,Reduce Power Consumption in Arithmetic,26,Analytical Method,Wordlength (L) = 16,Reduction,No Reduction,Reduce Power Consumption in Arithmetic,27,Dynamic Power Consumption for W

27、allace Multiplier (1 MHz),Reduction (56%),16-bit x 16-bit multiplier (Simulated on Xilinx XC3S200-5FT256 FPGA),Truncation- First Truncation- Second,Truncate 1st arg Truncate 2nd arg (recode,nonrecode),Wallace multiplier used in TI 320C64 DSP,Reduce Power Consumption in Arithmetic,28,Dynamic Power Co

28、nsumption for Radix-4 Modified Booth Multiplier (1 MHz),Reduction (31%),Sensitive (13%),16-bit x 16-bit multiplier (Simulated on Xilinx XC3S200-5FT256 FPGA),Swapping could have benefit,Radix-4 modified Booth multiplier used in TI 320C62 DSP,Truncate 1st arg Truncate 2nd arg (recode,nonrecode),Reduce

29、 Power Consumption in Arithmetic,29,Comparison of Proposed Methods,Truncation to 8 bits reduces est. power consumption by 56% in Wallace and 31% in Booth 16-bit multipliers Signed right shift has no est. power reduction in Wallace multiplier (for any shift) and 25% reduction in Booth (for 8-bit shif

30、t) multiplier Operand swapping reduces power consumption for Booth but has negligible savings for Wallace multiplier Power consumption in tree-based multiplier Highly dependent on input data Simulation matches analysis,Reduce Power Consumption in Arithmetic,30,Outline,Introduction Background Optimiz

31、e fixed-point wordlengths Reduce power consumption in arithmetic Automate transformations of systems Conclusion,31,Automating Transformations from Floating Point to Fixed Point,Existing fixed-point tools Support fixed-point simulation Convert floating-point code to raw fixed-point code Manually find

32、 optimum wordlength by trial and error Automating transformations Fully automate conversion and wordlength optimization,Floating-Point Program,Wordlength-Optimized Fixed-Point Program,Code Conversion,Wordlength Optimization,Automatic Transformations of Systems,32,Automatic Transformation Flow,Code g

33、eneration Parse floating-point program Generate raw fixed-point program and auxiliary programs Range estimation Estimate range to avoid overflow (Analytical/Simulation) Determine integer wordlength (IWL) Wordlength optimization Optimize wordlength according to given input, and error specification (A

34、nalytical/Simulation) Determine fractional wordlength (FWL),Code Generation,Wordlength Optimization,Range Estimation,Automatic Transformations of Systems,33,Automating Transformation Environment for Wordlength Optimization,Top Program,Search Engine,Evaluation Program (Objectives),Fixed-Point Program

35、,Floating-Point Program,Error Estimation,Complexity Estimation,Range Estimation,Given floating-point program and options, auxiliary programs are automatically generatedGiven input data, optimum wordlength is searched,Input Data,Gradient-based or Genetic algorithm,Optimum Wordlength,Automatic Transfo

36、rmations of Systems,34,Demo of Released Software,Automatic Transformations of Systems,35,Conclusion,Search for optimum wordlength Gradient-based search reduces execution time while solutions could be trapped in local optimum Genetic algorithm can find distortion vs. complexity tradeoff curve, but it

37、 requires longer execution time Reduce power consumption by wordlength reduction of multiplicands Automate transformations from floating-point programs to fixed-point programs Freely distributable software release available at,Conclusion,http:/www.ece.utexas.edu/bevans/projects/wordlength/converter/

38、,36,Future Work,Advanced wordlength search algorithms Hybrid wordlength optimization Prune redundant wordlength variables (e.g. delay, adder) Adaptive step size for gradient-based search methods Further analysis on search algorithms Analysis of genetic algorithms with different settings Comparison w

39、ith simulated annealing Low power consumption System level including memory Powell and Chau, 1991 Wordlength reduction for floating-point multipliers,Conclusion,37,Future Work (continued),Electronic design automation software Enhanced code generator (e.g. rounding preferences) Hybrid analytical/simu

40、lation range estimation Optimum DSP algorithms Rearranging subsystems at block diagram Rearranging mathematical expressions in algorithm Developing more sophisticated hardware area models Avoids having to route each design through synthesis tools Transcendental functions,Conclusion,38,End,Thank you!

41、,39,Backup Slides,Backup Slides,40,Publications-I,Conference Papers K. Han, A. G. Olson, and B. L. Evans, Automatic floating-point to fixed-point transformations, Proc. IEEE Asilomar Conf. on Signals, Systems, and Computers, Nov. 2006, Pacific Grove, CA USA. invited paper. K. Han, B. L. Evans, and E

42、. E. Swartzlander, Jr., Low-Power Multipliers with Data Wordlength Reduction, Proc. IEEE Asilomar Conf. on Signals, Systems, and Computers, Oct. 30-Nov. 2, 2005, pp. 1615-1619, Pacific Grove, CA USA. K. Han, B. L. Evans, and E.E. Swartzlander, Jr., Data Wordlength Reduction for Low-Power Signal Proc

43、essing Software, Proc. IEEE Work. on Signal Processing Systems, Oct. 13-15, 2004, pp. 343-348, Austin, TX USA. K. Han and B. L. Evans, Wordlength Optimization with Complexity-And-Distortion Measure and Its Applications to Broadband Wireless Demodulator Design, Proc. IEEE Int. Conf. on Acoustics, Spe

44、ech, and Signal Proc., May 17-21, 2004, vol. 5, pp. 37-40, Montreal, Canada. K. Han, I. Eo, K. Kim, and H. Cho, Numerical Word-Length Optimization for CDMA Demodulator, Proc. IEEE Int. Sym. on Circuits and Systems, May, 2001, vol. 4, pp. 290-293, Sydney, Australia. K. Han, I. Eo, K. Kim, and H. Cho,

45、 Bit Constraint Parameter Decision Method for CDMA Digital Demodulator, Proc. CDMA Int. Conf. & Exhibition, Nov. 2000, vol. 2, pp. 583-586, Seoul, Korea. S. Nahm, K. Han, and W. Sung, A CORDIC-based Digital Quadrature Mixer: Comparison with ROM-based Architecture, Proc. IEEE Int. Sym. on Circuits an

46、d Systems, Jun. 1998, vol. 4, pp. 385-388, Monterey, CA USA.,Publications,41,Publications-II,Journal Articles K. Han and B. L. Evans, Optimum Wordlength Search Using A Complexity-And-Distortion Measure, EURASIP Journal on Applied Signal Processing, special issue on Design Methods for DSP Systems, vo

47、l. 2006, no. 5, pp. 103-116, 2006. Other Publications K. Han, E. Soo, H. Jugn, and K. Kim, Apparatus and Method for Short-Delay Multipath Searcher in Spread Spectrm Systems, U.S. Patent pending, Nov. 2001. K. Han, I. Lim, E. Soo, H. Seo, K. Kim, H. Jung, and H. Cho, Apparatus and Method for Separati

48、ng Carrier of Multicarrier Wireless Communication Receiver System, U.S. Patent pending, Sep. 2001. K. Han, Carrier Synchronization Scheme Using Input Signal Interpolation for Digital Receivers, Masters Thesis, Seoul National University, Seoul, Korea, Feb. 1998.,Publications,42,Research on Transforma

49、tion,Backup,43,Simulation Flow,Generate Optimized fixed-point program,Search wordlength set,Setup desired specification,Gradient-based search algorithm,Genetic search algorithm,Pick one of sets,Search wordlength sets,Generate Pareto Front,Backup,44,Algorithm Design and Implementation,Floating-Point Programs,Uniform Wordlength Fixed-Point Programs,

展开阅读全文