MAPLD 2005 #167
Automation Techniques for Fast Implementation of High Performance DSP Algorithms in FPGAs

NASA 2005 Military and Aerospace Programmable Logic Devices (MAPLD) International Conference
John Porcello, L-3 Communications, Inc.
Cleared by DOD/OFOISR for Public Release under 05-S-2094 on 24 August 2005

Outline
- Background
- Automation Techniques
- DSP Algorithm Design
- HDL Coding and Synthesis
- Timing & Placement
- Hardware-In-The-Loop (HITL) Test and Verification
- Case Study: Direct Digital Synthesizer (DDS) using Xilinx Virtex-4 XtremeDSP
- Summary

Background
- Field Programmable Gate Arrays (FPGAs) are the leading implementation path for Reprogrammable, High Performance Digital Signal Processing (DSP) Applications. The performance advantage of FPGAs over Programmable DSPs is a driving factor for implementing DSP designs in an FPGA.
- Using VHDL and Verilog Hardware Description Languages (HDL) is often a lengthy development path to implement a DSP design into an FPGA.
- FPGA development tools are using HDL and non-HDL DSP Intellectual Property (IP) to reduce the design and implementation time. This approach is successful at reducing the design and implementation cycle and increasing productivity in many applications.
- However, High Performance DSP implementations using dedicated HDL still provide the greatest flexibility for implementing High Performance DSP Algorithms. WHY?

Three (3) Reasons to use a dedicated HDL Implementation Path for a High Performance DSP Application
1) Control: Available IP cannot achieve required performance and functionality.
2) Complexity: Increasing DSP Algorithm Complexity requires unique tailoring for the application.
3) Components: FPGA architectures are increasing the number of dedicated components other than FPGA fabric (embedded multipliers, hard microprocessors, dedicated transceivers, application specific devices, etc.). Low level control is required to get the most out of these components in a high performance design.

Major Advantages and Disadvantages using the HDL Implementation Path for High Performance DSP Applications
(+) Low Level Control and flexibility to achieve required or specific performance
(+) Design, development and integration of various IP cores
(+) Source level control of DSP design
(-) Considerable design and implementation path relative to non-HDL implementation path
(-) Extensive Debug, Test and Verification Path
Can we reduce or eliminate any of these disadvantages to improve productivity?

YES
The Objectives of Automation Techniques - Identify and apply methods useful for faster implementation of High Performance DSP Designs.
- Reduce Design and Implementation Time
- Perform Error Checking
- Develop greater insight into successful high performance DSP Implementations by automating techniques
Specific focus areas to achieve objectives:
- DSP Algorithm Design
- HDL Coding and Synthesis
- Timing & Placement
- Hardware-In-The-Loop (HITL) Test and Verification
If one of these processes cannot meet required performance, it is often necessary to back up and apply techniques to collect data to study the problem.

Automation Techniques
- Not a new concept. No single direct formula for applying them.
- Automation Techniques are a function of DSP design and FPGA implementation processes.
- Automation Techniques are a means to improve and refine these processes.
- A look at the overall design through to implementation is required. Automation Techniques are then developed to improve processes.
Consider the following processes and goals:

Process                                        Goal
DSP Algorithm Design                           Produce a DSP Algorithm structured for an FPGA (function).
HDL Coding and Synthesis                       Synthesizable DSP functions and performance (implementation).
Timing & Placement                             DSP timing and interface performance (speed).
H/W-In-The-Loop (HITL) Test and Verification   DSP numerical and interface performance (accuracy, speed).

Automation Techniques can be applied to improve these processes.

Considerations for developing Automation Techniques
1) Technical: Automation Technique(s) are often required to go beyond the basics, and increase technical capabilities:
- A substantial amount of data will be generated, tested or analyzed to quantify performance. This includes the DSP design (truth vectors) and FPGA testing (DUT).
- Develop greater insight into DSP Design and FPGA Implementation.
- Solve a specific problem. Current processes not effective.
- Improve DSP Design and FPGA Implementation processes in terms of efficiency and productivity.
2) Cost:
- Developing Automation Techniques readily provides a cost benefit when processing large amounts of data.
- Other techniques may require substantial Non-Recurring Engineering (NRE) to design, develop and implement. In these cases, Automation Techniques must provide substantial benefit to justify the NRE.
- Substantial effort to develop Automation Techniques for High Performance DSP Algorithms can often be justified when there is significant near-term benefit (current project) or long-term benefit (marketing new DSP algorithms with increased functionality and/or improved performance).

DSP Algorithm Design
- The DSP Algorithm has the greatest impact on the implementation and performance.
- Best practice matches the DSP Algorithm to the FPGA Architecture. Knowledge of target hardware architecture is important to reduce a DSP Algorithm to equivalent high performance functions within an FPGA.
- The class of DSP Algorithm is significant (wide variation):
  Filter, FFT, Multiply and Accumulate (MAC), Up/Down Converters
  Carrier Recovery, Timing and Synchronization
  Direct Digital Synthesizers (DDS), Waveform Generators
  Systolic Arrays, Matrix Methods, Statistical DSP
  Beam Forming, Image Processing
  Wideband, High Speed Spectral Processing
- Full parallel (unrolled, unfolded) implementations of iterative DSP Algorithms yield a significant increase in performance at the expense of FPGA resources.

DSP Algorithm Design - Systolic Array Design using the Xilinx Virtex-4 XtremeDSP Tile
- Systolic Arrays are small, interconnected arrays of DSP Processing Elements (PEs). They are very useful for many high performance DSP applications such as Digital Filters and Matrix Processing.
- Systolic arrays are typically full parallel structures processing one data sample per clock. Used in many VLSI designs, they can be 1-Dimensional or Multidimensional.
- Systolic arrays can be mapped from DSP equations consisting of iterative algorithms that can be “unrolled” (Filters, FFTs, etc.).
- Latency is higher since data flow is through each element. However, structures of this type may be implemented using FPGA fabric and/or dedicated FPGA components over high speed interconnects.
[Figure: 1D Systolic Array (Input, Processing Element (PE), Output)]

DSP Algorithm Design - Systolic Array Design using the Xilinx Virtex-4 XtremeDSP Tile (cont.)
FPGA Embedded Component: Xilinx Virtex-4 XtremeDSP Tile consists of two (2) DSP48 slices:
- Dedicated, pipelined MULT, Add/Subtract, ACC, MACC, Shift, Divide, Square Root, etc.
- High speed, dedicated interconnects between DSP48 slices and to other XtremeDSP tiles
- Dynamically configurable functions (via OPMODE)
- Highest performance achieved without FPGA fabric
Ref. Xilinx XtremeDSP Design Considerations User Guide, Courtesy of Xilinx, Inc.
[Figure: Processing Element (PE) within a 1D Systolic Array (Input, Output)]
[Figure: Processing Element (PE)]

DSP Algorithm Design - Systolic Array Design using the Xilinx Virtex-4 XtremeDSP Tile (cont.)
1 Dimensional Systolic Array:
FIR filter with constant coefficients, relatively easy to manage design and implementation.
[Figure: 1D Systolic Array FIR Filter (Input, Processing Element (PE), Output), with routing over dedicated, high speed interconnect]

DSP Algorithm Design - Systolic Array Design using the Xilinx Virtex-4 XtremeDSP Tile (cont.)
2 Dimensional Systolic Array: Increasing capabilities in DSP applications at the expense of increasing algorithm complexity.
[Figure: 2D Systolic Array N-Point FFT (Input, Output), with routing over FPGA fabric. Reduce to Even and Odd PEs.]
Apply DSP Algorithm Automation Techniques to manage complex DSP design, debugging, test and validation.

DSP Algorithm Design Automation Techniques
- DSP Design Validation, Quantifying Required Algorithm Performance and Limitations: Automating tools and simulations to perform extensive end-to-end test, data reduction and analysis, and algorithm validation. Automated techniques are useful in DSP designs where the algorithm confidence level over a broad performance range requires a substantial baseline of test data. Techniques may utilize scripts or custom programs (MATLAB, C/C++, etc.) to verify algorithm numerical accuracy or maximum error, using simulated or actual test data. Methods used to validate a DSP algorithm are very important.
- Testing and Debugging DSP Modular Functions: Automating generation of “truth” data or vectors for test and analysis of synthesizable DSP functional building blocks.
- Algorithm Strength Reduction: Testing and evaluating alternate, equivalent DSP Algorithms and mathematically equivalent functions (symmetry, periodicity, transform reduction, etc.) to find functions that have higher performance and/or consume fewer FPGA resources.

HDL Coding and Synthesis
- HDL Coding style directly impacts FPGA Implementation. Good Coding techniques use HDL Coding Styles that support Scalable and Modular DSP designs (use of generics, VHDL generate, etc.). It is important to tailor HDL coding to get the most from the Synthesis Tool.
- Full Parallel implementations often require dividing up the DSP processing into small operations that can be performed during very short clock periods. This amounts to isolating functions or breaking up processing over several clock cycles at increased latency (and additional FPGA resources) to maintain throughput.
- Map as much DSP processing as possible onto the high-speed interconnects of dedicated DSP components, such as the XtremeDSP tile.

HDL Coding and Synthesis Automation Techniques
Autocoding Functions: Autocoding routines can be used to automatically implement (or change) HDL code:
- Custom DSP Functions that must be divided up across several clock cycles to operate at maximum speed
- Clocking Techniques, Positive and Negative Edge HDL implementations
- Built-In-Test (BIT) Vector Generators / Vector Receivers: support debug, test and verification up to the system level. Place multiple BIT blocks at full throughput. Useful for debugging, analysis and insight into successful High Performance DSP Designs. Can be combined with HITL testing for performance verification.
- HDL Converters: convert code (interpret code)
  from another language to Synthesizable HDL. Effective converter tools may be implemented for porting algorithms to FPGA platforms.

HDL Coding and Synthesis Automation Techniques
Synthesis Profiling: Batch processing multiple Synthesis runs to obtain insight into the synthesis of a design:
- Establish desired variations in an HDL design for analysis.
- Generate multiple versions or incrementally modify HDL parameters in the design via C/C++, script or equivalent code.
- Batch process the Synthesis Tool with synthesis constraints and obtain the synthesis report. For batch processing via script or command line, refer to the synthesis tool manual, such as the Xilinx Synthesis Technology (XST) User Guide for an XST design flow.
- Extract desired performance parameters from the Synthesis Report via C/C++, script or equivalent code.
- Repeat the process until sufficient information from multiple synthesis runs is collected.
- Analyze the results of the multiple Synthesis runs. Profile the performance impact of parameters on the synthesis of the design.
Useful for profiling the effect of DSP Design and HDL coding parameters on Synthesis, performing design tradeoffs, best-match analysis between DSP design and FPGA Implementation, and obtaining insight into successful High Performance DSP Designs.
Combine with Timing and Placement Profiling for analyzing the entire FPGA implementation flow. FPGA Implementation Tools are usually well suited for command line processing of the entire implementation flow (example: Xilinx XFLOW).

Timing and Placement
- Timing and Placement constraints direct the FPGA implementation tools and control the maximum speed and placement of the design. These constraints will directly impact many important performance criteria such as design margin, DSP throughput, pin placement, and data I/O.
- Effective methods exist, such as the use of Relationally Placed Macros (RPMs) to create instances of specific DSP functions and direct their placement within the FPGA.
- Timing Analysis reveals details of the speed of a given implementation and design margin against performance requirements. The Timing Analysis must be carefully interpreted to draw conclusions and identify where recoding and/or a change to synthesis, timing and placement constraints is necessary.
- Timing Analysis also reveals which functions within the DSP algorithm are the issue and may not be achievable given fixed resources (FPGA type) and performance requirements. This indicates that a fundamental change in the DSP function or HDL coding is required.
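
The design margin idea from the Timing and Placement notes can be illustrated with a small script. This is only a minimal sketch under stated assumptions: the function names (phase_accumulator, sine_lookup, output_scaler), the path delays, and the 400 MHz target are hypothetical illustration values, not figures from the presentation or from any tool report. It assumes the worst-case delay per DSP function has already been extracted from a timing report by some other means.

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    function: str    # label of a DSP function (hypothetical names)
    delay_ns: float  # worst-case path delay for that function, in ns


def required_period_ns(clock_mhz: float) -> float:
    """Clock period in nanoseconds for a clock frequency given in MHz."""
    return 1000.0 / clock_mhz


def timing_margin(paths, clock_mhz):
    """Return (function, slack_ns, margin_percent) tuples, worst slack first."""
    period = required_period_ns(clock_mhz)
    rows = []
    for p in paths:
        slack = period - p.delay_ns  # negative slack means the path is too slow
        rows.append((p.function, slack, 100.0 * slack / period))
    return sorted(rows, key=lambda row: row[1])


def failing_functions(paths, clock_mhz):
    """Names of functions whose worst-case delay exceeds the clock period."""
    return [name for name, slack, _ in timing_margin(paths, clock_mhz) if slack < 0]


# Hypothetical delays, assumed for illustration only.
example_paths = [
    PathTiming("phase_accumulator", 2.0),
    PathTiming("sine_lookup", 3.0),
    PathTiming("output_scaler", 1.5),
]

if __name__ == "__main__":
    for name, slack, margin in timing_margin(example_paths, clock_mhz=400.0):
        print(f"{name}: slack {slack:+.2f} ns, margin {margin:+.1f}%")
    print("failing:", failing_functions(example_paths, clock_mhz=400.0))
```

Sorting by slack puts the function that limits the clock rate at the top, which is the one that would need recoding, different constraints, or a change to the DSP function itself. The same table could be produced for each run of a batch flow to compare design variations.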