Asynchronous Design Using Commercial HDL Synthesis Tools

Michiel Ligthart, Karl Fant, Ross Smith, Alexander Taubin, Alex Kondratyev

Outline
- Added Value of NCL - Simplification of design
- Canonical form of gates - The key for optimization
- NCL in CAD flow. An example
- Validation of optimization
- Experimental results
- Conclusion and future work

Potential NCL Advantages

Data Communication Based on DI Encoding
[Figure labels: Register, Combinational circuitry, Register, Completion detection, Request for DATA/NULL, DATA, NULL]
- DI protocol with spacer (NULL)
- NULL propagation / NULL acknowledge
- Data propagation / Data acknowledge

NCL: Pushing Two-phase Behavior Down to the Level of Each Gate
- Gate output acknowledges input changes
- Simplest DI encoding - dual-rail [Sims58]

General Implementation of Hysteresis Gates in CMOS

Refined Implementation of NCL Hysteresis Gates in CMOS
- Reset of each individual gate scales up to the whole network

Family of Logic Gates
- The gate switches:
  - to data when M inputs are data
  - to NULL when all inputs are NULL
- It is possible to use “negative logic” by reversing the pull-up and pull-down networks

Example: 2-of-3 Threshold Gate with Hysteresis
z = ab + ac + bc + z(a + b + c)
[Figure: CMOS schematic with inputs a, b, c and output z]

RTL Design Flow: Combinational Optimization
- Separate combinational logic and registers
[Figure labels: Combinational process, Sequential process, Request for data/null, reset]

Two-Step Synthesis Flow (Using Synopsys Design Compiler)
- Step 1. Translate HDL into “synchronous” netlist
- Step 2. Convert intermediate netlist into NCL netlist
[Figure labels: VHDL, NCL library, Generic library, Synthesis (twice)]

Input to Step 1: RTL Description (Multiplexer Example)
RTL description (MUX):

    -- ncl_logic comes from the NCL package used by the flow; the slide does
    -- not name that package, so no use clause is shown here.
    entity test is
      port (
        a, b, s : in  ncl_logic;
        z       : out ncl_logic
      );
    end entity test;

    -- "rtl" is a placeholder architecture name; the slide gives none.
    architecture rtl of test is
    begin
      process (a, b, s) is
      begin
        if s = '1' then  -- assumes ncl_logic has a '1' literal, like std_logic
          z <= a;
        else
          z <= b;
        end if;
      end process;
    end architecture rtl;

[Figure: MUX with inputs a, b, s and output z]

MUX Example: Output of Step 1 / Input to Step 2: Intermediate Netlist
- Two-input NAND gates
[Figure: netlist with inputs a, s, b, internal nets x, y, and output z]

Dual-rail Package
Define type:

    type dual_rail_logic is record
      rail1 : std_logic;
      rail0 : std_logic;
    end record;

Optimizing with Design Compiler
- Dual-rail expansion
- Two phases (set and reset) are separated
- Set phase ensures circuit functionality
- Reset phase is implied
- Optimizations are applied to the set phase

Dual-rail Expansion of MUX
- Naive semi-static DIMS implementation: 114 transistors (can be reduced to 63 transistors by merging C-elements with OR-gates) versus 14 for a synchronous circuit
[Figure: three dual-rail (D-R) NAND gates; rails a.t, a.f, b.t, b.f, s.t, s.f, x.t, x.f, y.t, y.f, z.t, z.f]

“Images” - Boolean Gates Implementing Set Functions
- In the initial state: z = a = b = c = 0

Image of Dual-rail NAND Gate
[Figure labels: D-R NAND; C (four times); a.t, a.f, b.t, b.f; out.t, out.f]
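The relation between a hysteresis gate and its “image” can be checked with a small behavioral model. The sketch below is written in Python rather than HDL, and the function names (`threshold_gate`, `th23_equation`, `th23_image`) are illustrative assumptions, not part of the NCL library. It takes the 2-of-3 equation z = ab + ac + bc + z(a + b + c) from the slides and treats the image as the same equation without the hysteresis term.

```python
from itertools import product


def threshold_gate(m, inputs, z_prev):
    """Behavioral model of an M-of-N threshold gate with hysteresis.

    The output becomes 1 (data) when at least m inputs are 1, becomes 0 (NULL)
    when all inputs are 0, and otherwise holds its previous value z_prev.
    """
    ones = sum(inputs)
    if ones >= m:
        return 1
    if ones == 0:
        return 0
    return z_prev


def th23_equation(a, b, c, z):
    """2-of-3 gate as written on the slide: z = ab + ac + bc + z(a + b + c)."""
    return (a & b) | (a & c) | (b & c) | (z & (a | b | c))


def th23_image(a, b, c):
    """Assumed image of the 2-of-3 gate: the set function, no hysteresis term."""
    return (a & b) | (a & c) | (b & c)


# The behavioral model and the equation agree for every input/state combination.
for a, b, c, z in product((0, 1), repeat=4):
    assert threshold_gate(2, (a, b, c), z) == th23_equation(a, b, c, z)

# From the initial state z = 0, the image gives the same output as the gate.
for a, b, c in product((0, 1), repeat=3):
    assert threshold_gate(2, (a, b, c), 0) == th23_image(a, b, c)

# Hysteresis: once set, the gate holds until all inputs return to NULL.
z = threshold_gate(2, (1, 1, 0), 0)  # two inputs are data: output sets
z = threshold_gate(2, (1, 0, 0), z)  # one input withdrawn: output holds
held = z
z = threshold_gate(2, (0, 0, 0), z)  # all inputs NULL: output resets
print(held, z)  # prints: 1 0
```

The second loop is the reason optimization can work on the set phase alone: starting from the all-NULL state, the Boolean image and the hysteresis gate are indistinguishable.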
Image of Dual-rail NAND Gate
[Figure labels: a.t, a.f, b.t, b.f; out.t, out.f]

Dual-rail Expansion for MUX
- Twelve 2-input C-gates & three 3-input OR-gates
[Figure labels: a.t, a.f, b.t, b.f, s.t, s.f, x.t, x.f, y.t, y.f, z.t, z.f]

Image Circuit of Dual-rail Expansion for MUX
[Figure labels: a.t, a.f, b.t, b.f, s.t, s.f, x.t, x.f, y.t, y.f, z.t, z.f]
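The dual-rail expansion can be sketched as a set-phase model in which each C-gate is replaced by an AND gate, its image. This is a Python illustration under stated assumptions, not the authors' netlist: the rail order `(t, f)`, the names `dr_nand_image` and `dr_mux_image`, and the exact wiring of the three NAND gates (x = NAND(a, s), y = NAND(b, not s), z = NAND(x, y)) are chosen to match the RTL description of the MUX, since the figure itself is not recoverable.

```python
from itertools import product

NULL = (0, 0)                    # spacer: both rails low
ENCODE = {0: (0, 1), 1: (1, 0)}  # value -> (t rail, f rail)


def dr_not(a):
    """Dual-rail inversion costs no gates: swap the two rails."""
    return (a[1], a[0])


def dr_nand_image(a, b):
    """Image of a dual-rail NAND: four 2-input ANDs and one 3-input OR.

    Each AND stands in for one C-gate of the DIMS implementation.
    """
    a_t, a_f = a
    b_t, b_f = b
    out_f = a_t & b_t                                # NAND is 0 only for 1,1
    out_t = (a_f & b_f) | (a_f & b_t) | (a_t & b_f)  # the other three minterms
    return (out_t, out_f)


def dr_mux_image(a, b, s):
    """MUX built from three dual-rail NANDs (assumed wiring, see lead-in)."""
    x = dr_nand_image(a, s)
    y = dr_nand_image(b, dr_not(s))
    return dr_nand_image(x, y)


# Set phase: for complete DATA inputs the output encodes "a if s else b".
for a, b, s in product((0, 1), repeat=3):
    expected = a if s == 1 else b
    assert dr_mux_image(ENCODE[a], ENCODE[b], ENCODE[s]) == ENCODE[expected]

# All-NULL inputs give a NULL output, and one NULL input keeps a NAND at NULL.
assert dr_mux_image(NULL, NULL, NULL) == NULL
assert dr_nand_image(ENCODE[1], NULL) == NULL
```

Three NAND images at four ANDs and one OR each give the twelve 2-input and three 3-input gates counted on the slide.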
Optimized with Design Compiler
- The MUX circuit passes technology-independent optimization and is mapped to “images” of gates from the NCL library.

Technology Mapping with Design Compiler
- NCL circuit: images are replaced by gates with hysteresis
- 44 transistors, about 30% fewer than the optimized DIMS implementation (63 transistors)

Optimization Flow
[Figure: Boolean circuit -> (translation) -> Dual-rail image -> (optimization) -> Optimized circuit -> (tech. mapping) -> Mapped to images; additional label: Synchronous]

Validation of Optimization
The validity of transformations (DI equivalence) is based on two properties:
- Functional equivalence of optimized and original circuits (under two-phase operation)
- Maintenance of DI properties in the optimized circuit
Both are based on the properties of prime and irredundant networks and properties of algebraic factorization [Brayton90, Hachtel92].

Validation of Optimization: Idea of the Proof

Manual vs. Synthesized Designs
[Table: Area (transistor number)]
- For bigger circuits the Synthesis/Manual ratio is better (22% improvement for the biggest example)

Synchronous vs. NCL design
[Table columns: gates, transistors]
Penalty in transistors:
- Dual-rail implementation
- Effective delay-insensitivity
To reduce transistor count:
- Use four-rail encoding
- Improve architectural solutions, e.g., OR instead of MUX
- Compromise delay insensitivity

Conclusions
- First methodology to use standard HDL and commercial tools both to simulate and synthesize asynchronous circuits
- The methodology is formally validated
- The results of the synthesis are acceptable

Future Tasks
- Reduce area/power without losing delay insensitivity (e.g., four-rail design)
- Relax DI requirements to reduce area (e.g., using timing assumptions)
- Use peephole optimizations (e.g., merge gates used for registration with their input gates)
- Write DesignWare components to get better performance for arithmetic units (infer hand-designed components)

Structural View on Sequential NCL
[Figure labels: Completion detection (request signal), Inverter (acknowledgement signal)]

NCL Delay-sensitivity: Orphans
- Propagation of DATA/NULL through orphans is not acknowledged by the output
- Orphans are:
  - more local than the fundamental mode assumption (they concern particular paths)
  - safer than isochronic forks (wire delays are compared to the cycle time)

Orphans (continued)
[Figure: Full adder]
- Orphans do not cross the completion boundaries
- Orphans could be avoided by adding observability points
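
The completion detection that produces the request signal in the structural view can be sketched for a dual-rail word as follows. This is a minimal Python model under assumptions of my own: rails are stored as `(t, f)` pairs, the request is 1 once every pair carries DATA and 0 once every pair is NULL, and it holds its previous value in between. The names `classify` and `completion_detect` are hypothetical.

```python
def classify(word):
    """Return "NULL", "DATA", or None for a list of (t, f) rail pairs."""
    if all(t == 0 and f == 0 for t, f in word):
        return "NULL"
    if all(t + f == 1 for t, f in word):
        return "DATA"
    return None  # wavefront still arriving, or an illegal (1, 1) code


def completion_detect(word, request_prev):
    """Request signal with hysteresis over a whole dual-rail word."""
    state = classify(word)
    if state == "DATA":
        return 1
    if state == "NULL":
        return 0
    return request_prev  # incomplete wavefront: hold the previous request


# One two-phase cycle: NULL, DATA arriving bit by bit, then NULL returning.
sequence = [
    [(0, 0), (0, 0)],  # NULL spacer
    [(1, 0), (0, 0)],  # first bit has arrived, second is still NULL
    [(1, 0), (0, 1)],  # complete DATA word
    [(0, 0), (0, 1)],  # NULL wavefront arriving
    [(0, 0), (0, 0)],  # complete NULL
]

request = 0
trace = []
for word in sequence:
    request = completion_detect(word, request)
    trace.append(request)

print(trace)  # prints: [0, 0, 1, 1, 0]
```

The request changes only on complete wavefronts, which is what allows it to serve as the acknowledgement source after inversion.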