1、Algebraic Techniques To Enhance Common Sub-expression Extraction for Polynomial System Synthesis,Sivaram Gopalakrishnan Synopsys Inc., Hillsboro, OR 97124Priyank Kalla Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, UT- 84112,Outline,Problem context: Polynomial
2、 datapath synthesis Our Focus: Integrating CSE and Algebraic methods Applications: DSP for audio, video, multimedia. Motivation Previous Work and Limitations Integrated Approach Square-free factorization Common Coefficient Extraction Common Cube Extraction Algebraic Division Results: Area Optimizati
3、on Conclusions & Future Work,The Synthesis Flow,Polynomial representation?,Quadratic filter design for polynomial signal processingy = a0 . x12 + a1 . x1 + b0 . x02 + b1 . x0 + c . x0 . x1,Motivation,P1 = x2 + 6xy + 9y2 P2 = 4xy2 + 12y3 P3 = 2zx2 + 6xyz P1 = x(x+ 6y) + 9y2 P2 = 4xy2 + 12y3 P3 = x(2z
4、x + 6yz)P1 = x(x+ 6y) + 9y2 P2 = y2(4x+ 12y) P3 = xz(2x + 6y),Direct Implementation 17 Mults & 4 Adds,Horner form 15 Mults & 4 Adds,Factorization + CSE 12 Mults & 4 Adds,Motivation,d1 = x + 3y P1 = d12 P2 = 4d1y2 P3 = 2xzd1 d1 is a good building blockHow to identify such building blocks across multi
5、ple polynomial datapaths?Need an methodology to expose many common expressions!,Our Approach 8 Mults & 1 Add,Conventional Methods,Extracting control-dataflow graphs (CDFGs) from RTL Scheduling Resource sharing Retiming Control synthesis Algebraic Transforms for arithmetic designs Factorization Hosan
6、gadi et al, ICCAD 04 Common Sub-expression Elimination Hosangadi et al, VLSI 05 Term-rewriting Arvind et al, IEEE. Micro 98 Tree-Height Reduction De Micheli 94 Lack of symbolic computer algebra manipulation,Conventional Methods,Kernel/Co-kernel Extraction (Factorization + CSE) Integrates CSE with cu
7、be/coefficient extraction Uses coefficients and variables to identify cubes (co-kernels)to obtain kernels Subsequently uses CSE for further optimizationP = 5x2 + 10y3 + 15pq; Uses 5, 10, 15, x, y, p, q for kernel/co-kernel extraction Does not perform algebraic division Cannot determine decomposition
8、 5(x2 + 2y3 + 3pq)P = x2 + 2xy + y2; - (x+y)2 Cannot determine the above decomposition,Symbolic algebra techniques,Polynomial models for complex computational blocksGuiding Synthesis engines using Grbners basis Peymandoust and De Micheli, TCAD 02 Given polynomial F and Library elements F = h1 I1 + +
9、 hn In Restricted to library elementsDatapath optimization using word-length informationGopalakrishnan et al, ICCAD 07 Restricted to fixed-size datapaths Cannot address systems of polynomials,Optimization techniques,Canonical Form representation ckYk ck : Coefficient in the range (0 ck bk) Yk : Fall
10、ing factorial F = 3x2y2 - 3x2y - 3xy2 + 3xy = 3x(x-1)y(y-1)f1 = 5x3y2 - 5x3y - 15x2y2 + 15x2y + 10xy2 - 10xy + 3z2 f2 = 3x2y2 - 3x2y - 3xy2 + 3xy + z + 1d1 = x(x-1)y(y-1) f1 = 5d1(x-2) + 3z2 f2 = 3d1 + z + 1,Optimization techniques,Square-free factorizationLet F be an integral domain Z A polynomial
11、u in Fx is square-free if there is no polynomial v in Fx with deg(v, x) 0, such that v2 | u.u1 = x2 + 3x + 2; u1 = (x+1)(x+2) is square-free u2 = x4 + 7x3 + 18x2 + 20x + 8;u2 = (x+1)(x+2)2 is not square-free!,Optimization techniques,Common Coefficient Extraction P = 8x + 16y + 24z; P1 = 2(4x + 8y +
12、12z); P2 = 4(2x + 4y + 6z); P3 = 8(x + 2y + 3z); best transformationUse GCD computation Get the coefficients (ais) Compute GCD of every pair (ai, aj) Retain GCDs atleast (ai, aj) Arrange GCDs in decreasing order, perform extraction Update GCD list and continue,Optimization techniques,Common Coeffici
13、ent Extraction (Example) P = 8x + 16y + 24z + 15a + 30b; Coefficients 8, 16, 24, 15, 30 GCD list 8, 8, 1, 2, 8, 1, 2, 1, 6, 15 Reduced GCD list 8, 15 - decreasing order 15, 8Extracting 15 results in P = 8x + 16y + 24z + 15(a + 2b);Similarly, extracting 8 results in P = 8(x + 2y + 3z) + 15(a + 2b);,O
14、ptimization techniques,Common Cube Extraction Similar to kernel/co-kernel extraction (for variables) P1 = x2y + xyz; P2 = ab2c3 + b2c2x; P3 = axz + x2z2b; kernel/co-kernel extraction results in P1 = xy(x + z); P2 = b2c2(ac + x); P3 = xz(a + xzb);,Optimization techniques,Polynomial long divisionGiven
15、 two polynomials a(x) and b(x), algebraic division determines q(x) and r(x) such thata(x) = b(x) q(x) + r(x)a(x) = x4 - 2x3 + 5; b(x) = x2 + 3x - 2;a(x) = b(x) (x2 5x + 17) 61x + 39 q(x) r(x),Optimization techniques,Common Sub-Expression EliminationIdentify isomorphic patterns in an arithmetic expre
16、ssion tree and merge them!k = x + y; m = x + y + z; n = xy + x + y;k = x + y; m = k + z; n = xy + k;,Integrated approach,Input: The polynomial system Porig (list of arrays) Perform Canonization, Square-free factorization Get best initial cost: Cinitial Perform Coefficient extraction: Pcce Perform cu
17、be extraction: Pcce_cube, get linear blocks Get the lists representing the system For every linear block, for each list perform algebraic division Pick the best cost,Illustration,Integrated approach (Example),P1 = 13x2 + 26xy + 13y2 + 7x - 7y + 11; P2 = 15x2 - 30xy + 15y2 + 11x + 11y + 9; PorigSquar
18、e-free factorization does not work! Initial cost: 16 M and 10 AAfter common coefficient extraction (Pcce) P1 = 13(x2 + 2xy + y2) + 7(x y) + 11; P2 = 15(x2 - 2xy + y2) + 11(x + y) + 9; Linear blocks: (x y), (x + y),Integrated approach (Example),After common cube extraction (Pcce_cube) P1 = 13(x(x + 2
19、y) + y2) + 7(x y) + 11; P2 = 15(x(x- 2y) + y2) + 11(x + y) + 9; Linear blocks: (x y), (x + y), (x + 2y), (x 2y)Perform algebraic division using the linear blocksPcce is the best cost implementation with (x+y) (x-y)d1 = x + y; d2 = x - y; P1 = 13d12 + 7d2 + 11; P2 = 15d22 + 11d1 + 9; Cost: 6 M and 6
20、A,Results,Average area improvement: 42%,Results,Average area improvement: 42%,Conclusions & Future Work,Polynomial decomposition approach for arithmetic datapaths Arithmetic datapaths modeled as polynomial systems Integrating CSE with algebraic manipulation Performing algebraic decomposition to enhance the power of CSEImpressive area savings But delay penalty!Future Work: Address the concerns in delay!Retarget the approach towards power savings?,Questions?,