
系統識別號 
U00021406200616041300 
中文論文名稱

基數八無進位式快速除法器架構之設計 
英文論文名稱

CarryFree Radix8 Division Architecture and Implementation of the Divider 
校院名稱 
淡江大學 
系所名稱(中) 
電機工程學系碩士在職專班 
系所名稱(英) 
Department of Electrical Engineering 
學年度 
94 
學期 
2 
出版年 
95 
研究生中文姓名 
桂志顥 
研究生英文姓名 
ChihHao Kuei 
學號 
789350153 
學位類別 
碩士 
語文別 
中文 
口試日期 
20060607 
論文頁數 
108頁 
口試委員 
指導教授余繁 指導教授江正雄 委員江正雄 委員余繁 委員李揚漢 委員鄭國興 委員呂學坤

中文關鍵字 
New SvobodaTung
基底八
倍率單元
除法器
帶號數字

英文關鍵字 
New SvobodaTung
Radix8
Prescale
Division
SignedDigit Number

學科別分類 
學科別＞應用科學＞電機及電子

中文摘要 
因為至今的高速運算以及多媒體應用盛行，在微處理器中的基本運算單元之設計就顯得相當重要。雖然除法器並不是最常被運用到的運算單元，但是它在算數運算單元中卻是扮演著相當重要的角色。位元遞迴的除法演算是一個簡易且常被運用的方法，為了加快運算的速度，我們可以增加基底數目來減少疊代運算的次數，若是採用基底β來運算，其中β=2m，則每次疊代運算可得到m位元的商數，所需要的疊代次數為n/m。我們提出了一個基底為八且相容於IEEE 7541985的浮點運算除法器，整個除法器的運算是根據New SvobodaTung（NST）演算法以及基底為八的多餘數字系統。
除法器使用了無進位延遲的加法單元來完成簡單的遞迴運算，在運算元部份，是透過倍率單元來完成。我們整理了一個有系統的方法來完成倍率單元的設計，也提出了除法架構的硬體設計，在運算時間的複雜度方面是一個常數，不會因為位元長度而改變。在我們所設計的除法器中，因為運用了無進位延遲加法器，所以可以快速的處理任何位元的除法。在模擬結果中可知道，硬體電路的複雜度以及運算速度上的效能都比傳統SRT除法器來的好。 
英文摘要 
Due to the progress of high speed computation and multimedia application, the hardware implementation of all basic arithmetic operations becomes important in the design of microprocessor. Although division is not the most frequent arithmetic operation, Implementation of division is always one key point in arithmetic units. A simple and widely implemented class of division algorithm is digit recurrence. To speedup the division process, one may reduce the number of iteration steps by increasing the radix β of the process. Selecting β=2m allows the generation of m quotient bits at each step and the number of steps can be reduced to n/m. We propose a radix8 floating point division that complies with the IEEE 7541985. This method is based on New SvobodaTung（NST） algorithm and the radix8 redundant number system.
The divider involves a simple recurrence with carryfree addition and employs prescaling of the operands. We summarize a general systematic method to accomplish the prescaling, and we also propose a hardware scheme such that the timing complexity is constant regardless of the bit length of the divisor. By taking the advantage of the carryfree addition, we can perform any bit length divider in a fast easy manner. The simulation results show that the hardware complexity and performance of this divider is competitive with conventional SRT dividers. 
論文目次 
中文摘要I
ABSTRACT II
目錄III
圖目VI
表目IX
第一章: 緒論
1.1 研究動機與目標1
1.2 論文架構3
第二章: 除法器架構、原理簡介
2.1 IEEE 754浮點標準5
2.2 浮點除法運算的組織架構9
2.3 帶號數字系統與帶號數字加法器簡介12
2.4 各式除法架構、原理簡介20
2.4.1 除法架構分類20
2.4.2 NewtonRaphson除法簡介20
2.4.3 Goldschmidt除法簡介25
2.4.4 Division by convergence除法簡介29
2.4.5 Restore除法簡介31
2.4.6 Nonrestore除法簡介35
2.4.7 SRT除法簡介38
2.4.8 SvobodaTung除法簡介47
第三章: 高基底除法
3.1 高基底除法簡介58
3.2 高基底SRT除法59
3.3 New SvobodaTung除法63
3.3.1 SvobodaTung除法簡介63
3.3.2 New SvobodaTung除法簡介67
3.3.3 New SvobodaTung除數範圍討論68
3.3.4 重置條件69
3.3.5 New SvobodaTung除法分析70
3.3.6 倍率單元（Prescale Unit）分析73
3.3.7 New SvobodaTung演算流程74
第四章: NST基底八除法架構
4.1 除法器架構76
4.2 倍率單元78
4.3 疊代單元89
第五章: 模擬與驗證結果
5.1 驗證流程94
5.2 模擬結果96
5.3 效能比較98
第六章: 結論
6.1 結論101
6.2 檢討與建議101
參考文獻103圖 目
圖2.1 IEEE 754 單精準度格式與倍精準度格式·········································6
圖2.2 +0.75（10）轉換成IEEE 754 單精準度格式········································8
圖2.3 0.75（10）轉換成IEEE 754 倍精準度格式·········································9
圖2.4 浮點除法運算組織架構··································································10
圖2.5 帶號數字加法器架構······································································13
圖2.6 數值置換架構··················································································16
圖2.7 牛頓羅富森·····················································································22
圖2.8 NewtonRaphson 除法架構······························································24
圖2.9 三階NewtonRaphson 除法架構····················································25
圖2.10 Goldschmidt 除法架構···································································27
圖2.11 Peter Soderquist 提出的除法架構··················································28
圖2.12 存回式除法（Restoring Division）··············································33
圖2.13 存回式除法架構············································································34
圖2.14 非存回式除法（Nonrestoring Division）·····································37
圖2.15 非存回式除法架構········································································37
圖2.16 加入商數0 的非存回式除法························································40
圖2.17 SRT 除法························································································41
圖2.18 SRT 基本架構·················································································42
圖2.19 Radix4 SRT 架構···········································································43
圖2.20 Radix4 SRT 的PD 圖··································································44
圖2.21 部份餘數的範圍表········································································44
圖2.22 部份餘數的重疊區域數值····························································44
圖2.23 Pentium 晶片的浮點運算錯誤散佈圖···········································46
圖2.24 Pentium 晶片的浮點運算錯誤曲面圖···········································46
圖2.25 Radix4 NST MROR 的基本架構··················································50
圖2.26 Radix4 Prescale 架構·····································································53
圖2.27 Radix4 MROR 疊代單元架構·······················································56
圖3.1 部份餘數範圍圖··············································································60
圖3.2 SRT 除法選商圖···············································································61
圖3.3 SRT 除法的PD 圖··········································································63
圖3.4 NST 除法理論樹狀圖······································································71
圖4.1 Radix8 MROR NST 除法組織架構················································77
圖4.2 倍率常數範圍相關圖······································································83
圖4.3 倍率選取單元的架構圖··································································84
圖4.4 疊代單元架構··················································································89
圖4.5 rj
1a與rj
2a重置邏輯電路········································································91
圖5.1 ModelSim 模擬結果1 ······································································95
圖5.2 ModelSim 模擬結果2 ······································································96
圖5.3 邏輯閘層次模擬結果1 ···································································97
圖5.4 邏輯閘層次模擬結果2 ···································································98
表 目
表2.1 IEEE 754 標準浮點格式····································································6
表2.2 數值置換真值表··············································································16
表2.3 最後和產生電路真值表··································································17
表2.4 最後和產生電路真值表··································································18
表2.5 存回式除法範例··············································································34
表2.6 非存回式除法架構··········································································38
表2.7 Svoboda 演算法················································································48
表2.8 Svoboda 除法範例············································································48
表2.9 Radix4 倍率常數表········································································52
表2.10 輸入數值與多工器輸出的真值表················································53
表2.11 Radix4 MROR 變負單元的真值表···············································54
表3.1 各種基底查表電路所需要的儲存容量··········································62
表3.2 NST 除法理論δ值列表···································································72
表4.1 Radix8 MROR 帶號數字二進制編碼表·········································77
表4.2 Radix8 MROR di 數值表································································81
表4.3 Radix8 MROR 倍率常數表····························································82
表4.4 倍率選取輸入與多工器的關係表··················································85
X
表4.5 變負單元真值表··············································································86
表4.6 rj
1a重置條件························································································90
表4.7 rj
2a重置條件························································································91
表4.8 補償單元選擇表··············································································92
表5.1 測試參數··························································································97
表5.2 單精確度的速度性能比較表··························································99
表5.3 倍精確度的速度性能比較表··························································99
表5.4 Gate Count 比較表··········································································100
表5.5 預估消耗功率比較表····································································100

參考文獻 
參考文獻
[1] S.F. Oberman and M.J. Flynn, “Division Algorithms and Implementations,” IEEE Trans. on Computers, vol. 46, no. 8, Aug. 1997, pp. 833854.
[2] S.F. Oberman and M.J. Flynn, “Minimizing the Complexity of SRT Tables,” IEEE Trans. on VLSI, vol. 6, no. 1, Mar. 1998, pp. 141149.
[3] J.E. Robertson, “A New Class of Digital Division Methods,” IRE Trans. on Electronic Computers, vol. 7, Sep. 1958, pp. 218222.
[4] T.D. Tocher, “Techniques of Multiplication and Division for Automatic Binary Computers,” Quarterly J. Mech. App. Math., vol. 2, pt. 3, 1958, pp. 364384.
[5] A. Svoboda, “An Algorithm for Division,” Information Processing Machines, vol. 9, Mar. 1963, pp. 183190.
[6] C. Tung, “A Division Algorithm for SignedDigit Arithmetic,” IEEE Trans. on Computers, vol. 17, Sep. 1968, pp. 887889.
[7] D.E. Atkins, “HigherRadix Division Using Estimates of The Divisor and Partial Remainders,” IEEE Trans. on Computers, vol. 17, no. 10, Oct. 1968, pp. 933937.
[8] M.D. Ercegovac, T. Lang, and P. Montuschi, “Very High Radix Division with Selection by Rounding and Prescaling,” Proc. 11th IEEE Symp. Computer Arithmetic, July 1993, pp. 112119.
[9] A. Avizienis, “Signed Digit Number Representation for Fast Parallel Arithmetic,” IRE Trans. on Electronic Computers, vol. 10, Sep. 1961, pp. 389400.
[10] S.F. Oberman and M.J. Flynn, “Design Issues in Division and Other FloatingPoint Operations,” IEEE Trans. on Computers, vol. 46, no. 2, Feb. 1997, pp.154161.
[11] L.A. Montalvo, K.K. Parhi, and A. Guyot, “New SvobodaTung Division,” IEEE Trans. on Computers, vol. 47, no. 9, Sep. 1998, pp. 10141020.
[12] H.R. Srinivas and K.K. Parhi, “A Fast Radix4 Division Algorithm and Its Architecture,” IEEE Trans. on Computers, vol. 44, no. 6, June 1995, pp. 826831.
[13] H.R. Srinivas, K.K. Parhi, and L.A, Montalvo, “Radix 2 Division with OverRedundant Quotient Selection,” IEEE Trans. on Computers, vol. 46, no. 1, Jan. 1997, pp. 8592.
[14] N. Burgess, “Prescaled MaximallyRedundant Radix4 SRT Divider,” Electronics Letters, vol. 30, no. 23, Nov. 1994, pp. 19261928.
[15] M.D. Ercegovac and T. Lang, “Simple Radix4 Division with Operands Scaling,” IEEE Trans. on Computers, vol. 39, no. 9, Sep. 1990, pp. 12041208.
[16] M.D. Ercegovac, T. Lang, and P. Montuschi, “VeryHigh Radix Division with Prescaling and Selection by Rounding,” IEEE Trans. on Computers, vol. 43, no. 8, Aug. 1994, pp. 909918.
[17] “IEEE Standard for Binary Floating Point Arithmetic,” ANSI/IEEE Standard 7541985, IEEE Computer Society, 1987.
[18] P. Montuschi and L. Ciminiera, “OverRedundant Digit Sets and the Design of DigitbyDigit Units,” IEEE Trans. on Computers, vol. 43, no. 3, Mar. 1994, pp. 269279.
[19] P. Montuschi and L. Ciminiera, “Design of a Radix4 Division Unit with Simple Selection Table,” IEEE Trans. on Computers, vol. 41, no. 12, Dec. 1992, pp. 16061611.
[20] S. Kuninobu, H.E.T. Nishiyama, T. Tanaguchi, and N. Takagi, “Design of High Speed MOS Multiplier and Divider Using Redundant Binary Representation,” Proc. 8th Symp. Computer Arithmetic, Como, Italy, 1987, pp. 8086.
[21] M.D. Ercegovax and T. Lang, Division and Square Root. Norwell, Mass.: Kluwer Academic, 1994.
[22] N. Burgess, “A Fast Division Algorithm for VLSI,” in Proc. IEEE Int’l Conf. Computer Design: VLSI in Computers and Processors, Boston, Oct. 1991, pp. 560563.
[23] J.S. Chiang and M.S. Tsai, “A Radix4 New SvobotaTung Divider with Constant Timing Complexity for Prescaling,” Kluwer Academic, vol. 33, no. 12, Jan. 2003, pp.117124.
[24] M.C. Mekhallalati and M.K. Ibrahim, “New High Radix MaximallyRedundant Signed Digit Adder,” IEEE International Symp. on Circuit and System, vol. 1, June 1999, pp. 459462.
[25] A. Nannarelli and T. Lang, “LowPower Division: Comparison among Implementation of Radix 4, 8 and 16,” 14th IEEE Symp. on Computer Arithmetic, 1999, pp. 60.
[26] A. Nannarelli and T. Lang, “LowPower Radix8 Divider,” Proc. of International Conference on Computer Design, Oct. 1998, pp. 420426.
[27] M.S. Tsai, “The Design and Implementation of a High Speed Radix4 Carry Free Division Architecture,” Master Thesis, Tamkang University, June, 2000.
[28] S.C. Tsai, “Design of a Fast Radix8 Divider with Operands Scaling,” Master Thesis, National Chung Hsing University, June, 1998.
[29] J.R. Huag, “High Performance Radix8 Divider,” Master Thesis, National Tsing Hua University, May, 1997.
[30] C.H. Liu, “A Comparative Study of Short WordLength FixedPoint, FloatingPoint, and LNS Arithmetic Units,” Master Thesis, Feng Chia University, June, 2005.
[31] M. Flynn, “On Division by Functional Iteration,” IEEE Transactions on Computers, Aug. 1970, pp. 702706.
[32] D.L. Flower and J.E. Smith, “An Accurate High Speed Implementation of Division by Reciprocal Approximation,” Proc. 9th Symp. on Computer Arithmetic, 1989, pp. 6067.
[33] Y.L. Chen, “Design of a Fast SignedDigit Divider for FloatingPoint Arithmetic,” Master Thesis, Feng Chia University, June, 2000.
[34] C.L. Wey and C.P. Wang, “Design of A Fast Radix4 SRT Divider and Its VLSI Implementation,” IEEE Proc. Comput. Digit. Tech., vol. 146, no. 4, July 1999, pp. 205210.
[35] T. Coe, T. Mathisen, C. Moler and V. Pratt, “Computational Aspects of the Pentium Affair,” IEEE Computational Science and Engineering, Mar. 1995, pp. 1831.
[36] T.H. Pan, H.S. Kay, Y. Chun and C.L. Wey, “HighRadix SRT Division with Speculation of Quotient Digits,” IEEE International Conference on Computer Design (ICCD'95)., Oct. 1995, pp. 479484.
[37] M.A. Thornton, “Signed Binary Addition Circuitry with Inherent Even Parity Outputs,” IEEE Trans. on Computers, vol. 46, no. 7, July 1997, pp. 811816.
[38] B. Parhami, “Generalized SignedDigit Number Systems: A Unifying Framework for Redundant Number Representations,” IEEE Trans. on Computers, vol. 39, no. 1, Jan. 1990, pp. 8998.
[39] T. Carter and J. Robertson, “Radix16 SignedDigit Division,” IEEE Trans. on Computers, vol. 39, no. 12, Dec. 1990, pp. 14241433.
[40] S.F. Anderson, J.G. Earle, R.E. Goldschmidt, and D.M. Powers, “The IBM System/360 Model 91:FloatingPoint Execution Unit,” IBM J. Research and Development, vol. 11, Jan. 1967, pp. 3452.
[41] R.E Goldschmidt, “Applications of Division by Convergence,” Master Thesis, Dept. of Electrical Eng., Massachusetts Inst. Of Technology, June 1964.
[42] P. Soderquist and M. Leeser, “Area and Performance Tradeoffs in FloatingPoint Divide and SquareRoot Implementation,” ACM Computing Surveys, vol. 28, no. 3, 1996, pp. 518564.
[43] T.J. Kwon, J.S. Moon, Jeff Sondeen and Jeff Draper, “A 0.18mm Implementation of a FloatingPoint Unit for a ProcessingInMemory System,” IEEE International Symp. on Circuit and System, vol. 2, May 2004, pp. 453456.
[44] X. Wang and B.E. Nelson, “Tradeoffs of Designing FloatingPoint Division and Square Root on Virtex FPGAs,” Proc. 11th Annual IEEE Symposium on FieldProgrammable Custom Computing Machines, vol. 2, Apr. 2003, pp. 195203.
[45] P. Montuschi and T. Lang, “Boosting VeryHigh Radix Division with Prescaling and Selection by Rounding,” IEEE Trans. on Computers, vol. 50, no. 1, Jan. 2001, pp. 1327.
[46] J.S. Moon, T.J. Kwon, J. Sondeen and J. Draper, “An AreaEfficient StandardCell FloatingPoint Unit Design for a ProcessingInMemory System,” Proc. of the Conference on European SolidState Circuits, Sep. 2003, pp. 5760.
[47] E. Rice and R. Hughey, “A New Iterative Structure for Hardware Division: The Parallel Paths Algorithm,” 16th IEEE Symposium on Computer Arithmetic (ARITH’03), June 2003, pp. 5462.
[48] A.A. Liddicoat and M.J. Flynn, “HighPerformance Floating Point Divide,” Euromicro Symposium on Digital Systems Design (DSD'01), Sept. 2001, pp. 354361.
[49] E. Antelo, T. Lang, P. Montuschi and A. Nannarelli, “DigitRecurrence Dividers with Reduced Logical Depth,” IEEE Transactions on Computers, vol. 54, no. 7, July 2005, pp. 837851. 
論文使用權限 
同意紙本無償授權給館內讀者為學術之目的重製使用，於20080621公開。不同意授權瀏覽/列印電子全文服務。 


