§ Browse Thesis Bibliographic Record
  
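System ID U0002-2006200512063600
DOI 10.6846/TKU.2005.00420
Title (Chinese) 高效能及低功率快取記憶架構之設計
Title (English) Designs of High-Performance and Low-Power Cache Architectures
Title (third language)
Institution Tamkang University (淡江大學)
Department (Chinese) 電機工程學系博士班
Department (English) Department of Electrical and Computer Engineering
Foreign degree: university name
Foreign degree: college name
Foreign degree: graduate institute name
Academic year 93 (ROC calendar)
Semester 2
Year of publication 94 (ROC calendar)
Author (Chinese) 陳信全
Author (English) Hsin-Chuan Chen
Student ID 887350022
Degree Doctoral
Language English
Second language
Oral defense date 2005-06-03
Pages 85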
Oral defense committee Advisor - 江正雄 (chiang@ee.tku.edu.tw)
Member - 廖弘源 (liao@iis.sinica.edu.tw)
Member - 江正雄 (chiang@ee.tku.edu.tw)
Member - 施國琛 (tshih@cs.tku.edu.tw)
Member - 呂學坤 (sklu@ee.fju.edu.tw)
Member - 莊博任 (pjchuang@ee.tku.edu.tw)
Member - 郭大維 (ktw@csie.ntu.edu.tw)
Keywords (Chinese) 平均能量損耗
平均存取時間
循序式MRU快取記憶體
有效位元預先決定
組路預測快取記憶體
可調整組路快取記憶體
可規劃組路快取記憶體
Keywords (English) average energy dissipation
average access time
sequential MRU cache
valid-bit pre-decision
way-predicting cache
adjustable-way cache
configurable-way cache
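Keywords (third language)
Subject classification
Abstract (translated from Chinese)
In computer systems, cache memory offers a high hit rate and a low access time, so it has long been a key memory component for bridging the access-speed gap between the CPU and main memory and thereby reducing the system's average access time. In recent years, thanks to continuing advances in VLSI technology, many computer systems have evolved toward embedded systems, systems-on-chip, or multiprocessor systems, for which low power dissipation is an especially pressing requirement. Because the CPU accesses the cache very frequently, reducing the energy consumed during cache accesses noticeably improves the power dissipation of the whole computer system. How to design high-performance, low-power caches has therefore become an important issue in current computer architecture. Many studies have addressed high-performance or low-power caches. In this dissertation, we focus on the commonly used set-associative cache architecture and propose several high-performance, low-power cache designs for the sequential and parallel cache types, respectively. These include using valid-bit pre-decision together with an ordered record of MRU ways to reduce unnecessary accesses to the tag and data memories, which improves the average access time and energy consumption of the conventional sequential MRU cache. A similar method can be applied to the parallel way-predicting cache: combined with prediction of a single MRU way, it avoids activating unnecessary tag and data memory arrays. For highly associative caches, this further improves the performance and energy of the conventional way-predicting cache. In addition, we propose a cache whose associativity can be flexibly adjusted according to the locality of the executing program's behavior. This configurable-way cache not only saves energy but also maintains the same access time as a conventional set-associative cache; it can furthermore be applied in multiprocessor systems to reduce overall system power.
Abstract (English)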
Because cache memory offers a high hit rate and a low access time, it has long been an important memory component for bridging the speed gap between the processor and main memory in computer systems, thereby reducing the average access time of the entire system. In recent years, thanks to continuous progress in VLSI technology, many computer systems have moved toward integrated designs such as embedded systems, systems-on-chip, and multiprocessor systems, for which low power consumption is an essential requirement. Because processors access the cache so frequently, reducing the energy dissipated during cache accesses significantly improves the overall power consumption of the computer system. How to design high-performance, low-power caches is therefore an important issue in modern computer architecture.
Much prior research has been devoted to the design of high-performance and low-power caches. In this dissertation, we focus on the widely used set-associative cache architecture and propose several high-performance, low-power designs for the sequential and parallel cache types, respectively. First, valid-bit pre-decision combined with an MRU block list is used to eliminate unnecessary accesses to the tag memory and data memory, which improves the average access time and energy dissipation of the conventional sequential MRU cache. Based on the same idea, a new way-predicting cache using valid-bit pre-decision is proposed to further improve the energy dissipation and access time of the conventional way-predicting cache, especially for caches with high associativity. In addition, we propose a set-associative cache whose associativity can be configured according to program behavior; this configurable-way cache saves energy while keeping its performance at the same level as that of the conventional set-associative cache. Moreover, the proposed configurable-way cache can be used in multiprocessor systems to reduce overall power consumption.
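The idea of valid-bit pre-decision in a sequential MRU search can be illustrated with a small model. The sketch below is not taken from the dissertation: it assumes one valid bit per way (the thesis also involves sub-block placement, which is not modeled here), and the names `CacheSet` and `sequential_lookup` are hypothetical. It only counts how many tag probes a lookup needs when ways are examined in MRU order, with and without skipping ways whose valid bit is clear.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CacheSet:
    """One set of a set-associative cache (illustrative model only)."""
    tags: List[Optional[int]]  # stored tag per way, None if never filled
    valid: List[bool]          # one valid bit per way (an assumption)
    mru_order: List[int]       # way indices, most recently used first


def sequential_lookup(
    cache_set: CacheSet, tag: int, use_valid_predecision: bool
) -> Tuple[Optional[int], int]:
    """Search the ways one at a time in MRU order.

    Returns (hit way or None, number of tag-memory probes).
    With pre-decision enabled, a way whose valid bit is 0 is skipped
    without being counted as a tag probe.
    """
    probes = 0
    for way in cache_set.mru_order:
        if use_valid_predecision and not cache_set.valid[way]:
            continue  # cannot hit, so the tag read is skipped
        probes += 1
        if cache_set.valid[way] and cache_set.tags[way] == tag:
            # Move the hit way to the front of the MRU list, then stop.
            cache_set.mru_order.remove(way)
            cache_set.mru_order.insert(0, way)
            return way, probes
    return None, probes


def make_example_set() -> CacheSet:
    """A 4-way set in which only ways 0 and 2 hold valid blocks."""
    return CacheSet(
        tags=[0x1A, None, 0x2B, None],
        valid=[True, False, True, False],
        mru_order=[1, 3, 0, 2],
    )
```

In this example set, a lookup for tag `0x2B` takes four probes with the plain sequential search and two probes when invalid ways are skipped; a miss costs four and two probes, respectively. Real savings depend on how often sets contain invalid entries, which the model does not attempt to estimate.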
Abstract (third language)
Table of contents
TABLE OF CONTENTS

CHAPTER 1  INTRODUCTION
1.1  Overview of Cache Memory Systems
1.2  Trends in Cache Memories
1.3  Motivation
1.4  Organization of Dissertation

CHAPTER 2  INVESTIGATION ON CACHE ARCHITECTURES
2.1  Introduction
2.2  Basic Cache Architectures
2.2.1  Direct-mapped Cache
2.2.2  Fully Associative Cache
2.2.3  Set-Associative Cache
2.3  Cache Performance/Energy Model

CHAPTER 3  PREVIOUS RELATED WORKS
3.1  Overview of Previous Works
3.2  MRU Caches
3.2.1  Sequential MRU Cache
3.2.2  Parallel MRU Cache
3.3  Selective-Way Cache
3.4  Way-Predicting Cache
3.4.1  Architecture and Operations
3.4.2  Energy and Access Time Models
 
CHAPTER 4  PROPOSED HIGH-PERFORMANCE AND LOW-POWER CACHE ARCHITECTURES
4.1  Overview
4.2  Adjustable-Way Cache
4.2.1  Architecture
4.2.2  Design Strategy
4.2.3  Support of Special Operating System
4.2.4  Operations
4.2.5  Overheads
4.2.6  Energy and Performance Evaluations
4.3  Configurable-Way Cache
4.3.1  Architecture
4.3.2  Design Strategy
4.3.3  Operations
4.3.4  Energy and Performance Evaluations
4.4  Improved Sequential MRU Cache
4.4.1  Sub-block Placement
4.4.2  Architecture
4.4.3  Valid-Bit Pre-Decision Search Algorithm
4.4.4  Operations
4.4.5  Overheads
4.4.6  Energy and Performance Evaluations
4.5  Improved Way-Predicting Cache
4.5.1  Architecture
4.5.2  Operations
4.5.3  Overheads
4.5.4  Energy and Performance Evaluations
4.6  Applications

CHAPTER 5  SIMULATION AND ANALYSIS
5.1  Simulation Platform
5.2  Simulation Results for ADJW Cache
5.2.1  Circuit Overhead Analysis
5.2.2  Energy and Performance Analysis
5.2.3  Energy Improvement and Performance Degradation
5.3  Simulation Results for CNFG Cache
5.3.1  Average Access Time and Energy Dissipation
5.3.2  Energy Improvement and Performance Degradation
5.4  Simulation Results for SMRU-V Cache
5.4.1  First Hit Rate vs. Sub-block Size and Associativity
5.4.2  Average Access Time
5.4.3  Average Energy Dissipation
5.4.4  Improvement in Access Time and Energy
5.5  Simulation Results for WPD-V Cache
5.5.1  Various Rates for Way-predicting Cache
5.5.2  Average Energy Dissipation
5.5.3  Average Access Time
5.5.4  Energy-Delay Product
5.5.5  Improvement in Access Time and Energy

CHAPTER 6  CONCLUSIONS AND FUTURE WORKS
6.1  Conclusions
6.2  Future Works

LIST OF FIGURES

Figure 1.1  Memory hierarchy of computer systems
Figure 2.1  Mapping example for direct-mapped cache
Figure 2.2  Mapping example for fully associative cache
Figure 2.3  Mapping example for set-associative cache
Figure 2.4  Structure of a set-associative cache
Figure 2.5  Classic static RAM cell
Figure 3.1  Architecture of SMRU cache
Figure 3.2  Operation flow chart of SMRU cache
Figure 3.3  Architecture of PMRU cache
Figure 3.4  Operation flow chart of PMRU cache
Figure 3.5  Architecture of selective-way cache
Figure 3.6  Architecture of way-predicting cache
Figure 4.1  Architecture of ADJW cache
Figure 4.2  Input/output signals of control logic
Figure 4.3  Circuit of control logic for enabled outputs at 4-way
Figure 4.4  Block arrangements for different associativities
Figure 4.5  Architecture of CNFG cache
Figure 4.6  Sub-block placement of one block in a set-associative cache
Figure 4.7  Architecture of SMRU-V cache
Figure 4.8  Valid-bit pre-decision search algorithm
Figure 4.9  Search approaches for SMRU cache and SMRU-V cache
Figure 4.10  Search decision circuit of SMRU-V cache
Figure 4.11  Architecture of WPD-V cache
Figure 5.1  Improved rate in average energy dissipation for ADJW cache
Figure 5.2  Degradation rate in average access time for ADJW cache
Figure 5.3  Improved rate in average energy for different adjusted ways
Figure 5.4  Degradation rate in average access time for different adjusted ways
Figure 5.5  Improved rate in average energy for different block sizes and associativities
Figure 5.6  First hit rate vs. sub-block size (Associativity = 32)
Figure 5.7  Average access time vs. sub-block size (Associativity = 32)
Figure 5.8  Average energy vs. sub-block size (Associativity = 32)
Figure 5.9  Improved rate in access time for SMRU-V cache
Figure 5.10  Improved rate in energy dissipation for SMRU-V cache
Figure 5.11  Average energy dissipation of WPD cache
Figure 5.12  Average energy dissipation of WPD-V cache
Figure 5.13  Average access time of WPD cache
Figure 5.14  Average access time of WPD-V cache
Figure 5.15  ED product of WPD cache
Figure 5.16  ED product of WPD-V cache
Figure 5.17  Improved rate in energy dissipation for WPD-V cache
Figure 5.18  Improved rate in access time for WPD-V cache

LIST OF TABLES

Table 4.1  Function table of control logic for ADJW cache
Table 4.2  Function table of control logic for CNFG cache
Table 5.1  Access time and energy dissipation for ADJW cache and CSA cache
Table 5.2  Energy and performance impact of ADJW cache
Table 5.3  Access time and energy dissipation for CNFG cache and CSA cache
Table 5.4  The number distribution of different hits for various benchmarks
Table 5.5  Miss rate/prediction-hit rate for different benchmarks
Table 5.6  Two valid-bit presence rates for different benchmarks
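Full-text access permissions
On campus
Print copy available on campus immediately
Electronic full text authorized for on-campus access
Electronic copy available on campus immediately
Off campus
Authorization granted
Electronic copy available off campus immediately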
