淡江大學覺生紀念圖書館 (TKU Library)
進階搜尋


  查詢圖書館館藏目錄
系統識別號 U0002-0808201117020800
中文論文名稱 基於雲端檔案系統之高效率檔案分塊與管理機制
英文論文名稱 An Efficient Cloud-based File Chunk and Management Scheme
校院名稱 淡江大學
系所名稱(中) 電機工程學系碩士在職專班
系所名稱(英) Department of Electrical Engineering
學年度 99
學期 2
出版年 100
研究生中文姓名 呂偉民
研究生英文姓名 Wei-Ming Lu
學號 797440301
學位類別 碩士
語文別 中文
口試日期 2011-07-14
論文頁數 66頁
口試委員 指導教授-吳庭育
委員-丁建文
委員-朱國志
委員-賴槿峰
委員-吳庭育
委員-李維聰
中文關鍵字 信息摘要演算法  索引名稱伺服器  分塊  壓縮 
英文關鍵字 MD5  INSs  Chunk  Compress 
學科別分類 學科別應用科學電機及電子
中文摘要 隨著網路頻寬跳躍式的成長與分散式檔案系統(Distributed File System)的應用成熟,雲端運算目前已成為網路的主流趨勢之一,肇因於網路使用者對於網路檔案儲存系統的需求日益增加,因此,雲端檔案系統(Cloud File System)架構已變成目前炙手可熱的話題。在雲端檔案系統中,網路使用者可以隨意從終端設備透過網路存取雲端檔案系統中的檔案。由於檔案儲存數量的增加也會讓上傳至雲端檔案系統伺服端時發生檔案重複的比率增加,導致網路為了傳輸這些重複的檔案佔用了網路頻寬;有鑑於這些檔案系統重複檔案上傳所產生的問題,因此,本論文提出適用於雲端檔案系統的檔案壓縮、分塊與比對機制,來改善檔案重複上傳的問題,減低網路頻寬使用率來達到網路的快速存取。
本論文是基於索引名稱伺服器(Index Name Servers, INSs)下的檔案壓縮、分塊及比對機制;這樣的機制包含壓縮(Compression)、分塊(Chunk)、檔案下載(Download)及修改上傳(Upload)的功能。本機制為了提升塊的穩定性,在完成塊切割後,經過切割的每個分塊都各自成為獨立個體,可在不修改其它塊的情況下各別更新。此外,檔案壓縮與分塊同步進行的方式,能有效的減少處理檔案所需之時間並提高執行效率。
經使用雲端檔案系統的分塊容量來壓縮及分塊處理,以其降低客戶端檔案上傳的重複率;處理過的檔案再經MD5指紋碼編碼後組成特徵碼,這些特徵碼可以提供索引名稱伺服器進行比對、建檔、分配儲存伺服器以及提供客戶端(Client)上傳所需的相關資訊。當使用者從檔案系統下載檔案進行編修後,再次上傳與儲存時,只需將被修改的塊重新進行壓縮、分塊再經由MD5編碼後重新再上傳,此舉,可減少不必要的重複資料上傳時間,也能減少了網路資源的浪費。
英文摘要 With the great advancement of the network bandwidth and the maturity of distributed file system, the cloud computing network has become one of the popular trends nowadays because users’ demands for network file storage systems keeps increasing. Therefore, the cloud file system is currently a hot topic. By terminal devices, network users can access the files stored in the cloud file system at will. However, the increase of the files will increase the duplication ratio while uploading files to the server of the cloud file system, which occupies the network bandwidth. To solve the problems due to duplicate files, this paper proposes the scheme of file compression, chunk and comparison for the cloud file system to remove duplicate files and decrease the bandwidth utilization to achieve fast access.

Our proposed scheme of file compression, chunk and comparison under Index Name Servers (INSs) has the capability of compression, chunk, download and upload. To enhance the stability of the chunks, each chunk is an individual after the incision and can be updated without modifying the other chunks. In addition, file compression and chunk are executed simultaneously to reduce the processing time and increase the execution efficiency.

In our proposed cloud file system, we compress and process the chunks in the optimal chunk size to reduce the duplicate files at the client side. The processed files are encoded into the signature by MD5 fingerprint for INSs to compare, file, designate to the storage server and provide necessary uploading information for the clients. After editing a file downloaded from the file system, uploading and saving the file again, all the user needs to do is to compress and divide the edited chunks, encode the chunks by MD5 fingerprint, and upload the signature again. In this way, not only unnecessary time for uploading duplicate files but also waste of network resources can be reduced.
論文目次 目錄
第一章 緒論 1
1.1 前言 1
1.2 研究動機與目的 2
1.3 章節提要 4
第二章 相關背景研究 5
2.1 雲端運算儲存現況 5
2.2 重複資料刪除技術(DE-DUPLICATION) 9
2.3 檔案分塊(FILE CHUNK) 12
2.3.1 固定長度分塊(Fixed-Sized Partitioning, FSP) 12
2.3.2 基本滑動視窗演算法(Basic Sliding Windows Algorithm, BSW) 13
2.3.3 小塊合併演算法(Small Chunk Merge Algorithm, SCM)14
2.3.4 消除大塊演算法(Breaking Large Chunks into Fixed Size Sub-Chunks Algorithm, BFS) 15
2.3.5 兩除數演算法(Two Divisors Algorithm, TD) 16
2.3.6 兩門檻兩除數演算法(Two Thresholds Two Divisors Algorithm, TTTD) 17
2.4 資料壓縮(DATA COMPRESSION) 18
2.4.1 行程編碼(Run-Length Coding, RLE) 19
2.4.2 字典編碼(Diction Coding) 19
2.4.2.1 第一類字典編碼 20
2.4.2.2 第二類字典編碼 24
2.4.3 霍夫曼編碼(Huffman Coding) 26
2.4.4 香農-範諾編碼(Shannon-Fano Coding) 31
2.5 資料塊數位指紋的計算(DIGITAL FINGERPRINT FALCULATED DATA CHUNK) 33
第三章 雲端儲存系統檔案分塊壓縮管理 36
3.1 索引名稱伺服器之檔案分塊架構 43
3.2 分塊(CHUNK) 45
3.3 壓縮(COMPRESSION) 47
3.4 分塊壓縮合併處理 50
3.5 客戶端分塊比較 50
第四章 模擬環境及模擬結果分析 53
4.1 使用者更新內文與所需更新分塊 53
4.2 分塊檔案壓縮率比較 60
4.3 壓縮分塊執行效率 63
第五章 結論與未來展望 64
參考文獻 65

圖目錄
圖1-1:修正更新流程 3
圖2-1:雲端運算架構示意圖 6
圖2-2:一對多分散式儲存示意圖 8
圖2-3:多對多分散式儲存示意圖 8
圖2-4:使用重複資料刪除提升儲存效能 10
圖2-5:固定長度分塊 12
圖2-6:滑動視窗演算法 14
圖2-7:小塊合併演算法 15
圖2-8:消除大塊演算法 16
圖2-9:兩除數演算法 17
圖2-10:兩門檻兩除數算法 18
圖2-11:LZ77運作流程 22
圖2-12:建立的第一個二元樹 30
圖2-13:建立完成的霍夫曼二元樹 30
圖2-14:建立完成的香農-範諾二元樹 32
圖2-15:MD5計算流程圖 34
圖2-16:MD5演算法區塊示意圖 35
圖3-1:全新檔案的上傳流程 38
圖3-2:檔案編修再儲存流程圖 40
圖3-3:原始檔與新檔案的產生 42
圖3-4:索引名稱伺服器整體架構 43
圖3-5:INSS運作流程圖 45
圖3-6:分塊的編號 45
圖3-7:客戶端的分塊比對方式 46
圖3-8:香農-範諾與霍夫曼壓縮比較 48
圖3-9:壓縮分塊後分塊內容 49
圖3-10:讀取內文檔案壓縮與分塊 50
圖3-11:分塊比對不一致 51
圖4-1:更新率10%與實際更新率 55
圖4-2:更新率20%與實際更新率 55
圖4-3:更新率30%與實際更新率 56
圖4-4:更新率40%與實際更新率 56
圖4-5:更新率50%與實際更新率 57
圖4-6:更新率60%與實際更新率 57
圖4-7:更新率70%與實際更新率 58
圖4-8:更新率80%與實際更新率 58
圖4-9:更新比率90%與實際更新率 59
圖4-10:文字檔未分塊原始檔案與分塊檔案壓縮率比較 61
圖4-11:MP3檔未分塊原始檔案與分塊檔案壓縮率比較 61
圖4-12:MPEG檔未分塊原始檔案與分塊檔案壓縮率比較 62
圖4-13:壓縮分塊執行效率 63

表目錄
表2-1:LZW電腦字典 26
表2-2:霍夫曼編碼表 27
表2-3:更改C編碼為101 28
表2-4:定長編碼 29
表2-5:香農-範諾編碼 32
表3-1:以字元A、B、C、D、E來定義重複次數的差異 48
表4-1:分塊大小之參數表 54
表4-2:使用者修改比率 54
表4-3:模擬檔案類型與大小 60



參考文獻 [1] Jianwei Yin, Yanming Ye, Bin Wu, Zuoning Chen, “Cloud computing oriented network operating systemand service platform,” Pervasive Computing and Communications Workshops (PERCOM Workshops), 2011, pp. 111-116.
[2] Shuai Zhang, Shufen Zhang, Xuebin Chen, Xiuzhen Huo, “Cloud Computing Research and Development Trend,“ Coll. of Sci., Hebei Polytech. Univ., Tangshan, China IEEE, Jan. 2010, pp. 22-24.
[3] Jiyi Wu, Lingdi Ping, Xiaoping Ge, Ya Wang, Jianqing Fu, ”Cloud Storage as the Infrastructure of Cloud Computing,” Intelligent Computing and Cognitive Informatics (ICICCI), 2010, pp. 380-383.
[4] Shufen Zhang, Shuai Zhang, Xuebin Chen, Shangzhuo Wu, ”Analysis and Research of Cloud Computing System Instance,” Future Networks,(ICFN), Oct. 2010, pp. 88-92.
[5] Ghosh, R., Longo, F., Naik, V.K., Trivedi, K.S, ”Quantifying Resiliency of IaaS Cloud,” Reliable Distributed Systems, 2010, pp. 343-347.
[6] Voith, T., Oberle, K., Stein, M., Oliveros, E., Gallizo, G., Kübert, R, ”A Path Supervision Framework A Key for Service Monitoring in Infrastructure as a Service (IaaS) Platforms,” Software Engineering and Advanced Applications (SEAA), 2010, pp. 127-130.
[7] Chengtong Lv, Qing Li, Zhou Lei, Junjie Peng, Wu Zhang, Tingting Wang, ”PaaS: A revolution for information technology platforms,” Educational and Network Technology (ICENT), 2010, pp. 346-349.
[8] Zeng Shu-Qing, Xu Jie-Bin, ”The Improvement of PaaS Platform,” Networking and Distributed Computing (ICNDC), 2010, pp. 156-159.
[9] Feng Liu, Weiping Guo, Zhi Qiang Zhao, Wu Chou,” SaaS Integration for Software Cloud” Cloud Computing (CLOUD), 2010, pp. 402-409.
[10] Jiehui Ju, Ya Wang, Jianqing Fu, Jiyi Wu, Zhijie Lin, “Research on Key Technology in SaaS,” Intelligent Computing and Cognitive Informatics (ICICCI), 2010, pp. 384-387.
[11] Guoling Liu, ”Research on independent SaaS platform,” Information Management and Engineering (ICIME), 2010, pp. 110-113.
[12] A. Muthitacharoen, B. Chen, and D. Mazieres, “A low-bandwidth network file system,” In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP'01), Canada, October 2001, pp 174-187.
[13] SUN Ji-zhong, MA Yong-qiang, LI Yu-hua, “Data Chunking Algorithm Based on Byte-fingerprint Extremum Characteristics,” School of Information Science and Technology, Southwest Jiaotong University, Chengdu, April 2010, pp. 36(8)
[14] Deepak R. Bobbarjung and Suresh Jagannathan, “Improving Duplicate Elimination in Storage Systems,” ACM Transactions on Storage - TOS, vol. 2, no. 4, 2006, pp. 424-448.
[15] Michael O. Rabin, “Fingerprinting by Random Polynomials,” Technical Report TR1581 Center for Research, 1981, pp. 15-18.
[16] Kave Eshghi, Hsiu Khuern Tang, ”A Framework for Analyzing and Improving Content-Based Chunking Algorithms,” Intelligent Enterprise Technologies Laboratory,HP Laboratories Palo Alto, Sep 2005.
[17] B.Liu, “Study of run-length coding algorithm,” JOURNAL OF TIANJIN INSTITUTE OF TECHNOLOGY, vo17. 4, 2001.
[18] Welch, T.A., ”A Technique for High-Performance DataCompression,” IEEE Computer, vo17. 6, 1984, pp. 8-19.
[19] Hu Yuanfu, Wu Xunsen, “The methods of improving the compression ratio of LZ77 family data compression algorithms,” Signal Processing, 3rd International, vol. 1996, pp.698-701.
[20] Chi-Hung Chi, ”Study on mult-lingual LZ77 and LZ78 text compression,” Data Compression Conference, Proceedings, Dept. of Information Systems and Computer Science National University of Singapore, 1998.
[21] Wei-ling Chang, Xiao-chun Yun, Bin-xing Fang, Shu-peng Wang, ”The Block LZSS Compression Algorithm,” Data Compression Conference, Mar 2009, pp. 439.
[22] Yazdanpanah, A., Hashemi, M.R., “A new compression ratio prediction algorithm for hardware implementations of LZW data compression,” Computer Architecture and Digital Systems (CADS), 15th CSI International Symposium, 2010, pp. 155-156.
[23] Huffman, D.A., ”A Method for the Construction of Minimum-Redundancy Codes,” Proceedings of the IRE, 1952, pp. 1098-1101.
[24] Connell, J.B., ”A Huffman-Shannon-Fano code” Proceedings of the IEEE, 1973, pp. 1046-1047.
[25] Jun Lu, Bin Du, Yi Zhu, DaiWei Li, ”MADFS: The Mobile Agent-Based Distributed Network File System,” Intelligent Systems,(GCIS), WRI Global Congress, 2009, pp. 68-74.
[26] Tobbicke, R., ”Distributed file systems: focus on Andrew File System/Distributed File Service (AFS/DFS),” Mass Storage Systems, ‘Towards Distributed Storage and Data Management Systems.’ First International Symposium. Proceedings., Thirteenth IEEE Symposium on, 1994, pp. 23-26.
[27] Lien, Yao-Nan, Xu, Hong-Qi, ”A UDP Based Protocol for Distributed P2P File Sharing,” Autonomous Decentralized Systems, ISADS Eighth International Symposium, 2007, pp. 318-324.
[28] Chuanyi Liu, Yingping Lu, Chunhui Shi, Guanlin Lu, Du, D.H.C., Dong-Sheng Wang, ”ADMAD:Application-Driven Metadata AwareDe-duplication Archival Storage System” Storage Network Architecture and Parallel I/Os, SNAPI Fifth IEEE International Workshop, 2008, pp. 29-35.
[29] Zhao Yong-Xia, Zhen Ge, ”MD5 Research,” Multimedia and Information Technology (MMIT), Second International Conference, 2010, pp. 271-273.
[30] 烘焙姬網路文化雜誌, 社大課程教材:電腦生活與網際網路介紹, 新竹市青草湖社區大學課程教材, july 2000.
論文使用權限
  • 同意紙本無償授權給館內讀者為學術之目的重製使用,於2016-08-10公開。
  • 同意授權瀏覽/列印電子全文服務,於2021-08-09起公開。


  • 若您有任何疑問,請與我們聯絡!
    圖書館: 請來電 (02)2621-5656 轉 2281 或 來信