§ 瀏覽學位論文書目資料
  
系統識別號 U0002-1807201201535300
DOI 10.6846/TKU.2012.00749
論文名稱(中文) 基於MapReduce的影像處理系統加入DSRF優先權排程機制
論文名稱(英文) MapReduce-based Image Processing System with Priority-based DSRF Algorithm
第三語言論文名稱
校院名稱 淡江大學
系所名稱(中文) 電機工程學系碩士班
系所名稱(英文) Department of Electrical and Computer Engineering
外國學位學校名稱
外國學位學院名稱
外國學位研究所名稱
學年度 100
學期 2
出版年 101
研究生(中文) 郭玲裳
研究生(英文) Ling Shang Kuo
學號 699470042
學位類別 碩士
語言別 繁體中文
第二語言別
口試日期 2012-06-14
論文頁數 63頁
口試委員 指導教授 - 李維聰(wtlee@mail.tku.edu.tw)
委員 - 朱國志(kcchu@mail.lhu.edu.tw)
委員 - 衛信文(141131@mail.tku.edu.tw)
委員 - 劉豐豪(lfh123@gmail.com)
委員 - 吳庭育(tyw@mail.tku.edu.tw)
關鍵字(中) 3D影像
雲端系統
排程機制
關鍵字(英) 3D image
Cloud System
Schedule Algorithm
第三語言關鍵字
學科別分類
中文摘要
MapReduce是由Google所提出用於處理大量密集式資料的分散式平行運算架構。使用MapReduce的使用者只要撰寫Map以及Reduce這兩個Function,並輸入待處理的資料,即可以自動的完成需要大量運算資源的工作,如網頁搜尋、數值統計等。但是較少使用者將2D to 3D這種高複雜度且運算時間較長的影像處理應用放置於MapReduce處理,然而這種高複雜度以及高運算量的應用若是透過使用MapReduce來進行運算則可以用來增加其處理效能。
此外,在較早的文獻中,有許多的研究以及應用都被拿來建置在MapReduce上,但是這些研究以及應用所存在的環境需要擁有完整的運算資訊才能被正確地運行。而本論文的運作環境為一即時性的影像處理環境,所以如果沒有一個較佳的排程機制負責處理影像即時地被儲存在系統裡,將無法有效地將即時性的應用程序移植到MapReduce系統上。
本論文實作一個運作在MapReduce上的多使用者(Multi-user)的2D to 3D系統,並讓Map負責處理高複雜度以及高運算量的影像處理程序,而Reduce負責收集Map處理後的Intermediate Data並輸出。當多個使用者同時競爭MapReduce的資料來處理2D to 3D的應用時,就會造成等待Map處理的資料被正在處理的使用者所延誤,而Reduce必需等待所有Map處理完成後才能產生輸出結果的情形,。因此,本論文提出了一個新的排程機制,讓MapReduce能夠自動依照任務的處理進度自動切換下一個任務執行,並減少Reduce的閒置時間,我們稱之為Dynamic Switch of Reduce Function (DSRF) Algorithm。
英文摘要
MapReduce, a programming model proposed by Google, is designed for distributed parallel computing to process vast amounts of data. MapReduce users write the Map and Reduce functions, input the data to be processed and the task will be finished automatically. Hadoop, a distributed file system designed for implementing Google MapReduce, is adopted by many enterprises for daily data-intensive applications. Most users process short tasks using MapReduce; in other words, most tasks handled by the Map and Reduce functions require low response time. Currently, quite few users use MapReduce for 2D to 3D image processing, which is highly complicated and requires long execution time. However, in our opinion, MapReduce is exactly suitable for processing applications of high complexity and high computation.
The other researches use MapReduce to build their applications. In the above researches, they will store the complete data into their file system. In our paper, our system is a real-time image processing system and the file system will get the real-time image continually. By the way, the system doesn’t have a schedule algorithm to solve the real-time application problem. 
This paper implements MapReduce on an integrated 2D to 3D multi-user system, in which Map is responsible for image processing procedures of high complexity and high computation, and Reduce is responsible for integrating the intermediate data processed by Map for the final output. Different from short tasks, when several users compete simultaneously to acquire data from MapReduce for 2D to 3D applications, data that waits to be processed by Map will be delayed by the current user and Reduce has to wait until the completion of all Map tasks to generate the final result. Therefore, a novel scheduling scheme, Dynamic Switch of Reduce Function (DSRF) Algorithm.
第三語言摘要
論文目次
目錄
第一章	緒論	1
1.1	前言	1
1.2	動機與目的	1
1.3	論文章節架構	6
第二章	背景知識與相關研究	8
2.1	Google File System分散式檔案系統	8
2.1.1	BigTable	11
2.2	MapReduce	11
2.3	Hadoop	15
2.3.1	Hadoop MapReduce	17
2.3.1.1	JobNode	18
2.3.1.2	TaskNode	19
2.3.2	HDFS	20
2.3.2.1	NameNode	22
2.3.2.2	DataNode	22
2.3.3	HBase	23
2.3.4	ZooKeeper	25
2.4	影像處理程序	26
2.4.1	Harris Corner Detector	26
2.4.1.1	Moravec Corner Detection Algorithm	27
2.4.1.2	Harris Corner Detection Algorithm	28
2.4.2	Sum of Absolute Difference(SAD)	28
2.4.3	Model Reconstruction	29
2.5	相關研究	29
第三章  新的排程演算法DSRF Algorithm	33
3.1	DSRF(Dynamic Switch of Reduce Function) Algorithm	33
3.1.1	MapReduce System with 2D to 3D Application	33
3.1.2	DSRF(Dynamic Switch of Reduce Function) Algorithm Overview	36
3.2	DSRF Algorithm with MapReduce	39
3.3	Algorithm Working principle	41
第四章  模擬結果與討論	44
4.1	Implementation	44
4.1.1	Implementation Environment	44
4.1.2	Performance Analysis and Evaluation	46
第五章  結論與未來展望	60
參考文獻	62

圖目錄
圖2.1  MapReduce架構圖	14
圖2.2  Hadoop MapReduce 架構	18
圖2.3  JobNode & DataNode 關係圖	20
圖2.4  NameNode & DataNode 關係圖	23
圖2.5  影像骨架重建流程圖	26
圖2.6  Moravec Algorithm 示意圖	28
圖3.1  MapReduce Runtime System with 2D to 3D Application	34
圖3.2  DSRF Algorithm Overview	38
圖3.3  DSRF Algorithm Working 流程圖	39
圖4.1  DSRF in Hadoop	45
圖4.2  輸入圖片以及模擬圖片	46
圖4.3  User數量對平均運算時間的影響	48
圖4.4  User數量對平均運算時間的影響(放大)	48
圖4.5   User數量增加對DSRF Algorithm切換次數的影響	49
圖4.6  User數量增加對合併任務的延遲以及切換次數的關係	50
圖4.7  User數量增加對系統的輸出量影響	52
圖4.8  User數量增加對使用者平均等待時間的影響	53
圖4.9  User數量對系統最大排隊長度的影響(User Number = 10)	55
圖4.10  User數量對系統最大排隊長度的影響(User Number = 20)	55
圖4.11  User數量對系統最大排隊長度的影響(User Number = 30)	56
圖4.12  User數量對系統最大排隊長度的影響(User Number = 40)	56
圖4.13  User數量對系統最大排隊長度的影響(User Number = 50)	57
圖4.14  User數量對系統最大排隊長度的影響(User Number = 60)	58
圖4.15  User數量對系統最大排隊長度的影響(User Number = 70)	58
圖4.16  User數量對系統最大排隊長度的影響(User Number = 80)	59
參考文獻
郭玲裳,李維聰, “基於雲端系統Map-Reduce 模組加入資料緩衝區及優先權機制”, National Symposium on Telecommunication (NST), 18-19 
NovQinlu He,Zhanhuai Li’Xiao Zhang, “Study on Cloud Storage System based on Distributed Storage Systems”, Computational and Information Sciences( ICCIS), 17-19 Dec 2010, pp. 1332 - 1335 
Kevin D.Bowers,Ari Juels, and Alina Oprea., “HAIL: A HighAvailability and Integrity Layer for Cloud Storage”, Computer and communications security (CCS), Nov 2009, pp. 187-198 
Mingyue Luo,Gang Liu, “Distributed log information processing with Map-Reduce: A case study from raw data to final models”, Information Theory and Information Security(ICITIS), 17-19 Dec. 2010, pp.1143-1146
Afrati, Foto N. Ullman, Jeffrey D., “Optimizing Multiway Joins in a Map-Reduce Environment”, IEEE Transactions on Knowledge and Data Engineering, vol. 23, Issue. 9, Sept. 2011, pp. 1282-1298 
Chen Zhang ,De Sterck, H.,”CloudBATCH: A Batch Job Queuing System on Clouds with Hadoop and HBase”, Cloud Computing Technology and Science (CloudCom), Nov. 30 2010-Dec. 3 2010, pp. 368-375 
Jorda Polo, David Carrera, Yolanda Becerra, Malgorzata Steinder, and Ian Whalley., “Performance-driven task co-scheduling for mapreduce environments”, Network Operations and Management Symposium(NOMS), 19-23 Apr 2010, pp.373 –380
Thomas Sandholm and Kevin Lai., “Dynamic proportional share scheduling in hadoop”, Job Scheduling Stragies For Parallel Processing (JSSPP), vol. 6253, 2010, pp. 110-131 
Jia Yu and Rajkumar Buyya. “A budget constrained scheduling of workflow applications on utility grids using genetic algorithms”, High Performance Distributed Computing (HPDC), 19-23 June 2006, pp.1-10
Matei Zaharia, Dhruba Borthakur, Joydeep Sen Sarma, Khaled Elmeleegy,Scott Shenker, and Ion Stoica, “Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling”, In EuroSys ’10:Proceedings of the 5th European conference on Computer systems, 2010, pp. 265–278. 
J. Ekanayake, S. Pallickara, and G. Fox, "MapReduce for Data Intensive Scientific Analyses." (ESCIENCE) ,2008, pp. 277-284. 
Qiu X., Ekanayake J., Gunarathne T. et al., "Using MapReduce Technologies in Bioinformatics and Medical Informatics." 
Lee, Kuan-Rong ; Fu, Meng-Hsuan ; Kuo, Yau-Hwang, “A hierarchical scheduling strategy for the composition services architecture based on cloud computing”, Next Generation Information Technology (ICNIT), 21-23 June 2011, pp. 163-169  
Thilina Gunarathne, Tak-Lon Wu, Judy Qiu et al., “Cloud Computing Paradigms for Pleasingly Parallel Biomedical Applications”, in Proceedings of the Emerging Computational Methods for the Life Sciences Workshop of ACM HPDC 2010 conference, vol.23, issue. 17, 27 JUN 2011, pp. 2338-2354
J. Ekanayake, T. Gunarathne, J. Qiu et al., “Cloud Technologies for Bioinformatics Applications”, IEEE Transactions on Parallel and Distributed Systems, vol. 22, issue. 6, June 2011, pp. 998-1011 
Chris Harris, Mike Stephens, “A Combined Corner and Edge Detector” Proceedings of 4^thAlvery Vision Conference, pp. 147-151
Moravec, H, “Obstacle Avoidance and Navigation in the Real World by a Seeing Robot Rover”, Tech Report CMU-RI-TR-3, Sep 1980
Sadasivam, G.S.; Selvaraj, D. “A novel parallel hybrid PSO-GA using MapReduce to schedule jobs in Hadoop data grids”, Nature and Biologically Inspired Computing (NaBIC) , 15-17 Dec. 2010 , pp. 377 - 382
P. R. Jelenkovic, X. Kang, J. Tan, “Adaptive and scalable comparison scheduling”, in: SIGMETRICS ’07: Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, 2007, pp. 215–226.
A. Wierman, M. Nuyens, “Scheduling despite inexact job-size information”, in: SIGMETRICS ’08: Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, 2008, pp. 25–36.
Apache Software Foundation, Hadoop on demand. URL http://hadoop.apache.org/core/docs/r0.20.0/hod_user_guide.html
M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, I. Stoica, “Job scheduling for multi-user MapReduce clusters”, Tech. Rep. UCB/EECS-2009-55,  Apr 2009
論文全文使用權限
校內
校內紙本論文立即公開
同意電子論文全文授權校園內公開
校內電子論文立即公開
校外
同意授權
校外電子論文立即公開

如有問題,歡迎洽詢!
圖書館數位資訊組 (02)2621-5656 轉 2487 或 來信