§ Browse Thesis Bibliographic Record
  
System ID U0002-1309201816050300
DOI 10.6846/TKU.2018.00370
Title (Chinese) 利用單域類神經網路與Reptile元學習之深度視覺追蹤
Title (English) Deep Visual Tracking using Single Domain Neural Network with Reptile Meta-Learning
Title (third language)
Institution 淡江大學 (Tamkang University)
Department (Chinese) 電機工程學系機器人工程碩士班
Department (English) Master's Program in Robotics Engineering, Department of Electrical and Computer Engineering
Foreign degree school name
Foreign degree college name
Foreign degree institute name
Academic year 106
Semester 2
Publication year 107
Author (Chinese) 張尚智
Author (English) Shang Jiji Jhang
Student ID 604470293
Degree Master's
Language English
Second language
Defense date 2018-07-25
Pages 63
Committee Advisor - 蔡奇謚 (chiyi_tsai@gms.tku.edu.tw)
Member - 黃志良 (clhwang@mail.ntust.edu.tw)
Member - 許駿飛 (fei@ee.tku.edu.tw)
Keywords (Chinese) Deep learning
Supervised learning
Visual tracking
Meta learning
Keywords (English) Deep Learning
Supervised learning
Visual Tracking
Meta Learning
Keywords (third language)
Subject classification
Chinese Abstract
The primary goal of visual tracking is to locate a specific target object, in the form of a bounding box, throughout a sequence of images. Although visual tracking has long played an important role in computer vision, it remains a highly challenging problem, because it requires localizing a specific object instance rather than a broader object category. This poses a unique challenge for deep-learning-based trackers that require online learning: although deep learning is known for its strong recognition ability, deep models easily overfit and perform poorly when training data are extremely scarce. Building on an existing deep-learning tracking algorithm, this thesis incorporates a meta-learning algorithm so that, at the initialization of online tracking, the tracker performs well with only a few update steps and a small amount of training data. Experimental results show that the proposed algorithm achieves a 66.4% mean success rate on the OTB2015 dataset.
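For context, the success rate reported above is the standard OTB-style overlap metric: the fraction of frames whose predicted bounding box overlaps the ground truth beyond a threshold, with the success plot sweeping that threshold. A minimal sketch of this metric (the function names and the box format `(x, y, w, h)` are illustrative assumptions, not the thesis code):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes given as (x, y, w, h)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union

def success_rate(pred_boxes, gt_boxes, threshold=0.5):
    """Fraction of frames whose predicted box overlaps ground truth above threshold.
    The OTB success plot sweeps this threshold over [0, 1] and reports the area
    under the resulting curve."""
    scores = [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    return sum(s > threshold for s in scores) / len(scores)
```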
English Abstract
The goal of visual tracking is to locate a specific object, in the form of a bounding box, throughout a video or a sequence of images. While visual tracking has been one of the main topics in computer vision for decades, it remains very challenging. Visual tracking requires algorithms to recognize and locate objects at the instance level, and this requirement produces unique challenges, especially for tracking algorithms based on deep learning that require online learning during the tracking process. Although deep learning models provide strong and robust feature representations, they easily overfit when given a very small set of training data, which degrades the overall tracking performance. To deal with this issue, the proposed algorithm adopts a first-order meta-learning technique so that, during initialization, the visual tracker requires only a few training examples and a few optimization steps to perform well. Experimental results show that it achieves a 66.4% mean success rate on the OTB2015 dataset.
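The first-order meta-learning step referred to above is Reptile [20]: repeatedly adapt a copy of the weights on a sampled task with a few SGD steps, then move the shared initialization toward the adapted weights. The toy quadratic task, learning rates, and iteration counts below are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def inner_adapt(w, target, lr=0.1, steps=5):
    # A few SGD steps on a toy loss L(w) = 0.5 * ||w - target||^2,
    # standing in for the tracker's few-shot fine-tuning at initialization.
    for _ in range(steps):
        w = w - lr * (w - target)  # gradient of L is (w - target)
    return w

def reptile(meta_iters=200, outer_lr=0.5):
    w_init = np.zeros(2)  # meta-learned initialization
    for _ in range(meta_iters):
        # Sample a task: a noisy target around a fixed mean.
        target = np.array([3.0, -1.0]) + 0.1 * rng.standard_normal(2)
        w_adapted = inner_adapt(w_init.copy(), target)
        # Reptile outer update: nudge the initialization toward the adapted weights.
        w_init = w_init + outer_lr * (w_adapted - w_init)
    return w_init
```

After meta-training, `w_init` sits near the mean of the task targets, so a handful of inner steps suffices to fit any new task, which is exactly the property the tracker exploits at initialization.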
Third-language abstract
Table of Contents
Table of Contents	III
List of Figures	VI
List of Tables	VII
Chapter 1 Introduction	1
1.1 Background	1
1.2 Motivation	4
1.3 Thesis structure	5
Chapter 2 Literature Review for Convolutional Neural Nets and Meta Learning	6
2.1 Feed Forward Neural Nets	8
2.2 Activation function	9
2.2.1 Sigmoid	9
2.2.2 Tanh	10
2.2.3 ReLU	11
2.2.4 Leaky ReLU	12
2.3 Optimization	13
2.3.1 Back propagation	13
2.3.2 Cost functions	15
2.3.3 Parameters update methods	16
2.4 ConvNet architectures	19
2.4.1 The convolution operation	20
2.4.2 Activation function	20
2.4.3 Pooling	21
2.4.4 Famous architectures	21
2.5 Meta Learning	24
2.5.1 Model-Agnostic Meta-Learning	26
2.5.2 Reptile	30
Chapter 3 Visual Tracking	32
3.1 Multi-Domain Network (MDNet)	32
3.1.1 Network Architecture	32
3.1.2 Offline Learning	33
3.1.3 Online Tracking	34
3.2 Meta-Tracker	38
3.2.1 Meta-SDNet	38
3.2.2 Meta-training	38
Chapter 4 The Proposed Method	40
4.1 Network architecture	40
4.2 Offline learning	41
4.3 Online Tracking	43
4.3.1 Target estimation	43
4.3.2 Bounding box regression	44
4.3.3 Parameters update	46
4.3.4 The tracking procedure	46
Chapter 5 Experiment Results	49
5.1 Hardware and software for experiments	49
5.2 Detail settings	50
5.3 Evaluation	50
5.3.1 Success plots comparison	51
5.3.2 Precision plots comparison	52
5.3.3 Comparison with different network architecture and setting based on the proposed offline and online learning	53
5.4 Tracking results	54
5.4.1 Basketball	54
5.4.2 Bird1	55
5.4.3 Board	55
5.4.4 ClifBar	56
5.4.5 DragonBaby	56
5.4.6 Football	57
5.4.7 Freeman4	57
5.4.8 Ironman	58
5.4.9 MotorRolling	58
5.4.10 Matrix	59
5.4.11 Skiing	59
Chapter 6 Conclusions and future work	60
References	61



List of Figures
Figure 2.1: Basic structure of perceptron	7
Figure 2.2: Structure of a feed forward neural net	8
Figure 2.3: The sigmoid function and its derivative	10
Figure 2.4: The tanh function and its derivative	11
Figure 2.5: The ReLU function and its derivative	12
Figure 2.6: The leaky ReLU and its derivative	13
Figure 2.7: Example of backprop; forward pass values are indicated in green, gradients of each variable are indicated in red	14
Figure 2.8: The structure of LeNet	22
Figure 2.9: The structure of AlexNet	22
Figure 2.10: The structure of VGGNet	23
Figure 2.11: The Inception module	23
Figure 2.12: The structure of GoogLeNet	24
Figure 2.13: The residual block	24
Figure 2.14: Meta learning data setup	26
Figure 3.1: MDNet architecture	33
Figure 3.2: Complete online tracking procedure	37
Figure 3.3: Offline meta-training	39
Figure 4.1: The proposed network architecture	41
Figure 4.2: Offline learning	43
Figure 4.3: The online tracking procedure	48
Figure 5.1: Success plots on OTB2015	52
Figure 5.2: Precision plots on OTB2015	52
Figure 5.3: Success plots comparison of proposed offline and online learning method based on SDNet	53
Figure 5.4: Precision plots comparison of proposed offline and online learning method based on SDNet	54
Figure 5.5: Color code of different tracking results	54
Figure 5.6: Tracking result for sequence Basketball	55
Figure 5.7: Tracking result for sequence Bird1	55
Figure 5.8: Tracking result for sequence Board	56
Figure 5.9: Tracking result for sequence ClifBar	56
Figure 5.10: Tracking result for sequence DragonBaby	57
Figure 5.11: Tracking result for sequence Football	57
Figure 5.12: Tracking result for sequence Freeman4	58
Figure 5.13: Tracking result for sequence Ironman	58
Figure 5.14: Tracking result for sequence MotorRolling	59
Figure 5.15: Tracking results for sequence Matrix	59
Figure 5.16: Tracking results for sequence Skiing	59



List of Tables
Table 5.1: Hardware specification	49
Table 5.2: Software specification	49
References
[1]	Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel: “Backpropagation applied to handwritten zip code recognition,” Neural Computation, vol. 1, no. 4, pp. 541–551, Winter 1989.
[2]	Krizhevsky, A., Sutskever, I., and Hinton, G. E.: “ImageNet classification with deep convolutional neural networks,” In NIPS, pp. 1106–1114, 2012. 
[3]	Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L.: “Imagenet: A large-scale hierarchical image database,” In CVPR, pp. 248-255, 2009.
[4]	C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich: “Going deeper with convolutions.” In CVPR, pp. 1-9, 2015.
[5]	He, K., Zhang, X., Ren, S., Sun, J. “Deep residual learning for image recognition.” In CVPR, pp. 770-778, 2016.  
[6]	B. M. Lake, T. D. Ullman, J. B. Tenenbaum, and S. J. Gershman: “Building machines that learn and think like people,” Behavioral and Brain Sciences, 40, 2017. 
[7]	A Santoro, S Bartunov, M Botvinick, D Wierstra, and T Lillicrap: “Meta-learning with memory-augmented neural networks.”  In ICML, pp. 1842-1850, 2016. 
[8]	Kaiser, Ł., Nachum, O., Roy, A., & Bengio, S.: “Learning to remember rare events.” In ICLR, 2017. 
[9]	Oriol Vinyals, Charles Blundell, Tim Lillicrap, Daan Wierstra, et al.: “Matching networks for one shot learning.” In NIPS, pp. 3630-3638, 2016. 
[10]	Andrychowicz, Marcin, Denil, Misha, Gomez, Sergio, Hoffman, Matthew W, Pfau, David, Schaul, Tom, and de Freitas, Nando.: “Learning to learn by gradient descent by gradient descent.” In NIPS, 2016.
[11]	Ravi, S., & Larochelle, H.: “Optimization as a model for few-shot learning.” In ICLR, 2017
[12]	Finn, C., Abbeel, P., & Levine, S.:“Model-agnostic meta-learning for fast adaptation of deep networks.” In ICML, 2017.
[13]	Wang, L., Ouyang, W., Wang, X., & Lu, H. :“Visual tracking with fully convolutional networks.” In ICCV, 2015.
[14]	Hong, S., You, T., Kwak, S., & Han, B. :“Online tracking by learning discriminative saliency map with convolutional neural network.” In ICML, 2015.
[15]	Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., & Torr, P. H.: “Fully-convolutional siamese networks for object tracking.” In ECCV, 2016.
[16]	Hong, Z., Chen, Z., Wang, C., Mei, X., Prokhorov, D., & Tao, D.: “Multi-store tracker (muster): A cognitive psychology inspired approach to object tracking.” In CVPR, 2015.
[17]	Held, D., Thrun, S., & Savarese, S.:“Learning to track at 100 fps with deep regression networks.” In ECCV, 2016.
[18]	Nam, H., & Han, B.:“Learning multi-domain convolutional neural networks for visual tracking.” In CVPR, 2016.
[19]	Park, E., & Berg, A. C.:“Meta-Tracker: Fast and Robust Online Adaptation for Visual Object Trackers.” ECCV, 2018.
[20]	Nichol, A., & Schulman, J.: “Reptile: a Scalable Metalearning Algorithm.” arXiv:1803.02999, 2018.
[21]	Qian, N.:“On the momentum term in gradient descent learning algorithms.” Neural networks, pp. 145-151, 1999.
[22]	Duchi, J., Hazan, E., & Singer, Y.:“Adaptive subgradient methods for online learning and stochastic optimization.” JMLR, pp. 2121-2159, 2011.
[23]	Hinton, G., Srivastava, N., & Swersky, K.: “Neural networks for machine learning, lecture 6a: Overview of mini-batch gradient descent.” Lecture slides, 2012.
[24]	Diederik P. Kingma and Jimmy Ba.:“Adam: A method for stochastic optimization.” In ICLR, 2015. 
[25]	K. Simonyan and A. Zisserman.: “Very deep convolutional networks for large-scale image recognition.” In ICLR, 2015.
[26]	K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman.:“Return of the devil in the details: Delving deep into convolutional nets.” In BMVC, 2014.
[27]	R. Girshick, J. Donahue, T. Darrell, and J. Malik.: “Rich feature hierarchies for accurate object detection and semantic segmentation.” In CVPR, 2014.
[28]	Song, Y., Ma, C., Gong, L., Zhang, J., Lau, R., Yang, M.H.: “CREST: Convolutional residual learning for visual tracking.” In ICCV, 2017.
[29]	Fan, H., & Ling, H.:“Parallel tracking and verifying: A framework for real-time and high accuracy visual tracking.” In ICCV, 2017. 
[30]	Danelljan, M., Hager, G., Shahbaz Khan, F., & Felsberg, M.: “Convolutional features for correlation filter based visual tracking.” In ICCVW, 2015.
[31]	Tao, R., Gavves, E., & Smeulders, A. W.: “Siamese instance search for tracking.” In CVPR, 2016.
[32]	Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., & Torr, P. H.: “Fully-convolutional siamese networks for object tracking.” In ECCV, 2016.
[33]	Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L. “ImageNet Large Scale Visual Recognition Challenge.” In IJCV, 2015.
[34]	M. Kristan, J. Matas, A. Leonardis, M. Felsberg, L. Cehovin, G. Fernandez, T. Vojir, G. Hager, G. Nebehay, R. Pflugfelder, et al.: “The visual object tracking VOT2015 challenge results.” In ICCVW, 2015.
Full-Text Access Rights
On campus
Print thesis to be made public 5 years after submission of the authorization form
Full-text electronic thesis authorized for on-campus public access
On-campus electronic thesis to be made public 5 years after submission of the authorization form
Off campus
Authorization granted
Off-campus electronic thesis to be made public 5 years after submission of the authorization form
