System ID | U0002-1309201816050300 |
---|---|
DOI | 10.6846/TKU.2018.00370 |
Title (Chinese) | 利用單域類神經網路與Reptile元學習之深度視覺追蹤 |
Title (English) | Deep Visual Tracking using Single Domain Neural Network with Reptile Meta-Learning |
Title (Third Language) | |
Institution | Tamkang University |
Department (Chinese) | 電機工程學系機器人工程碩士班 |
Department (English) | Master's Program in Robotics Engineering, Department of Electrical and Computer Engineering |
Foreign Degree School | |
Foreign Degree College | |
Foreign Degree Institute | |
Academic Year | 106 (ROC) |
Semester | 2 |
Year of Publication | 107 (ROC; 2018) |
Student (Chinese) | 張尚智 |
Student (English) | Shang Jiji Jhang |
Student ID | 604470293 |
Degree | Master's |
Language | English |
Second Language | |
Defense Date | 2018-07-25 |
Number of Pages | 63 |
Committee |
Advisor - 蔡奇謚 (chiyi_tsai@gms.tku.edu.tw)
Member - 黃志良 (clhwang@mail.ntust.edu.tw)
Member - 許駿飛 (fei@ee.tku.edu.tw) |
Keywords (Chinese) |
深度學習、監督式學習、視覺追蹤、元學習 |
Keywords (English) |
Deep Learning, Supervised Learning, Visual Tracking, Meta-Learning |
Keywords (Third Language) | |
Subject Classification | |
Abstract (Chinese) |
The main goal of visual tracking is to locate a specific target object, in the form of a bounding box, throughout a continuous image sequence. Although visual tracking has long played an important role in computer vision, it remains a highly challenging problem, because it requires locating a specific object instance rather than a broader object category. This poses a unique challenge for deep-learning-based trackers that require online learning: although deep learning is known for its strong recognition ability, a deep model overfits easily and performs poorly when very little training data is available. Building on an existing deep-learning tracking algorithm, this thesis adds a meta-learning algorithm so that, during the initialization of online tracking, the tracker performs well with only a few update steps and a small amount of training data. Experimental results show that the proposed algorithm achieves a mean success rate of 66.4% on the OTB2015 dataset. |
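The meta-learning step described in the abstract follows Reptile (Nichol & Schulman, 2018), a first-order method. Below is a minimal sketch of the Reptile outer loop, assuming a PyTorch classifier; `sample_task` is a hypothetical helper that returns a mini-batch sampler for one task (e.g. one training video), and the learning rates, step counts, and cross-entropy loss are illustrative assumptions, not the thesis's exact settings.

```python
import copy

import torch
import torch.nn.functional as F


def reptile_meta_train(model, sample_task, meta_iters=1000,
                       inner_steps=5, inner_lr=1e-3, meta_lr=0.1):
    """Sketch of first-order Reptile meta-training (hypothetical settings)."""
    for _ in range(meta_iters):
        get_batch = sample_task()  # hypothetical: batch sampler for one task
        # Adapt a throwaway copy so the shared initialization stays untouched.
        adapted = copy.deepcopy(model)
        opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
        for _ in range(inner_steps):  # a few SGD steps on this task
            x, y = get_batch()
            opt.zero_grad()
            F.cross_entropy(adapted(x), y).backward()
            opt.step()
        # Reptile update: theta <- theta + meta_lr * (theta_adapted - theta)
        with torch.no_grad():
            for p, q in zip(model.parameters(), adapted.parameters()):
                p.add_(meta_lr * (q - p))
    return model
```

Each outer iteration moves the shared initialization a fraction `meta_lr` toward the task-adapted weights; this is what lets the online tracker start from weights that fine-tune well with only a few examples and update steps.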
Abstract (English) |
The goal of visual tracking is to locate a specific object, in the form of a bounding box, throughout a video or a sequence of images. While visual tracking has been one of the main topics in computer vision for decades, it remains very challenging: it requires algorithms to recognize and locate objects at the instance level, which poses unique difficulties for deep-learning-based trackers that must learn online during the tracking process. Although deep learning models can provide strong and robust feature representations, they overfit easily when given a very small set of training data, which degrades overall tracking performance. To deal with this issue, the proposed algorithm adopts a first-order meta-learning technique so that, during initialization, the visual tracker requires only a few training examples and a few optimization steps to perform well. Experimental results show that it achieves a mean success rate of up to 66.4% on the OTB2015 dataset. |
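For context on the reported 66.4% figure: the OTB success score is commonly computed as the area under the success plot, i.e. the mean fraction of frames whose predicted box overlaps the ground truth above each overlap threshold. A small sketch of that computation follows, assuming boxes in (x, y, w, h) format; the helper names and the 21-point threshold grid are illustrative assumptions, not taken from the thesis.

```python
import numpy as np


def iou(a, b):
    # Intersection-over-union of two boxes given as (x, y, w, h).
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def success_auc(pred_boxes, gt_boxes, thresholds=np.linspace(0.0, 1.0, 21)):
    # Success rate at threshold t = fraction of frames with IoU > t;
    # the reported score is the mean over all thresholds (plot AUC).
    overlaps = np.array([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    return float(np.mean([(overlaps > t).mean() for t in thresholds]))
```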
Abstract (Third Language) | |
Table of Contents |
Chapter 1 Introduction
  1.1 Background
  1.2 Motivation
  1.3 Thesis structure
Chapter 2 Literature Review for Convolutional Neural Nets and Meta-Learning
  2.1 Feed-forward neural nets
  2.2 Activation functions
    2.2.1 Sigmoid
    2.2.2 Tanh
    2.2.3 ReLU
    2.2.4 Leaky ReLU
  2.3 Optimization
    2.3.1 Backpropagation
    2.3.2 Cost functions
    2.3.3 Parameter update methods
  2.4 ConvNet architectures
    2.4.1 The convolution operation
    2.4.2 Activation function
    2.4.3 Pooling
    2.4.4 Famous architectures
  2.5 Meta-learning
    2.5.1 Model-Agnostic Meta-Learning
    2.5.2 Reptile
Chapter 3 Visual Tracking
  3.1 Multi-Domain Network (MDNet)
    3.1.1 Network architecture
    3.1.2 Offline learning
    3.1.3 Online tracking
  3.2 Meta-Tracker
    3.2.1 Meta-SDNet
    3.2.2 Meta-training
Chapter 4 The Proposed Method
  4.1 Network architecture
  4.2 Offline learning
  4.3 Online tracking
    4.3.1 Target estimation
    4.3.2 Bounding box regression
    4.3.3 Parameter update
    4.3.4 The tracking procedure
Chapter 5 Experiment Results
  5.1 Hardware and software for experiments
  5.2 Detailed settings
  5.3 Evaluation
    5.3.1 Success plots comparison
    5.3.2 Precision plots comparison
    5.3.3 Comparison of different network architectures and settings based on the proposed offline and online learning
  5.4 Tracking results
    5.4.1 Basketball
    5.4.2 Bird1
    5.4.3 Board
    5.4.4 ClifBar
    5.4.5 DragonBaby
    5.4.6 Football
    5.4.7 Freeman4
    5.4.8 Ironman
    5.4.9 MotorRolling
    5.4.10 Matrix
    5.4.11 Skiing
Chapter 6 Conclusions and Future Work
References

List of Figures
  Figure 2.1: Basic structure of a perceptron
  Figure 2.2: Structure of a feed-forward neural net
  Figure 2.3: The sigmoid function and its derivative
  Figure 2.4: The tanh function and its derivative
  Figure 2.5: The ReLU function and its derivative
  Figure 2.6: The leaky ReLU function and its derivative
  Figure 2.7: Example of backpropagation; forward-pass values indicated in green, gradients of each variable indicated in red
  Figure 2.8: The structure of LeNet
  Figure 2.9: The structure of AlexNet
  Figure 2.10: The structure of VGGNet
  Figure 2.11: The Inception module
  Figure 2.12: The structure of GoogLeNet
  Figure 2.13: The residual block
  Figure 2.14: Meta-learning data setup
  Figure 3.1: MDNet architecture
  Figure 3.2: Complete online tracking procedure
  Figure 3.3: Offline meta-training
  Figure 4.1: The proposed network architecture
  Figure 4.2: Offline learning
  Figure 4.3: The online tracking procedure
  Figure 5.1: Success plots on OTB2015
  Figure 5.2: Precision plots on OTB2015
  Figure 5.3: Success plot comparison of the proposed offline and online learning method based on SDNet
  Figure 5.4: Precision plot comparison of the proposed offline and online learning method based on SDNet
  Figure 5.5: Color code of the different tracking results
  Figure 5.6: Tracking result for sequence Basketball
  Figure 5.7: Tracking result for sequence Bird1
  Figure 5.8: Tracking result for sequence Board
  Figure 5.9: Tracking result for sequence ClifBar
  Figure 5.10: Tracking result for sequence DragonBaby
  Figure 5.11: Tracking result for sequence Football
  Figure 5.12: Tracking result for sequence Freeman4
  Figure 5.13: Tracking result for sequence Ironman
  Figure 5.14: Tracking result for sequence MotorRolling
  Figure 5.15: Tracking results for sequence Matrix
  Figure 5.16: Tracking results for sequence Skiing

List of Tables
  Table 5.1: Hardware specification
  Table 5.2: Software specification |
References | |
Full-Text Availability | |