§ Browse Thesis Bibliographic Record
  
System No. U0002-1707201917133000
DOI 10.6846/TKU.2019.00514
Title (Chinese) 基於深度强化學習的動態自適應性影像串流系統
Title (English) A Dynamic Adaptive Streaming System based on Deep Reinforcement Learning
Title (Third Language)
University Tamkang University
Department (Chinese) 電機工程學系碩士班
Department (English) Department of Electrical and Computer Engineering
Foreign Degree School
Foreign Degree College
Foreign Degree Institute
Academic Year 107
Semester 2
Publication Year 108 (2019)
Author (Chinese) 倪富洋
Author (English) Fuh Yang Goay
Student ID 606455011
Degree Master
Language English
Second Language Traditional Chinese
Oral Defense Date 2019-06-27
Pages 49
Committee Advisor - 李維聰 (wtlee@mail.tku.edu.tw)
Member - 朱國志 (kcchu@mail.lhu.edu.tw)
Member - 李維聰 (wtlee@mail.tku.edu.tw)
Member - 衛信文 (hwwei@mail.tku.edu.tw)
Keywords (Chinese) 基於HTTP的動態自適應性串流系統
深度强化學習
用戶觀看體驗
Keywords (English) DASH
Deep Reinforcement Learning
Keywords (Third Language)
Subject Classification
Abstract (Chinese)
In recent years, with the rapid development of technology, smartphones and other electronic devices have become widespread; nowadays adults and children alike own them. With the rise of smartphones, watching online videos has become part of everyday life: because smartphones are portable, people can watch videos anytime, anywhere. Yet viewers frequently encounter poor network conditions that make playback stall or turn blurry. Many studies have proposed solutions to this situation, such as adaptive bitrate streaming. Adaptive streaming is a streaming technique that was formerly used mostly over RTP/RTSP but is now mostly based on HTTP [1]. It adjusts video quality according to the user's bandwidth and CPU performance, which requires a player that can switch back and forth between bitrates to give the user a good viewing experience.
Dynamic Adaptive Streaming over HTTP (DASH) is a successful realization of this adaptive streaming architecture. It lets high-quality video be delivered over the Internet by conventional HTTP web servers. DASH splits the video content into a sequence of small HTTP-delivered files and prepares alternative segments at multiple bitrates, offering several quality levels to cope with changing network conditions and raise the viewing experience. Although the DASH architecture can lower the frequency of stalling or re-buffering, its performance rests on the adaptive streaming rule it follows; only a well-designed rule combined with the DASH architecture can deliver a good viewing experience.
In this thesis we therefore study how to optimize the adaptive streaming rule and improve the user's viewing experience when the network is poor. Given the rapid progress of machine learning in recent years, our idea is to exploit the optimization properties of deep reinforcement learning to improve the rule. Deep reinforcement learning is a form of machine learning that extends traditional reinforcement learning with deep learning techniques; it inherits the optimal-solution property of reinforcement learning and the simplifying power of deep learning, allowing it to handle complex situations. We embed deep reinforcement learning into the adaptive streaming rule, aiming to derive an optimal rule that raises the user's viewing experience on a poor network. To do so, we first define the playback environments and the rewards completely, then train on videos with different characteristics to find the most suitable model. With the trained optimal rule, we can reduce video stalling and blurring under poor network conditions and improve the user's viewing experience.
Abstract (English)
In recent years, with the rapid development of technology, smartphones have become popular; in many families, adults and children alike own one. With this popularity, watching online videos has become part of daily life. Because smartphones are portable, people can use them to watch videos anytime, anywhere. However, viewers often run into the same problem under poor network conditions: the video stalls or becomes blurry. Many studies have proposed solutions to this situation, such as adaptive bitrate streaming. Adaptive streaming is a streaming technology that was formerly carried over RTP/RTSP but is now mostly carried over HTTP [1]. It adjusts the quality of the video based on the user's bandwidth and CPU performance, and it requires a player that can switch between different bitrates to provide a good quality of experience.
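As an illustration of such a rule, here is a minimal sketch in Python; it is not taken from the thesis, and the bitrate ladder, the 0.8 safety margin, and the name select_bitrate are all assumed for illustration:

    # Minimal sketch of a throughput-threshold ABR rule (illustrative only).
    # The ladder and the 0.8 safety margin are assumptions, not thesis values.
    BITRATE_LADDER_KBPS = [300, 750, 1200, 2400, 4800]

    def select_bitrate(throughput_kbps, safety=0.8):
        """Pick the highest ladder bitrate the estimated bandwidth can sustain."""
        budget = throughput_kbps * safety
        feasible = [b for b in BITRATE_LADDER_KBPS if b <= budget]
        return feasible[-1] if feasible else BITRATE_LADDER_KBPS[0]

    print(select_bitrate(2000))  # budget 2000 * 0.8 = 1600 kbps -> picks 1200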
Dynamic Adaptive Streaming over HTTP (DASH) is one of the successful examples of adaptive streaming. It enables high-quality video to be transmitted over the Internet via traditional HTTP web servers. DASH breaks the video content into a series of small HTTP-delivered files and encodes each media segment at several bitrates, so the player can serve different quality levels as network conditions change; this enhances the viewer's quality of experience. Although the DASH architecture can reduce the frequency of stalling or re-buffering, its performance depends on the adaptive streaming rule (ABR rule) it follows. Only a good ABR rule combined with the DASH architecture can provide a good quality of experience.
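The per-segment decision loop of a DASH client can be sketched as follows. This is a simplified, self-contained simulation, not the thesis's implementation; the segment duration, the EWMA weights, and the function simulate are assumptions:

    # Simplified DASH client loop: choose a bitrate per segment, update an
    # EWMA bandwidth estimate, and track the playout buffer.
    SEGMENT_SEC = 4.0  # assumed segment duration
    LADDER_KBPS = [300, 750, 1200, 2400, 4800]

    def simulate(bandwidth_trace_kbps, est_kbps=1000.0, safety=0.8):
        buffer_sec, choices = 0.0, []
        for bw in bandwidth_trace_kbps:           # one sample per segment
            feasible = [b for b in LADDER_KBPS if b <= est_kbps * safety]
            bitrate = feasible[-1] if feasible else LADDER_KBPS[0]
            choices.append(bitrate)
            dt = bitrate * SEGMENT_SEC / bw       # download time in seconds
            est_kbps = 0.8 * est_kbps + 0.2 * bw  # EWMA bandwidth estimate
            # playback drains the buffer during the download, then refills it
            buffer_sec = max(0.0, buffer_sec - dt) + SEGMENT_SEC
        return choices

    print(simulate([2000, 2500, 800, 3000]))  # -> [750, 750, 750, 750]

The conservative EWMA estimate keeps this fixed rule at a low bitrate even when bandwidth spikes, which is exactly the kind of behavior a learned ABR rule can improve on.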
Therefore, in this thesis we discuss how to optimize the adaptive streaming rule and improve the user's quality of experience on a poor network. The optimization characteristics of deep reinforcement learning make it well suited to improving adaptive streaming rules. Deep reinforcement learning is a kind of machine learning that extends traditional reinforcement-learning methods with deep learning techniques. It inherits the optimal-policy property of traditional reinforcement learning and the abstraction power of deep learning, which enables it to handle complex situations. In this thesis we use deep reinforcement learning to create adaptive streaming rules. Our goal is to develop a rule that improves the user's quality of experience under bad network conditions. First, we define the environments and rewards for the deep reinforcement learning agent and train on videos with different characteristics to find the most suitable model. With the fully trained rule, we can mitigate video stalling and re-buffering under bad network conditions and enhance the user's quality of experience.
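To indicate what such definitions can look like, below is a hedged sketch of a per-segment state and a QoE-style reward. The state fields and penalty weights are assumptions drawn from the general DASH/DRL literature, not the thesis's exact formulation (its own definitions appear in Sections 3.3 and 3.4):

    # Sketch of a per-segment state and QoE-style reward for DRL-based ABR.
    # Field names and penalty weights are illustrative assumptions.
    from dataclasses import dataclass

    @dataclass
    class State:
        throughput_kbps: float    # estimated bandwidth
        buffer_sec: float         # current playout buffer level
        last_bitrate_kbps: float  # quality of the previous segment

    def reward(bitrate_kbps, rebuffer_sec, last_bitrate_kbps,
               rebuf_weight=4.3, smooth_weight=1.0):
        """Quality minus penalties for stalling and abrupt quality switches."""
        quality = bitrate_kbps / 1000.0  # Mbps as a simple quality proxy
        stall = rebuf_weight * rebuffer_sec
        switch = smooth_weight * abs(bitrate_kbps - last_bitrate_kbps) / 1000.0
        return quality - stall - switch

    print(reward(2400, 0.5, 1200))  # 2.4 - 2.15 - 1.2 = -0.95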
Abstract (Third Language)
Table of Contents

Acknowledgements i
Chinese Abstract ii
English Abstract iv
Chapter 1 Introduction	1
1.1 Motivation	1
1.2 Thesis Organization	3
Chapter 2 Related Work	4
2.1 Dynamic Adaptive Streaming over HTTP (DASH)	4
2.2 Media Presentation Description (MPD)	6
2.3 I-P-B Frames	7
2.4 Markov Decision Process	7
2.5 Deep Reinforcement Learning	9
Chapter 3 The DRL-Based Algorithm for DASH	11
3.1 System Architecture	11
3.2 DRL Adaptation Algorithm	13
3.3 State and Reward	15
3.4 QoE Evaluation	19
3.5 Overall Process	22
Chapter 4 Simulation Results	24
4.1 Simulation Setting	24
4.2 Learning Model by Threshold	29
4.3 Learning Model by Time	38
Chapter 5 Conclusion and Future Work	46
References	47

List of Figures
Figure 1.1 Internet traffic in recent years	1
Figure 2.1 Typical Structure of DASH system	4
Figure 2.2 Work process of the DASH adaptation system	5
Figure 2.3 Media Presentation Description data model	6
Figure 2.4 Illustration of I-P-B frames	7
Figure 2.5 Example of a neural network with one hidden layer	9
Figure 2.6 Reinforcement learning basic concept	10
Figure 3.1 Structure of the DASH system	12
Figure 3.2 Schematic diagram of DRL	14
Figure 3.3 The overall process of the system	22
Figure 4.1 Characteristic of Steady video	25
Figure 4.1.1 Characteristic of Moving video	26
Figure 4.1.2 Characteristic of Burst video	27
Figure 4.2 Reward during learning phase (threshold model)	29
Figure 4.2.1 Steady baseline video	31
Figure 4.2.2 Moving baseline video	31
Figure 4.2.3 Burst baseline video	31
Figure 4.2.4 Testing baseline video	31
Figure 4.2.5 Steady model with steady video	32
Figure 4.2.6 Moving model with steady video	32
Figure 4.2.7 Burst model with steady video	32
Figure 4.2.8 Steady model with moving video	32
Figure 4.2.9 Moving model with moving video	33
Figure 4.2.10 Burst model with moving video	33
Figure 4.2.11 Steady model with burst video	33
Figure 4.2.12 Moving model with burst video	33
Figure 4.2.13 Burst model with burst video	34
Figure 4.2.14 Steady model with testing video	34
Figure 4.2.15 Moving model with testing video	34
Figure 4.2.16 Burst model with testing video	34
Figure 4.2.17 QoE of the threshold model	37
Figure 4.3 Reward during learning phase (time model)	38
Figure 4.3.1 Steady model with steady video	40
Figure 4.3.2 Moving model with steady video	40
Figure 4.3.3 Burst model with steady video	40
Figure 4.3.4 Steady model with moving video	40
Figure 4.3.5 Moving model with moving video	41
Figure 4.3.6 Burst model with moving video	41
Figure 4.3.7 Steady model with burst video	41
Figure 4.3.8 Moving model with burst video	41
Figure 4.3.9 Burst model with burst video	42
Figure 4.3.10 Steady model with testing video	42
Figure 4.3.11 Moving model with testing video	42
Figure 4.3.12 Burst model with testing video	42
Figure 4.3.13 QoE of the time model	44
List of Tables

Table 3.1 Reward Function	16
Table 4.2 QoE values for each video type under the threshold-based models	35
Table 4.3 QoE values for each video type under the time-based models	43
References
[1] I. Santos-González, A. Rivero-García, J. Molina-Gil, and P. Caballero-Gil, "Implementation and Analysis of Real-Time Streaming Protocols," Sensors, vol. 17, no. 4, p. 846, 2017. doi: 10.3390/s17040846.
[2] Cisco, "Cisco Visual Networking Index: Forecast and Trends, 2017-2022," Cisco Public Information, 2018.
[3] Dynamic Adaptive Streaming over HTTP (DASH) - Part 1: Media Presentation Description and Segment Formats, ISO/IEC Standard 23009-1:2014, May 2014.
[4] dash.js, Dash Industry Forum. [Online]. Available: https://github.com/Dash-Industry-Forum/dash.js/wiki
[5] M. Seufert, S. Egger, M. Slanina, T. Zinner, T. Hossfeld, and P. Tran-Gia, "A Survey on Quality of Experience of HTTP Adaptive Streaming," IEEE Communications Surveys & Tutorials, vol. 17, no. 1, pp. 469-492, 2015.
[6] F. Chiariotti, "Reinforcement Learning Algorithms for DASH Video Streaming," thesis, University of Padova, Padua, Italy, 2015, pp. 14-30.
[7] T. Stockhammer, "Dynamic Adaptive Streaming over HTTP: Standards and Design Principles," in Proc. ACM Multimedia Systems (MMSys), San Jose, CA, USA, Feb. 2011.
[8] x264. [Online]. Available: https://www.videolan.org/developers/x264.html
[9] MP4Box. [Online]. Available: https://gpac.wp.imt.fr/mp4box/
[10] O. Uzun, "I-P-B Frames," Medium. [Online]. Available: https://medium.com/@nonuruzun/i-p-b-frames-b6782bcd1460
[11] R. Bellman, "A Markovian Decision Process," Indiana Univ. Math. J., vol. 6, no. 4, pp. 679-684, 1957.
[12] C. Zhou, C.-W. Lin, and Z. Guo, "DASH: A Markov Decision-Based Rate Adaptation Approach for Dynamic HTTP Streaming," IEEE Trans. Multimedia, vol. 18, no. 4, pp. 738-751, Apr. 2016.
[13] C. J. C. H. Watkins and P. Dayan, "Q-Learning," Machine Learning, vol. 8, pp. 279-292, 1992.
[14] V. Martín, J. Cabrera, and N. García, "Q-Learning Based Control Algorithm for HTTP Adaptive Streaming," in Proc. IEEE Visual Communications and Image Processing (VCIP), 2015.
[15] D. Kumar, V. Aswini, L. Arun Raj, and S. Hiran Kumar, "Machine Learning Approach for Quality Adaptation of Streaming Video through 4G Wireless Network over HTTP," in Proc. ITU Kaleidoscope: Challenges for a Data-Driven Society (ITU K), 2017.
[16] K. Khan and W. Goodridge, "Machine Learning in Dynamic Adaptive Streaming over HTTP (DASH)," Int. J. Advanced Networking and Applications, vol. 9, no. 3, pp. 3461-3468, 2017.
[17] P. Juluri, V. Tamarapalli, and D. Medhi, "QoE Management in DASH Systems Using the Segment Aware Rate Adaptation Algorithm," in Proc. IEEE/IFIP Netw. Oper. Manag. Symp. (NOMS), Istanbul, Turkey, 2016, pp. 129-136.
[18] V. François-Lavet, P. Henderson, R. Islam, M. G. Bellemare, and J. Pineau, "An Introduction to Deep Reinforcement Learning," Foundations and Trends in Machine Learning, vol. 11, no. 3-4, 2018. doi: 10.1561/2200000071.
[19] REINFORCEjs. [Online]. Available: https://cs.stanford.edu/people/karpathy/reinforcejs/
[20] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, "Deep Reinforcement Learning: A Brief Survey," IEEE Signal Processing Magazine, vol. 34, no. 6, pp. 26-38, Nov. 2017.
[21] M. Gadaleta, F. Chiariotti, M. Rossi, and A. Zanella, "D-DASH: A Deep Q-Learning Framework for DASH Video Streaming," IEEE Transactions on Cognitive Communications and Networking, vol. 3, no. 4, Dec. 2017.
[22] K. Khan and W. Goodridge, "QoE in DASH," Int. J. Advanced Networking and Applications, vol. 9, no. 4, pp. 3515-3522, 2018.
[23] F. Dobrian, A. Awan, D. Joseph, A. Ganjam, J. Zhan, V. Sekar, I. Stoica, and H. Zhang, "Understanding the Impact of Video Quality on User Engagement," in Proc. ACM SIGCOMM 2011, pp. 362-373.
[24] W. Li, P. Yu, R. Wang, L. Feng, O. Dong, and X. Qiu, "Quality of Experience Evaluation of HTTP Video Streaming Based on User Interactive Behaviors," The Journal of China Universities of Posts and Telecommunications, vol. 24, no. 3, pp. 24-32, Jun. 2017.
[25] N. F. Lepora, "Threshold Learning for Optimal Decision Making," in Advances in Neural Information Processing Systems (NIPS), 2016.
Full-Text Availability
On campus
Print copy available on campus immediately
Electronic full text authorized for on-campus access
Electronic copy available on campus immediately
Off campus
Authorization granted
Electronic copy available off campus immediately
