§ Thesis Bibliographic Record
System ID	U0002-2102202311270200
DOI	10.6846/TKU.2023.00105
Title (Chinese)	設計及實作一個基於 BERT 與 3DCNN 深度學習模型的霸凌偵測系統
Title (English)	Design and Implementation of a Bullying Detection System Based on BERT and 3DCNN Deep Learning Models
Title (Third Language)
University	Tamkang University
Department (Chinese)	資訊工程學系碩士班
Department (English)	Department of Computer Science and Information Engineering
Foreign Degree School
Foreign Degree College
Foreign Degree Institute
Academic Year	111
Semester	1
Year of Publication	112
Student (Chinese)	黃崇睿
Student (English)	Chung-Jui Huang
Student ID	610410077
Degree	Master's
Language	Traditional Chinese
Second Language
Date of Oral Defense	2023-01-06
Number of Pages	57
Committee	Advisor - 陳世興 (shchen@mail.tku.edu.tw)
Committee Member - 張志勇
Committee Member - 張義雄
Keywords (Chinese)	Text Emotion Recognition
Speech Emotion Recognition
Motion Recognition
BERT
MFCC
LSTM
CNN
3DCNN
Keywords (English)	Text Emotion Recognition
Speech Emotion Recognition
Motion Recognition
BERT
MFCC
LSTM
CNN
3DCNN
Keywords (Third Language)
Subject Classification
Abstract (Chinese)
In recent years, incidents of bullying and abuse of young children in kindergartens have occurred repeatedly. When the bullying comes from peers, a teacher can step in and stop it; but when the bully or abuser is the teacher, the child can only tell the parents after returning home, by which time the bullying or abuse has already happened and the harm done to the child's mind cannot be undone. According to research by the Child Welfare League Foundation, children who suffer bullying and abuse experience low mood, feelings of inferiority, loss of confidence, and other effects, and may even engage in self-harm or suffer more serious consequences. To protect children, bullying and abuse must therefore be stopped at the moment they occur; only then can such tragedies be prevented in time.
With the rapid development of AI technology, this thesis uses text emotion recognition, speech emotion recognition, and motion recognition to build a real-time bullying monitoring system that improves safety in kindergartens. The system applies self-trained CNN, BERT, LSTM, and 3DCNN deep learning models to the video and audio captured by kindergarten surveillance cameras to determine whether any bullying event is occurring. In addition, it records data related to bullying events, links the front end and back end through a Flask web application, and issues warning sounds to notify parents and teachers.
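To make the speech branch of this pipeline concrete, the following is a minimal Python sketch of an MFCC-plus-CNN classifier of the kind the abstract describes (audio from the surveillance stream is converted into MFCC spectrograms and classified by a small CNN). The librosa and tf.keras calls, the 16 kHz sampling rate, the fixed frame count, and the two-class output are illustrative assumptions, not the implementation used in the thesis.

```python
# Hypothetical sketch of the speech branch: waveform -> MFCC features -> CNN classifier.
# Library choices (librosa, tf.keras), sample rate, and the label set are illustrative assumptions.
import librosa
import numpy as np
import tensorflow as tf

def extract_mfcc(wav_path, sr=16000, n_mfcc=40, max_frames=128):
    """Load an audio clip, compute its MFCC spectrogram, and pad/crop to a fixed size."""
    y, _ = librosa.load(wav_path, sr=sr)
    y = librosa.util.normalize(y)                       # simple amplitude normalization
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Pad or crop along the time axis so every clip has the same shape for the CNN.
    if mfcc.shape[1] < max_frames:
        mfcc = np.pad(mfcc, ((0, 0), (0, max_frames - mfcc.shape[1])))
    return mfcc[:, :max_frames, np.newaxis]             # shape: (n_mfcc, max_frames, 1)

def build_speech_cnn(n_mfcc=40, max_frames=128, n_classes=2):
    """A small 2D CNN over the MFCC 'image' (e.g. bullying tone vs. normal tone)."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_mfcc, max_frames, 1)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
```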
Abstract (English)
In recent years, there have been numerous incidents of bullying and child abuse of young children in kindergartens. If the bully is a peer, a teacher can stop it, but if the bully or abuser is the teacher, the child can only go home and tell the parents, by which time the harm has already been done and the damage to the child's mind cannot be undone. According to a study by the Child Welfare League Foundation, children who are subjected to bullying and abuse experience low mood, low self-esteem, and loss of confidence, and may even engage in self-harming behavior or suffer more serious effects. To prevent such harm, bullying and child abuse must be stopped at the moment they occur so that these unfortunate events can be averted in time.
With the rapid development of AI technology, this thesis proposes a real-time bullying surveillance system based on text emotion recognition, speech emotion recognition, and motion recognition to enhance safety in kindergartens. The system uses self-trained CNN, BERT, LSTM, and 3DCNN deep learning models to process video and audio from kindergarten surveillance cameras and to identify whether any bullying incident is taking place. In addition, data related to bullying incidents are recorded, the front end and back end are linked through Flask, and warning sounds are issued to notify parents and teachers.
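As a rough illustration of how the three detectors could feed the Flask back end described above, here is a minimal, hypothetical sketch in which per-modality scores are fused and exposed to the front-end page; the endpoint names, the 0.5 threshold, and the max-score fusion rule are illustrative assumptions rather than the design documented in the thesis.

```python
# Hypothetical sketch: fuse speech-, text-, and motion-detector scores and expose
# the result through a Flask back end that the front-end page polls.
from flask import Flask, jsonify, request

app = Flask(__name__)
latest_event = {"bullying": False, "scores": {}}

def fuse_scores(speech_p, text_p, motion_p, threshold=0.5):
    """Flag a possible bullying event if any modality's probability exceeds the threshold."""
    scores = {"speech": speech_p, "text": text_p, "motion": motion_p}
    return max(scores.values()) >= threshold, scores

@app.route("/report", methods=["POST"])
def report():
    """Detector processes POST their per-modality probabilities here."""
    p = request.get_json()
    bullying, scores = fuse_scores(
        p.get("speech", 0.0), p.get("text", 0.0), p.get("motion", 0.0)
    )
    latest_event.update({"bullying": bullying, "scores": scores})
    if bullying:
        # Stand-in for the warning sound / notification to parents and teachers.
        print("ALERT: possible bullying event detected")
    return jsonify(latest_event)

@app.route("/status")
def status():
    """Front-end page polls this endpoint to display the latest fused decision."""
    return jsonify(latest_event)

if __name__ == "__main__":
    app.run()
```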
Abstract (Third Language)
Thesis Contents
Table of Contents
Table of Contents		IV
List of Figures		VI
List of Tables		VIII
Chapter 1	Introduction	1
Chapter 2	Related Work	7
2-1 Speech Emotion Recognition	7
2-2 Text Emotion Recognition	8
2-3 Motion Recognition	9
Chapter 3	Background	12
3-1 Speech Emotion Recognition	12
3-1-1 MFCC	12
3-1-2 CNN	13
3-2 Text Emotion Recognition	15
3-2-1 BERT	15
3-3 Motion Recognition	16
3-3-1 3DCNN	16
3-3-2 MediaPipe	18
3-3-3 LSTM	18
3-4 Flask Framework	20
Chapter 4	System Architecture	21
4-1 Problem Description	21
4-1-1 Scenario Description	21
4-1-2 Objectives	21
4-2 System Architecture	22
Chapter 5	Experimental Analysis	43
5-1 Datasets	43
5-2 Experimental Results	44
Chapter 6	Conclusion	54
References	55
List of Figures
Figure 1	MFCC spectrogram generation flowchart	13
Figure 2	CNN model flowchart	14
Figure 3	BERT model architecture	15
Figure 4	3DCNN model flowchart	17
Figure 5	LSTM model architecture	19
Figure 6	Overall system architecture	22
Figure 7	Bullying tone detection system architecture	23
Figure 8	Audio denoising and frequency feature extraction flowchart	25
Figure 9	Audio normalization flowchart	26
Figure 10	MFCC feature spectrogram flowchart	28
Figure 11	Training phase of the CNN model in the bullying tone detection system	29
Figure 12	Usage phase of the CNN model in the bullying tone detection system	29
Figure 13	Bullying text detection system architecture	30
Figure 14	Building the emotion lexicon for the bullying text detection system	32
Figure 15	Million-entry generated data lexicon for bullying text detection	33
Figure 16	Training phase of the BERT model for bullying text detection	34
Figure 17	Usage phase of the BERT model for bullying text detection	35
Figure 18	Bullying motion detection system architecture	36
Figure 19	Sliding-window generation of the million-sample dataset in the bullying motion detection system	38
Figure 20	Training phase of the 3DCNN model in the bullying motion detection system	39
Figure 21	Training phase of the LSTM model in the bullying motion detection system	40
Figure 22	Usage phase of the 3DCNN model in the bullying motion detection system	40
Figure 23	Usage phase of the LSTM model in the bullying motion detection system	41
Figure 24	System front-end display	42
Figure 25	System test recognition accuracy	44
Figure 26	Overall accuracy of the bullying speech system	45
Figure 27	Per-class accuracy of the bullying speech system	46
Figure 28	Overall accuracy of the bullying text system	46
Figure 29	Per-class accuracy of the bullying text system	47
Figure 30	Bullying tone and text recognition accuracy	48
Figure 31	Comparison of overall accuracy across bullying motion models	48
Figure 32	Comparison of per-class accuracy across bullying motion models	49
Figure 33	Integrated recognition of the bullying motion models	50
Figure 34	ROC curve of the bullying text detection system	52
Figure 35	ROC curve of the bullying motion detection system	53
Figure 36	ROC curve of the bullying tone detection system	53
List of Tables
Table 1	Comparison of related work	11
Table 2	Confusion matrix	50
Table 3	Formulas for true positive rate and false positive rate	51
References
[1] Khalil, Ruhul Amin, et al. "Speech emotion recognition using deep learning techniques: A review." IEEE Access 7 (2019): 117327-117345.
[2] Zhang, Shaohua, et al. "Spelling error correction with soft-masked BERT." arXiv preprint arXiv:2005.07421 (2020).
[3] Tsai, Jen-Kai, et al. "Deep learning-based real-time multiple-person action recognition system." Sensors 20.17 (2020): 4758.
[4] Sivasangari, A., P. Ajitha, and R. M. Gomathi. "Deep learning-based real-time multiple-person action recognition system." NVEO-Natural Volatiles & Essential Oils Journal (2021): 4464-4473.
[5] Meng, Hao, et al. "Speech emotion recognition from 3D log-mel spectrograms with deep learning network." IEEE Access 7 (2019): 125868-125881.
[6] Shelke, Nilesh, et al. "An efficient way of text-based emotion analysis from social media using LRA-DNN." Neuroscience Informatics (2022): 100048.
[7] Vrskova, Roberta, et al. "Human activity classification using the 3DCNN architecture." Applied Sciences 12.2 (2022): 931.
[8] Zhang, Lei, Shuai Wang, and Bing Liu. "Deep learning for sentiment analysis: A survey." Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8.4 (2018): e1253.
[9] Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
[10] Tenney, Ian, Dipanjan Das, and Ellie Pavlick. "BERT rediscovers the classical NLP pipeline." arXiv preprint arXiv:1905.05950 (2019).
[11] Gao, Zhengjie, et al. "Target-dependent sentiment classification with BERT." IEEE Access 7 (2019): 154290-154299.
[12] Zheng, Fang, Guoliang Zhang, and Zhanjiang Song. "Comparison of different implementations of MFCC." Journal of Computer Science and Technology 16 (2001): 582-589.
[13] Muda, Lindasalwa, Mumtaj Begam, and Irraivan Elamvazuthi. "Voice recognition algorithms using mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques." arXiv preprint arXiv:1003.4083 (2010).
[14] Tiwari, Vibha. "MFCC and its applications in speaker recognition." International Journal on Emerging Technologies 1.1 (2010): 19-22.
[15] Chua, Leon O., and Tamas Roska. "The CNN paradigm." IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications 40.3 (1993): 147-156.
[16] Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE International Conference on Computer Vision. 2015.
[17] Alzubaidi, Laith, et al. "Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions." Journal of Big Data 8 (2021): 1-74.
[18] Zhang, Liang, et al. "Learning spatiotemporal features using 3DCNN and convolutional LSTM for gesture recognition." Proceedings of the IEEE International Conference on Computer Vision Workshops. 2017.
[19] Liu, Fangyu, et al. "3DCNN-DQN-RNN: A deep reinforcement learning framework for semantic parsing of large-scale 3D point clouds." Proceedings of the IEEE International Conference on Computer Vision. 2017.
[20] Titeca, Kristof. "The spiritual order of the LRA." The Lord's Resistance Army: Myth and Reality (2010): 59-73.
[21] Lugaresi, Camillo, et al. "MediaPipe: A framework for building perception pipelines." arXiv preprint arXiv:1906.08172 (2019).
[22] Yu, Yong, et al. "A review of recurrent neural networks: LSTM cells and network architectures." Neural Computation 31.7 (2019): 1235-1270.
[23] Huang, Zhiheng, Wei Xu, and Kai Yu. "Bidirectional LSTM-CRF models for sequence tagging." arXiv preprint arXiv:1508.01991 (2015).
[24] Deng, Li, et al. "Recent advances in deep learning for speech research at Microsoft." 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013.
[25] Snyder, David, et al. "X-vectors: Robust DNN embeddings for speaker recognition." 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018.
[26] Wu, Chung-Hsien, Ze-Jing Chuang, and Yu-Chung Lin. "Emotion recognition from text using semantic labels and separable mixture models." ACM Transactions on Asian Language Information Processing (TALIP) 5.2 (2006): 165-183.
[27] Zhao, Rui, Anna Zhou, and Kezhi Mao. "Automatic detection of cyberbullying on social networks based on bullying features." Proceedings of the 17th International Conference on Distributed Computing and Networking. 2016.
[28] El Ayadi, Moataz, Mohamed S. Kamel, and Fakhri Karray. "Survey on speech emotion recognition: Features, classification schemes, and databases." Pattern Recognition 44.3 (2011): 572-587.
[29] Ye, Liang, et al. "A combined motion-audio school bullying detection algorithm." International Journal of Pattern Recognition and Artificial Intelligence 32.12 (2018): 1850046.
[30] Wei, Chuqiao, et al. "A school bullying detecting algorithm based on motion recognition and speech emotion recognition." 2020 International Conference on Intelligent Computing and Human-Computer Interaction (ICHCI). IEEE, 2020.
[31] Grinberg, Miguel. Flask Web Development: Developing Web Applications with Python. O'Reilly Media, Inc., 2018.
[32] Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2818-2826.
Thesis Full-Text Usage Permissions
National Central Library
Royalty-free authorization to the National Central Library is not granted; after the authorization form is submitted, the bibliographic record and electronic full text are made available immediately on dedicated retrieval terminals inside the National Central Library.
On Campus
The printed thesis is made publicly available on campus immediately.
Authorization of the electronic full text is not granted.
The bibliographic record is made publicly available on campus immediately.
Off Campus
Authorization to database vendors is not granted.
The bibliographic record is made publicly available off campus immediately.
