System ID | U0002-2102202311270200 |
---|---|
DOI | 10.6846/TKU.2023.00105 |
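Thesis title (Chinese) | 設計及實作一個基於 BERT 與 3DCNN 深度學習模型的霸凌偵測系統 |
Thesis title (English) | Designed Implementation of a Bullying Detection System Based on BERT and 3DCNN Deep Learning Models |
Thesis title (third language) | |
Institution | Tamkang University |
Department (Chinese) | 資訊工程學系碩士班 |
Department (English) | Department of Computer Science and Information Engineering |
Foreign degree: university | |
Foreign degree: college | |
Foreign degree: graduate institute | |
Academic year (ROC calendar) | 111 |
Semester | 1 |
Publication year (ROC calendar) | 112 |
Graduate student (Chinese) | 黃崇睿 |
Graduate student (English) | Chung-Jui Huang |
Student ID | 610410077 |
Degree | Master's |
Language | Traditional Chinese |
Second language | |
Oral defense date | 2023-01-06 |
Number of pages | 57 |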
Oral defense committee |
Advisor - 陳世興 (shchen@mail.tku.edu.tw); Committee member - 張志勇; Committee member - 張義雄 |
Keywords (Chinese) |
文字情緒辨識; 語音情緒辨識; 動作辨識; BERT; MFCC; LSTM; CNN; 3DCNN |
Keywords (English) |
Text Emotion Recognition; Speech Emotion Recognition; Motion Recognition; BERT; MFCC; LSTM; CNN; 3DCNN |
Keywords (third language) | |
Subject classification | |
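Chinese abstract (English translation) |
In recent years, incidents of young children being bullied or abused in kindergartens have occurred again and again. When a child is bullied by peers, a teacher can step in, but when the bully or abuser is the teacher, the child can only tell their parents after returning home. By then the bullying or abuse has already happened, and the harm done to the child cannot be undone. According to research by the Child Welfare League, children who are bullied or abused suffer effects such as low mood, feelings of inferiority, and loss of confidence, and may even harm themselves or suffer more serious consequences. Protecting children therefore requires stopping the bullying or abuse at the moment it happens. With the rapid development of AI, this thesis aims to build a real-time bullying monitoring system based on text emotion recognition, speech emotion recognition, and motion recognition to improve safety in kindergartens. The system uses a self-trained CNN deep learning model, a BERT deep learning model, an LSTM model, and a 3DCNN model to process kindergarten surveillance camera footage, and determines from the video and audio whether a bullying incident is taking place. In addition, the system records data related to bullying incidents, uses Flask to connect the front end and back end of a web page, and sounds a warning to notify parents and teachers. |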
English abstract |
In recent years, there have been numerous incidents of bullying and abuse of young children in kindergartens. If the bully is a peer, a teacher can stop it, but if the bully or abuser is a teacher, the child can only go home and tell their parents after the fact. According to a study by the Child Welfare League, children who are subjected to bullying and abuse experience low mood, low self-esteem, and loss of confidence, and may even harm themselves or suffer more serious effects. Preventing this harm requires stopping the bullying or abuse at the moment it occurs. With the rapid growth of AI technology, this thesis proposes a real-time bullying surveillance system that uses text emotion recognition, speech emotion recognition, and motion recognition to improve safety in kindergartens. The system uses a self-trained CNN deep learning model, a BERT deep learning model, an LSTM model, and a 3DCNN model to process kindergarten surveillance camera footage and to identify, from the video and audio, whether a bullying incident is taking place. In addition, data related to bullying incidents are recorded, and Flask connects the front end and back end of a web page that sounds a warning to notify parents and teachers. |
Abstract (third language) | |
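Table of contents |
Contents
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
2-1 Speech Emotion Recognition
2-2 Text Emotion Recognition
2-3 Motion Recognition
Chapter 3 Background
3-1 Speech Emotion Recognition
3-1-1 MFCC
3-1-2 CNN
3-2 Text Emotion Recognition
3-2-1 BERT
3-3 Motion Recognition
3-3-1 3DCNN
3-3-2 Mediapipe
3-3-3 LSTM
3-4 Flask Framework
Chapter 4 System Architecture
4-1 Problem Description
4-1-1 Scenario Description
4-1-2 Objectives
4-2 System Architecture
Chapter 5 Experimental Analysis
5-1 Datasets
5-2 Experimental Results
Chapter 6 Conclusion
References
List of Figures
Figure 1 MFCC spectrogram flowchart
Figure 2 CNN model flowchart
Figure 3 BERT model architecture
Figure 4 3DCNN model flowchart
Figure 5 LSTM model architecture
Figure 6 Overall system architecture
Figure 7 Bullying tone detection system architecture
Figure 8 Audio noise reduction and frequency feature extraction flowchart
Figure 9 Audio normalization flowchart
Figure 10 MFCC feature map flowchart
Figure 11 Bullying tone detection system: CNN model training phase
Figure 12 Bullying tone detection system: CNN model usage phase
Figure 13 Bullying text detection system architecture
Figure 14 Bullying text detection system: building the emotion lexicon
Figure 15 Bullying text detection: million-entry generated-data lexicon
Figure 16 Bullying text detection: BERT model training phase
Figure 17 Bullying text detection: BERT model training phase
Figure 18 Bullying motion detection system architecture
Figure 19 Bullying motion detection system: generating a million-sample dataset with a sliding window
Figure 20 Bullying motion detection system: 3DCNN model training phase
Figure 21 Bullying motion detection system: LSTM model training phase
Figure 22 Bullying motion detection system: 3DCNN model usage phase
Figure 23 Bullying motion detection system: LSTM model usage phase
Figure 24 System front-end display
Figure 25 System test recognition accuracy
Figure 26 Bullying speech system: overall accuracy
Figure 27 Bullying speech system: per-class accuracy
Figure 28 Bullying text system: overall accuracy
Figure 29 Bullying text system: per-class accuracy
Figure 30 Bullying tone and text recognition accuracy
Figure 31 Bullying motion models: overall accuracy comparison
Figure 32 Bullying motion models: per-class accuracy comparison
Figure 33 Bullying motion models: integrated recognition
Figure 34 ROC curve of the bullying text detection system
Figure 35 ROC curve of the bullying motion detection system
Figure 36 ROC curve of the bullying tone detection system
List of Tables
Table 1 Comparison of related work
Table 2 Confusion matrix
Table 3 Formulas for true positive rate and false positive rate |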
References |
[1] Khalil, Ruhul Amin, et al. "Speech emotion recognition using deep learning techniques: A review." IEEE Access 7 (2019): 117327-117345.
[2] Zhang, Shaohua, et al. "Spelling error correction with soft-masked BERT." arXiv preprint arXiv:2005.07421 (2020).
[3] Tsai, Jen-Kai, et al. "Deep learning-based real-time multiple-person action recognition system." Sensors 20.17 (2020): 4758.
[4] Sivasangari, A., P. Ajitha, and R. M. Gomathi. "Deep Learning-Based Real-Time Multiple-Person Action Recognition System." NVEO - Natural Volatiles & Essential Oils Journal (2021): 4464-4473.
[5] Meng, Hao, et al. "Speech emotion recognition from 3D log-mel spectrograms with deep learning network." IEEE Access 7 (2019): 125868-125881.
[6] Shelke, Nilesh, et al. "An efficient way of text-based emotion analysis from social media using LRA-DNN." Neuroscience Informatics (2022): 100048.
[7] Vrskova, Roberta, et al. "Human activity classification using the 3DCNN architecture." Applied Sciences 12.2 (2022): 931.
[8] Zhang, Lei, Shuai Wang, and Bing Liu. "Deep learning for sentiment analysis: A survey." Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8.4 (2018): e1253.
[9] Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
[10] Tenney, Ian, Dipanjan Das, and Ellie Pavlick. "BERT rediscovers the classical NLP pipeline." arXiv preprint arXiv:1905.05950 (2019).
[11] Gao, Zhengjie, et al. "Target-dependent sentiment classification with BERT." IEEE Access 7 (2019): 154290-154299.
[12] Zheng, Fang, Guoliang Zhang, and Zhanjiang Song. "Comparison of different implementations of MFCC." Journal of Computer Science and Technology 16 (2001): 582-589.
[13] Muda, Lindasalwa, Mumtaj Begam, and Irraivan Elamvazuthi. "Voice recognition algorithms using mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques." arXiv preprint arXiv:1003.4083 (2010).
[14] Tiwari, Vibha. "MFCC and its applications in speaker recognition." International Journal on Emerging Technologies 1.1 (2010): 19-22.
[15] Chua, Leon O., and Tamas Roska. "The CNN paradigm." IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications 40.3 (1993): 147-156.
[16] Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE International Conference on Computer Vision. 2015.
[17] Alzubaidi, Laith, et al. "Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions." Journal of Big Data 8 (2021): 1-74.
[18] Zhang, Liang, et al. "Learning spatiotemporal features using 3DCNN and convolutional LSTM for gesture recognition." Proceedings of the IEEE International Conference on Computer Vision Workshops. 2017.
[19] Liu, Fangyu, et al. "3DCNN-DQN-RNN: A deep reinforcement learning framework for semantic parsing of large-scale 3D point clouds." Proceedings of the IEEE International Conference on Computer Vision. 2017.
[20] Titeca, Kristof. "The spiritual order of the LRA." The Lord's Resistance Army: Myth and Reality (2010): 59-73.
[21] Lugaresi, Camillo, et al. "Mediapipe: A framework for building perception pipelines." arXiv preprint arXiv:1906.08172 (2019).
[22] Yu, Yong, et al. "A review of recurrent neural networks: LSTM cells and network architectures." Neural Computation 31.7 (2019): 1235-1270.
[23] Huang, Zhiheng, Wei Xu, and Kai Yu. "Bidirectional LSTM-CRF models for sequence tagging." arXiv preprint arXiv:1508.01991 (2015).
[24] Deng, Li, et al. "Recent advances in deep learning for speech research at Microsoft." 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013.
[25] Snyder, David, et al. "X-vectors: Robust DNN embeddings for speaker recognition." 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018.
[26] Wu, Chung-Hsien, Ze-Jing Chuang, and Yu-Chung Lin. "Emotion recognition from text using semantic labels and separable mixture models." ACM Transactions on Asian Language Information Processing (TALIP) 5.2 (2006): 165-183.
[27] Zhao, Rui, Anna Zhou, and Kezhi Mao. "Automatic detection of cyberbullying on social networks based on bullying features." Proceedings of the 17th International Conference on Distributed Computing and Networking. 2016.
[28] El Ayadi, Moataz, Mohamed S. Kamel, and Fakhri Karray. "Survey on speech emotion recognition: Features, classification schemes, and databases." Pattern Recognition 44.3 (2011): 572-587.
[29] Ye, Liang, et al. "A combined motion-audio school bullying detection algorithm." International Journal of Pattern Recognition and Artificial Intelligence 32.12 (2018): 1850046.
[30] Wei, Chuqiao, et al. "A school bullying detecting algorithm based on motion recognition and speech emotion recognition." 2020 International Conference on Intelligent Computing and Human-Computer Interaction (ICHCI). IEEE, 2020.
[31] Grinberg, Miguel. Flask Web Development: Developing Web Applications with Python. O'Reilly Media, Inc., 2018.
[32] Szegedy, C., V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. "Rethinking the inception architecture for computer vision." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, June 2016, pp. 2818-2826. |
Full-text access rights |