§ Thesis Bibliographic Record
System ID	U0002-2102202311270200
DOI	10.6846/TKU.2023.00105
Title (Chinese)	設計及實作一個基於 BERT 與 3DCNN 深度學習模型的霸凌偵測系統
Title (English)	Design and Implementation of a Bullying Detection System Based on BERT and 3DCNN Deep Learning Models
Title (Third Language)
University	Tamkang University
Department (Chinese)	資訊工程學系碩士班
Department (English)	Department of Computer Science and Information Engineering
Foreign Degree School
Foreign Degree College
Foreign Degree Institute
Academic Year	111
Semester	1
Year of Publication	112
Student (Chinese)	黃崇睿
Student (English)	Chung-Jui Huang
Student ID	610410077
Degree	Master's
Language	Traditional Chinese
Second Language
Date of Oral Defense	2023-01-06
Number of Pages	57
Committee	Advisor - 陳世興 (shchen@mail.tku.edu.tw)
Committee Member - 張志勇
Committee Member - 張義雄
Keywords (Chinese)	Text Emotion Recognition
Speech Emotion Recognition
Motion Recognition
BERT
MFCC
LSTM
CNN
3DCNN
Keywords (English)	Text Emotion Recognition
Speech Emotion Recognition
Motion Recognition
BERT
MFCC
LSTM
CNN
3DCNN
Keywords (Third Language)
Subject Classification
Abstract (Chinese)
In recent years, incidents of bullying and abuse of young children in kindergartens have occurred repeatedly. When the bullying comes from peers, a teacher can step in and stop it; but when the bully or abuser is the teacher, the child can only tell the parents after returning home, by which time the bullying or abuse has already happened and the harm done to the child's mind cannot be undone. According to research by the Child Welfare League Foundation, children who suffer bullying and abuse experience low mood, feelings of inferiority, loss of confidence, and other effects, and may even engage in self-harm or suffer more serious consequences. To protect children, bullying and abuse must therefore be stopped at the moment they occur; only then can such tragedies be prevented in time.
With the rapid development of AI technology, this thesis uses text emotion recognition, speech emotion recognition, and motion recognition to build a real-time bullying monitoring system that improves safety in kindergartens. The system applies self-trained CNN, BERT, LSTM, and 3DCNN deep learning models to the video and audio captured by kindergarten surveillance cameras to determine whether any bullying event is occurring. In addition, it records data related to bullying events, links the front end and back end through a Flask web application, and issues warning sounds to notify parents and teachers.
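To make the speech branch of this pipeline concrete, the following is a minimal Python sketch of an MFCC-plus-CNN classifier of the kind the abstract describes (audio from the surveillance stream is converted into MFCC spectrograms and classified by a small CNN). The librosa and tf.keras calls, the 16 kHz sampling rate, the fixed frame count, and the two-class output are illustrative assumptions, not the implementation used in the thesis.

```python
# Hypothetical sketch of the speech branch: waveform -> MFCC features -> CNN classifier.
# Library choices (librosa, tf.keras), sample rate, and the label set are illustrative assumptions.
import librosa
import numpy as np
import tensorflow as tf

def extract_mfcc(wav_path, sr=16000, n_mfcc=40, max_frames=128):
    """Load an audio clip, compute its MFCC spectrogram, and pad/crop to a fixed size."""
    y, _ = librosa.load(wav_path, sr=sr)
    y = librosa.util.normalize(y)                       # simple amplitude normalization
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Pad or crop along the time axis so every clip has the same shape for the CNN.
    if mfcc.shape[1] < max_frames:
        mfcc = np.pad(mfcc, ((0, 0), (0, max_frames - mfcc.shape[1])))
    return mfcc[:, :max_frames, np.newaxis]             # shape: (n_mfcc, max_frames, 1)

def build_speech_cnn(n_mfcc=40, max_frames=128, n_classes=2):
    """A small 2D CNN over the MFCC 'image' (e.g. bullying tone vs. normal tone)."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_mfcc, max_frames, 1)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
```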
Abstract (English)
In recent years, there have been numerous incidents of bullying and child abuse of young children in kindergartens. If the bully is a peer, a teacher can stop it, but if the bully or abuser is the teacher, the child can only go home and tell the parents, by which time the harm has already been done and the damage to the child's mind cannot be undone. According to a study by the Child Welfare League Foundation, children who are subjected to bullying and abuse experience low mood, low self-esteem, and loss of confidence, and may even engage in self-harming behavior or suffer more serious effects. To prevent such harm, bullying and child abuse must be stopped at the moment they occur so that these unfortunate events can be averted in time.
With the rapid development of AI technology, this thesis proposes a real-time bullying surveillance system based on text emotion recognition, speech emotion recognition, and motion recognition to enhance safety in kindergartens. The system uses self-trained CNN, BERT, LSTM, and 3DCNN deep learning models to process video and audio from kindergarten surveillance cameras and to identify whether any bullying incident is taking place. In addition, data related to bullying incidents are recorded, the front end and back end are linked through Flask, and warning sounds are issued to notify parents and teachers.
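As a rough illustration of how the three detectors could feed the Flask back end described above, here is a minimal, hypothetical sketch in which per-modality scores are fused and exposed to the front-end page; the endpoint names, the 0.5 threshold, and the max-score fusion rule are illustrative assumptions rather than the design documented in the thesis.

```python
# Hypothetical sketch: fuse speech-, text-, and motion-detector scores and expose
# the result through a Flask back end that the front-end page polls.
from flask import Flask, jsonify, request

app = Flask(__name__)
latest_event = {"bullying": False, "scores": {}}

def fuse_scores(speech_p, text_p, motion_p, threshold=0.5):
    """Flag a possible bullying event if any modality's probability exceeds the threshold."""
    scores = {"speech": speech_p, "text": text_p, "motion": motion_p}
    return max(scores.values()) >= threshold, scores

@app.route("/report", methods=["POST"])
def report():
    """Detector processes POST their per-modality probabilities here."""
    p = request.get_json()
    bullying, scores = fuse_scores(
        p.get("speech", 0.0), p.get("text", 0.0), p.get("motion", 0.0)
    )
    latest_event.update({"bullying": bullying, "scores": scores})
    if bullying:
        # Stand-in for the warning sound / notification to parents and teachers.
        print("ALERT: possible bullying event detected")
    return jsonify(latest_event)

@app.route("/status")
def status():
    """Front-end page polls this endpoint to display the latest fused decision."""
    return jsonify(latest_event)

if __name__ == "__main__":
    app.run()
```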
Abstract (Third Language)
Thesis Contents
Table of Contents
Table of Contents		IV
List of Figures		VI
List of Tables		VIII
Chapter 1	Introduction	1
Chapter 2	Related Work	7
2-1 Speech Emotion Recognition	7
2-2 Text Emotion Recognition	8
2-3 Motion Recognition	9
Chapter 3	Background	12
3-1 Speech Emotion Recognition	12
3-1-1 MFCC	12
3-1-2 CNN	13
3-2 Text Emotion Recognition	15
3-2-1 BERT	15
3-3 Motion Recognition	16
3-3-1 3DCNN	16
3-3-2 MediaPipe	18
3-3-3 LSTM	18
3-4 Flask Framework	20
Chapter 4	System Architecture	21
4-1 Problem Description	21
4-1-1 Scenario Description	21
4-1-2 Objectives	21
4-2 System Architecture	22
Chapter 5	Experimental Analysis	43
5-1 Datasets	43
5-2 Experimental Results	44
Chapter 6	Conclusion	54
References	55
List of Figures
Figure 1	MFCC spectrogram generation flowchart	13
Figure 2	CNN model flowchart	14
Figure 3	BERT model architecture	15
Figure 4	3DCNN model flowchart	17
Figure 5	LSTM model architecture	19
Figure 6	Overall system architecture	22
Figure 7	Bullying tone detection system architecture	23
Figure 8	Audio denoising and frequency feature extraction flowchart	25
Figure 9	Audio normalization flowchart	26
Figure 10	MFCC feature spectrogram flowchart	28
Figure 11	Training phase of the CNN model in the bullying tone detection system	29
Figure 12	Usage phase of the CNN model in the bullying tone detection system	29
Figure 13	Bullying text detection system architecture	30
Figure 14	Building the emotion lexicon for the bullying text detection system	32
Figure 15	Million-entry generated data lexicon for bullying text detection	33
Figure 16	Training phase of the BERT model for bullying text detection	34
Figure 17	Usage phase of the BERT model for bullying text detection	35
Figure 18	Bullying motion detection system architecture	36
Figure 19	Sliding-window generation of the million-sample dataset in the bullying motion detection system	38
Figure 20	Training phase of the 3DCNN model in the bullying motion detection system	39
Figure 21	Training phase of the LSTM model in the bullying motion detection system	40
Figure 22	Usage phase of the 3DCNN model in the bullying motion detection system	40
Figure 23	Usage phase of the LSTM model in the bullying motion detection system	41
Figure 24	System front-end display	42
Figure 25	System test recognition accuracy	44
Figure 26	Overall accuracy of the bullying speech system	45
Figure 27	Per-class accuracy of the bullying speech system	46
Figure 28	Overall accuracy of the bullying text system	46
Figure 29	Per-class accuracy of the bullying text system	47
Figure 30	Bullying tone and text recognition accuracy	48
Figure 31	Comparison of overall accuracy across bullying motion models	48
Figure 32	Comparison of per-class accuracy across bullying motion models	49
Figure 33	Integrated recognition of the bullying motion models	50
Figure 34	ROC curve of the bullying text detection system	52
Figure 35	ROC curve of the bullying motion detection system	53
Figure 36	ROC curve of the bullying tone detection system	53
List of Tables
Table 1	Comparison of related work	11
Table 2	Confusion matrix	50
Table 3	Formulas for true positive rate and false positive rate	51
References
[1] Khalil, Ruhul Amin, et al. "Speech emotion recognition using deep learning techniques: A review." IEEE Access 7 (2019): 117327-117345.
[2] Zhang, Shaohua, et al. "Spelling error correction with soft-masked BERT." arXiv preprint arXiv:2005.07421 (2020).
[3] Tsai, Jen-Kai, et al. "Deep learning-based real-time multiple-person action recognition system." Sensors 20.17 (2020): 4758.
[4] Sivasangari, A., P. Ajitha, and R. M. Gomathi. "Deep learning-based real-time multiple-person action recognition system." NVEO-Natural Volatiles & Essential Oils Journal (2021): 4464-4473.
[5] Meng, Hao, et al. "Speech emotion recognition from 3D log-mel spectrograms with deep learning network." IEEE Access 7 (2019): 125868-125881.
[6] Shelke, Nilesh, et al. "An efficient way of text-based emotion analysis from social media using LRA-DNN." Neuroscience Informatics (2022): 100048.
[7] Vrskova, Roberta, et al. "Human activity classification using the 3DCNN architecture." Applied Sciences 12.2 (2022): 931.
[8] Zhang, Lei, Shuai Wang, and Bing Liu. "Deep learning for sentiment analysis: A survey." Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8.4 (2018): e1253.
[9] Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
[10] Tenney, Ian, Dipanjan Das, and Ellie Pavlick. "BERT rediscovers the classical NLP pipeline." arXiv preprint arXiv:1905.05950 (2019).
[11] Gao, Zhengjie, et al. "Target-dependent sentiment classification with BERT." IEEE Access 7 (2019): 154290-154299.
[12] Zheng, Fang, Guoliang Zhang, and Zhanjiang Song. "Comparison of different implementations of MFCC." Journal of Computer Science and Technology 16 (2001): 582-589.
[13] Muda, Lindasalwa, Mumtaj Begam, and Irraivan Elamvazuthi. "Voice recognition algorithms using mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques." arXiv preprint arXiv:1003.4083 (2010).
[14] Tiwari, Vibha. "MFCC and its applications in speaker recognition." International Journal on Emerging Technologies 1.1 (2010): 19-22.
[15] Chua, Leon O., and Tamas Roska. "The CNN paradigm." IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications 40.3 (1993): 147-156.
[16] Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE International Conference on Computer Vision. 2015.
[17] Alzubaidi, Laith, et al. "Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions." Journal of Big Data 8 (2021): 1-74.
[18] Zhang, Liang, et al. "Learning spatiotemporal features using 3DCNN and convolutional LSTM for gesture recognition." Proceedings of the IEEE International Conference on Computer Vision Workshops. 2017.
[19] Liu, Fangyu, et al. "3DCNN-DQN-RNN: A deep reinforcement learning framework for semantic parsing of large-scale 3D point clouds." Proceedings of the IEEE International Conference on Computer Vision. 2017.
[20] Titeca, Kristof. "The spiritual order of the LRA." The Lord's Resistance Army: Myth and Reality (2010): 59-73.
[21] Lugaresi, Camillo, et al. "MediaPipe: A framework for building perception pipelines." arXiv preprint arXiv:1906.08172 (2019).
[22] Yu, Yong, et al. "A review of recurrent neural networks: LSTM cells and network architectures." Neural Computation 31.7 (2019): 1235-1270.
[23] Huang, Zhiheng, Wei Xu, and Kai Yu. "Bidirectional LSTM-CRF models for sequence tagging." arXiv preprint arXiv:1508.01991 (2015).
[24] Deng, Li, et al. "Recent advances in deep learning for speech research at Microsoft." 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013.
[25] Snyder, David, et al. "X-vectors: Robust DNN embeddings for speaker recognition." 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018.
[26] Wu, Chung-Hsien, Ze-Jing Chuang, and Yu-Chung Lin. "Emotion recognition from text using semantic labels and separable mixture models." ACM Transactions on Asian Language Information Processing (TALIP) 5.2 (2006): 165-183.
[27] Zhao, Rui, Anna Zhou, and Kezhi Mao. "Automatic detection of cyberbullying on social networks based on bullying features." Proceedings of the 17th International Conference on Distributed Computing and Networking. 2016.
[28] El Ayadi, Moataz, Mohamed S. Kamel, and Fakhri Karray. "Survey on speech emotion recognition: Features, classification schemes, and databases." Pattern Recognition 44.3 (2011): 572-587.
[29] Ye, Liang, et al. "A combined motion-audio school bullying detection algorithm." International Journal of Pattern Recognition and Artificial Intelligence 32.12 (2018): 1850046.
[30] Wei, Chuqiao, et al. "A school bullying detecting algorithm based on motion recognition and speech emotion recognition." 2020 International Conference on Intelligent Computing and Human-Computer Interaction (ICHCI). IEEE, 2020.
[31] Grinberg, Miguel. Flask Web Development: Developing Web Applications with Python. O'Reilly Media, Inc., 2018.
[32] Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2818-2826.
Thesis Full-Text Usage Permissions
National Central Library
Royalty-free authorization to the National Central Library is not granted; after the authorization form is submitted, the bibliographic record and electronic full text are made available immediately on dedicated retrieval terminals inside the National Central Library.
On Campus
The printed thesis is made publicly available on campus immediately.
Authorization of the electronic full text is not granted.
The bibliographic record is made publicly available on campus immediately.
Off Campus
Authorization to database vendors is not granted.
The bibliographic record is made publicly available off campus immediately.
