§ Browse Thesis Bibliographic Record
  
System ID U0002-3012202010155000
DOI 10.6846/TKU.2021.00858
Title (Chinese) 基於深度學習之老年人受虐檢測系統
Title (English) Deep Learning Based System for Elderly Abused Detection
Title (Third Language)
School Tamkang University (淡江大學)
Department (Chinese) 資訊工程學系全英語碩士班
Department (English) Master's Program, Department of Computer Science and Information Engineering (English-taught program)
Foreign Degree School
Foreign Degree College
Foreign Degree Institute
Academic Year 109 (2020–2021)
Semester 1
Publication Year 110 (2021)
Author (Chinese) 沈信安
Author (English) SAVADOGO Wendgoundi Abdoul Rasmané
Student ID 607785010
Degree Master's
Language English
Second Language
Oral Defense Date 2020-12-11
Pages 73
Oral Defense Committee Advisor - 陳建彰 (cchen34@gmail.com)
Co-advisor - 洪智傑 (smalloshin@gmail.com)
Member - 林承賢 (charchesshen@gmail.com)
Member - 楊權輝 (chyang@hcu.edu.tw)
Keywords (Chinese) 人類活動識別 (Human Activity Recognition)
虐待老人 (Elder Abuse)
3D CNN
深度學習 (Deep Learning)
遷移學習 (Transfer Learning)
Keywords (English) Human Activity Recognition
Elderly abuse
3D CNN
Deep Learning
Transfer Learning
Keywords (Third Language)
Subject Classification
Chinese Abstract
The aim of this thesis is to develop a deep learning-based system to detect whether elderly people are being abused by their caregivers. According to the World Health Organization, roughly one in six people aged 60 and over experience abuse at the hands of caregivers. A Taiwanese study reports that abuse cases grew from 2,271 in 2008 to 7,745 in 2018, and classifies elder abuse into seven types. This study focuses solely on detecting whether elder abuse has occurred.

This study defines physical abuse as deliberately harming, or intending to harm, another person. Prior research has shown that elder abuse has serious consequences for individuals and for society, including severe physical injury. To address this problem, we built a dedicated video-only dataset split into two groups: videos in which caregivers abuse elderly people, and videos without abuse. A video is a sequence of images in a specific order, and therefore carries spatio-temporal information. To capture this information effectively, the 2D CNN, a strong image feature extractor, is extended to a 3D CNN that captures spatial and temporal elements simultaneously. After assembling the dataset, we trained it on a pretrained 3D CNN model, which accurately learns the changes along the time axis of a video to detect abuse. This approach can accurately learn the spatio-temporal elements of videos for human activity recognition.
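The extension from a 2D to a 3D convolution described above can be illustrated with a minimal pure-Python sketch (single channel, no padding or stride; a toy illustration, not the thesis's actual network): the 3D kernel slides along the time axis as well as height and width, so each output value mixes information across neighbouring frames.

```python
def conv3d_valid(clip, kernel):
    """'Valid' 3D convolution (cross-correlation) of a T x H x W clip
    with a t x h x w kernel; single channel, no padding, pure Python."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    t, h, w = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for i in range(T - t + 1):          # slide over time (frames)
        plane = []
        for j in range(H - h + 1):      # slide over height
            row = []
            for k in range(W - w + 1):  # slide over width
                s = sum(clip[i + a][j + b][k + c] * kernel[a][b][c]
                        for a in range(t) for b in range(h) for c in range(w))
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out
```

Because the kernel spans several frames, motion patterns (a raised arm moving between frames) contribute to the output in a way a per-frame 2D convolution cannot capture.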

This study applies transfer learning to the deep learning method for detecting elder abuse. First, a 3D CNN is trained on UCF101, a benchmark dataset for human activity recognition; the resulting model parameters are then used to train on the dataset collected in this study. Besides the adopted 3D CNN architecture, several other network architectures were also trained on the collected dataset. Experimental results show that the 3D CNN used in this study performs best, learning more features and achieving satisfactory results.
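The fixed-length clips that 3D CNNs consume must be cut from videos of varying length. A minimal sketch of the two preprocessing strategies named in the thesis (clip random sampling and repeated frames), assuming frames are already decoded into a list; the function name and fixed clip length are illustrative choices, not the thesis's exact code:

```python
import random

def sample_clip(frames, clip_len=16, seed=None):
    """Build a fixed-length clip from a list of decoded frames.

    If the video has at least `clip_len` frames, sample a random
    contiguous clip; otherwise cycle through the video, repeating
    frames until the clip is full ('repeated frames' padding).
    """
    rng = random.Random(seed)
    n = len(frames)
    if n >= clip_len:
        start = rng.randrange(n - clip_len + 1)
        return frames[start:start + clip_len]
    # Short video: repeat frames in order until clip_len is reached.
    return [frames[i % n] for i in range(clip_len)]
```

Random sampling doubles as cheap data augmentation during training, while repeated-frame padding keeps short clips usable without distorting their timing.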
English Abstract
The aim of this thesis is to develop an efficient deep learning-based system to fight elder abuse, which is most often committed by caregivers. According to a study conducted by the World Health Organization, approximately 1 in 6 people aged 60 and over face some form of mistreatment from their caregivers. Based on a national study in Taiwan, over a ten-year period abuse cases grew from 2,271 in 2008 to 7,745 in 2018. Many types of elder abuse exist, but this work focuses only on developing a solution for physical abuse. Physical abuse is defined here as the intentional use of force to harm, or with the intent to harm, someone. The aforementioned study showed that elder abuse has serious consequences for individuals and society, including serious physical injuries and long-term psychological consequences, as well as an increased risk of nursing home placement, use of emergency services, hospitalization, and death.
To address this still-unsolved problem, we built a dataset composed solely of videos divided into two distinct groups: one containing videos of caregivers abusing elderly people, the other without any abuse. After collecting the dataset, we used a pretrained 3D CNN, which has proven efficient at learning the spatio-temporal elements of videos for human activity recognition.
A video is an ordered set of images, or frames; it carries both spatial and temporal information. To capture these elements efficiently, the 2D CNN, known as a good image feature extractor, has been extended to the 3D CNN, which captures spatial and temporal elements simultaneously. In this project, we present a deep learning method that uses transfer learning for elder abuse detection. First, a 3D convolutional neural network is trained on UCF101, a benchmark dataset for human activity recognition; the model obtained from that training is then used to train on the collected dataset. To show the efficiency of this approach, the collected dataset was also trained on various other networks; our approach outperforms them and achieves satisfactory results. The proposed method is able to learn more discriminative features.
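The kernel decomposition behind the R(2+1)D architecture evaluated in the thesis replaces each t×d×d 3D convolution with a 1×d×d spatial convolution followed by a t×1×1 temporal one, choosing the intermediate channel count M so the decomposed block has roughly as many weights as the original 3D kernel. A small sketch of that arithmetic, following the parameter-matching rule from Tran et al.'s R(2+1)D paper (a rough count with bias terms ignored; not necessarily the thesis's exact configuration):

```python
def r2plus1d_mid_channels(n_in, n_out, t=3, d=3):
    """Intermediate channel count M for an R(2+1)D block.

    A full t*d*d 3D convolution from n_in to n_out channels has
    t*d*d*n_in*n_out weights. Decomposing it into a 1*d*d spatial
    convolution (n_in -> M) plus a t*1*1 temporal convolution
    (M -> n_out) gives d*d*n_in*M + t*M*n_out weights; M is chosen
    so the two counts roughly match.
    """
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)

def conv3d_params(n_in, n_out, t=3, d=3):
    """Weight count of a plain t*d*d 3D convolution."""
    return t * d * d * n_in * n_out

def decomposed_params(n_in, n_out, t=3, d=3):
    """Weight count of the matched (2+1)D decomposition."""
    m = r2plus1d_mid_channels(n_in, n_out, t, d)
    return d * d * n_in * m + t * m * n_out
```

The payoff of the decomposition is not fewer parameters but an extra nonlinearity between the spatial and temporal stages at comparable capacity, which is one reason R(2+1)D tends to train better than a plain R3D of the same depth.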
Third Language Abstract
Table of Contents
CONTENTS
Chinese Abstract
Abstract
Acknowledgements
Contents
List of Tables
List of Figures
1 Introduction
1.1 Motivation
1.2 Problem Statement
1.3 Thesis Goal
1.4 Thesis Structure
2 Related Works
2.1 Overview of Deep Learning Methods on Action Recognition
2.2 Elderly Abuse Detection and Violence Detection Characteristics
2.2.1 Machine Learning for Violence Classification
2.2.2 Deep Learning for Violence Classification
2.3 Datasets
2.3.1 Human Action Recognition Datasets
2.3.2 Violence Detection Datasets
3 Background
3.1 1D Convolutional Neural Network
3.2 Deep Learning Fundamentals
3.2.1 Transfer Learning for Deep Learning
3.2.2 Data Augmentation
3.2.3 Advanced Optimization Methods
4 Methodology
4.1 Data Preprocessing
4.2 Adopted Network
5 Experiments and Results
5.1 Datasets
5.1.1 UCF101 Dataset
5.1.2 Elderly Abused Dataset
5.2 Implementation
5.3 Results
5.3.1 Results on the 3D CNN
5.3.2 Results on the 34-layer ResNet 3D CNN
5.3.3 Results on the R(2+1)D CNN
5.3.4 Results on the Pretrained 3D CNN
6 Conclusion and Future Work
6.1 Conclusion and Future Work
References

LIST OF TABLES
2.1 Human action recognition datasets overview.


LIST OF FIGURES
1.1 Indonesian caregiver abusing elderly Taiwanese man.
1.2 Caregiver abusing elderly Taiwanese woman.
1.3 Thesis flowchart.
2.1 Hockey Fight dataset overview.
2.2 Movies dataset overview.
2.3 Violent-Flows Crowd Violence dataset overview.
2.4 Real-World Fight dataset overview.
2.5 Surveillance Camera dataset overview.
3.1 Fine-tuning process.
3.2 Data augmentation synthetic image adding process.
3.3 Data augmentation geometric transformation and different techniques representation.
3.4 Some data augmentation techniques.
4.1 Data preprocessing with clip random sampling representation.
4.2 Data preprocessing using the repeated frames technique.
4.3 Image treatment process from 2D to 3D and from single image to set of images.
4.4 Video multiple-frame extraction.
4.5 Proposed system overview.
4.6 Residual network skipping block.
4.7 ResNet 2-layer and 3-layer blocks.
4.8 3D CNN architecture.
4.9 R3D CNN architecture.
4.10 3D ResNet CNN 18 and 34 architecture configurations.
4.11 3D CNN kernel decomposed into 2D CNN and 1D CNN kernels.
4.12 R(2+1)D CNN architecture representation.
4.13 R(2+1)D CNN feature extraction process.
4.14 R(2+1)D CNN (blue) vs. R3D CNN (green) training accuracy (left) and testing loss (right) on the Violent-Flows dataset.
4.15 R(2+1)D CNN (blue) vs. R3D CNN (green) testing accuracy (left) and training loss (right) on the Violent-Flows dataset.
4.16 Elderly abuse detection flowchart.
5.1 UCF101 complete label list.
5.2 Collected dataset presentation.
5.3 Elderly abuse dataset trained on the 3D CNN network; training accuracy (left) and loss (right).
5.4 Elderly abuse dataset trained on the 3D CNN network; validation accuracy (left) and loss (right).
5.5 Elderly abuse dataset trained on the 3D CNN network; testing accuracy (left) and loss (right).
5.6 Elderly abuse dataset trained on the R3D CNN network; training accuracy (left) and loss (right).
5.7 Elderly abuse dataset trained on the R3D CNN network; validation accuracy (left) and loss (right).
5.8 Elderly abuse dataset trained on the R3D CNN network; testing accuracy (left) and loss (right).
5.9 Elderly abuse dataset trained on the R(2+1)D CNN network; training accuracy (left) and loss (right).
5.10 Elderly abuse dataset trained on the R(2+1)D CNN network; validation accuracy (left) and loss (right).
5.11 Elderly abuse dataset trained on the R(2+1)D CNN network; testing accuracy (left) and loss (right).
5.12 Elderly abuse dataset trained on the pretrained R(2+1)D CNN network with the repeated-frames extraction preprocessing method; training accuracy (left) and loss (right).
5.13 Elderly abuse dataset trained on the pretrained R(2+1)D CNN network with the repeated-frames extraction preprocessing method; validation accuracy (left) and loss (right).
5.14 Elderly abuse dataset trained on the pretrained R(2+1)D CNN network with the repeated-frames extraction preprocessing method; testing accuracy (left) and loss (right).
5.15 Summary of the different training results.
5.16 Elderly abuse dataset trained on the pretrained 3D CNN network; training accuracy (left) and loss (right).
5.17 Elderly abuse dataset trained on the pretrained 3D CNN network; validation accuracy (left) and loss (right).
5.18 Elderly abuse dataset trained on the pretrained 3D CNN network; testing accuracy (left) and loss (right).
References
Full-Text Access Rights
On campus
Print thesis released immediately on campus
Electronic full text authorized for on-campus release
Electronic thesis released immediately on campus
Off campus
Authorization granted
Electronic thesis released immediately off campus
