§ Browse Thesis Bibliographic Record
  
System ID U0002-2809202023362200
DOI 10.6846/TKU.2020.00834
Title (Chinese) 基於深度學習BERT語言模型之醫療影像報告生成系統
Title (English) Biomedical Image Report Generation System Based on Deep Learning BERT Language Model
Title (third language)
Institution Tamkang University
Department (Chinese) 資訊工程學系碩士班
Department (English) Department of Computer Science and Information Engineering
Foreign degree: university
Foreign degree: college
Foreign degree: graduate institute
Academic year 108 (2019-2020)
Semester 2
Year of publication 109 (2020)
Author (Chinese) 簡孝羽
Author (English) Siao-Yu Jian
Student ID 607410619
Degree Master's
Language Traditional Chinese
Second language English
Oral defense date 2020-07-14
Number of pages 77
Committee Advisor - 洪文斌 (horng@mail.tku.edu.tw)
Member - 彭建文 (pchw8598@mail.chihlee.edu.tw)
Member - 范俊海 (chunhai@mail.tku.edu.tw)
Keywords (Chinese) BERT
影像描述
Conditional Layer Normalization
Keywords (English) BERT
Image Captioning
Conditional Layer Normalization
Keywords (third language)
Subject classification
Abstract (Chinese)
This study builds a medical image report generation system on the IU X-Ray and PEIR Gross medical image datasets using image captioning techniques. The convolutional neural network architectures VGG and ResNet serve as image feature extraction models, and Conditional Layer Normalization lets the language model BERT (Bidirectional Encoder Representations from Transformers) generate text conditioned on the image features. Finally, beam search is applied to improve the accuracy of the generated text. Evaluated with the BLEU and ROUGE-L metrics, BERT combined with either VGG or ResNet outperforms the models based on recurrent neural networks, showing that BERT surpasses recurrent neural networks for image captioning.
Abstract (English)
In this study, we use the IU X-Ray and PEIR Gross medical image datasets together with image captioning techniques to build a medical image report generation system. We use the convolutional neural network architectures VGG and ResNet as image feature extraction models, and apply Conditional Layer Normalization to the natural language model BERT (Bidirectional Encoder Representations from Transformers) so that text generation is conditioned on the image features. Finally, we use beam search to improve the accuracy of the output text. We evaluate the generated text with BLEU and ROUGE-L. The results show that BERT combined with VGG or ResNet outperforms other image captioning models based on recurrent neural networks.
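The two generation-side mechanisms named in the abstracts can be made concrete. Below is a minimal PyTorch-style sketch of Conditional Layer Normalization in the spirit described here: projections of the CNN image feature shift the gain and bias of layer normalization inside the language model, so decoding is conditioned on the image. The class name, dimensions, and zero-initialization detail are illustrative assumptions, not the thesis's actual implementation.

```python
import torch
import torch.nn as nn

class ConditionalLayerNorm(nn.Module):
    """Layer normalization whose gain (gamma) and bias (beta) are shifted
    by projections of a conditioning vector, e.g. a CNN image feature.
    Hypothetical sketch; not the thesis's code."""

    def __init__(self, hidden_size: int, cond_size: int, eps: float = 1e-12):
        super().__init__()
        self.eps = eps
        # Base affine parameters, as in standard LayerNorm.
        self.gamma = nn.Parameter(torch.ones(hidden_size))
        self.beta = nn.Parameter(torch.zeros(hidden_size))
        # Condition-dependent shifts, zero-initialized so training starts
        # from plain (unconditional) LayerNorm.
        self.gamma_dense = nn.Linear(cond_size, hidden_size, bias=False)
        self.beta_dense = nn.Linear(cond_size, hidden_size, bias=False)
        nn.init.zeros_(self.gamma_dense.weight)
        nn.init.zeros_(self.beta_dense.weight)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_size); cond: (batch, cond_size)
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, unbiased=False, keepdim=True)
        x_norm = (x - mean) / torch.sqrt(var + self.eps)
        gamma = self.gamma + self.gamma_dense(cond).unsqueeze(1)
        beta = self.beta + self.beta_dense(cond).unsqueeze(1)
        return gamma * x_norm + beta
```

Beam search, used here to improve output accuracy over greedy decoding, keeps the top-k highest-scoring partial sequences at each step instead of committing to a single token. A minimal sketch follows; `step_fn` is a hypothetical callable that returns next-token logits for a partial sequence, not the thesis's API.

```python
import torch.nn.functional as F

def beam_search(step_fn, start_id: int, end_id: int,
                beam_size: int = 3, max_len: int = 60) -> list:
    """Return the highest-scoring token sequence under length-capped
    beam search. Hypothetical interface; for illustration only."""
    beams = [([start_id], 0.0)]  # (token ids, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_id:  # finished beams carry over unchanged
                candidates.append((seq, score))
                continue
            log_probs = F.log_softmax(step_fn(seq), dim=-1)
            top_lp, top_ids = log_probs.topk(beam_size)
            for lp, tok in zip(top_lp.tolist(), top_ids.tolist()):
                candidates.append((seq + [tok], score + lp))
        # Keep only the best `beam_size` hypotheses overall.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(seq[-1] == end_id for seq, _ in beams):
            break
    return beams[0][0]
```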
Abstract (third language)
Table of Contents
Contents
Chinese Abstract
English Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Motivation and Purpose
1.3 Thesis Organization
Chapter 2 Literature Review
2.1 Medical Imaging Reports
2.2 Natural Language Processing
2.3 Transformer
2.4 Image Captioning
2.5 Related Work on Medical Image Captioning
2.6 Google's Bidirectional Encoder (BERT)
2.6.1 Masked Language Model
2.6.2 Next Sentence Prediction
2.7 VGG Network Architecture
2.8 Residual Networks (ResNet)
Chapter 3 Medical Image Report Generation System
3.1 System Architecture and Workflow
3.2.1 IU X-Ray Dataset
3.2.2 PEIR Gross Dataset
3.2.3 Image Normalization
3.2.4 Text Normalization
3.3 Conditional Layer Normalization
3.4 Beam Search
3.5 Fine-Tuning BERT
Chapter 4 Experimental Results
4.1 Experimental Equipment and Environment
4.2 Evaluation Metrics
4.2.1 BLEU
4.2.2 ROUGE-L
4.3 Comparison Models
4.4 IU X-Ray Experimental Results
4.5 PEIR Gross Experimental Results
Chapter 5 Conclusions and Future Work
References
Appendix 1 English Version of the Thesis

List of Figures
Figure 1 Recurrent neural network architecture
Figure 2 Long short-term memory (LSTM) architecture
Figure 3 Gated recurrent networks
Figure 4 Scaled dot-product attention and multi-head attention
Figure 5 Transformer architecture
Figure 6 Image captions generated from grammar rules
Figure 7 R-CNN-based image captioning architecture
Figure 8 Soft attention vs. hard attention
Figure 9 Caption generation from image features combining VGG and an RPN
Figure 10 Controlling visual attention weights with a sentinel gate
Figure 11 Using a copying mechanism to diversify the output text
Figure 12 Extracting image and text features with an RPN and a convolutional neural network
Figure 13 Combining text and image features with a merge gate to generate captions
Figure 14 Generating news headlines from news articles and images
Figure 15 MSCap model architecture
Figure 16 Generating medical tags with an encoder-decoder architecture
Figure 17 MDNet medical image captioning model
Figure 18 Difference between conventional attention and the AAS mechanism
Figure 19 HRGR-Agent model architecture
Figure 20 Co-attention and hierarchical LSTM
Figure 21 BERT applied to various natural language tasks
Figure 22 Masked language model training procedure
Figure 23 The vanishing gradient problem in deep neural networks
Figure 24 Identity mapping in residual networks
Figure 25 ResNet results on ImageNet
Figure 26 System architecture flowchart
Figure 27 BERT with an autoregressive decoder
Figure 28 ResNet-152 architecture and residual block
Figure 29 VGG-19 network architecture
Figure 30 Frontal and lateral chest X-rays from the IU X-Ray dataset

List of Tables
Table 1 BERT pre-trained models currently released by Google
Table 2 BERT-Large pre-trained models released by Google
Table 3 IU X-Ray Findings & Impression fields
Table 4 PEIR Gross medical images and captions
Table 5 Training hyperparameters
Table 6 Experimental hardware and software environment
Table 7 Total training time
Table 8 Comparison models
Table 9 Comparison results on IU X-Ray
Table 10 Comparison results on PEIR Gross
Table 11 IU X-Ray results (1)
Table 12 IU X-Ray results (2)
Table 13 IU X-Ray results (3)
Table 14 IU X-Ray results (4)
Table 15 PEIR Gross results (1)
Table 16 PEIR Gross results (2)
Table 17 PEIR Gross results (3)
Full-Text Access Rights
On campus
Print copy of the thesis available immediately
Electronic full text authorized for on-campus access
On-campus electronic thesis available immediately
Off campus
Authorized for release to database vendors
Off-campus electronic thesis available immediately
