System ID | U0002-2809202023362200 |
DOI | 10.6846/TKU.2020.00834 |
Title (Chinese) | 基於深度學習BERT語言模型之醫療影像報告生成系統 |
Title (English) | Biomedical Image Report Generation System Based on Deep Learning BERT Language Model |
Title (third language) | |
University | Tamkang University (淡江大學) |
Department (Chinese) | 資訊工程學系碩士班 |
Department (English) | Department of Computer Science and Information Engineering |
Foreign degree school | |
Foreign degree college | |
Foreign degree institute | |
Academic year | 108 |
Semester | 2 |
Publication year | 109 (ROC calendar; 2020) |
Author (Chinese) | 簡孝羽 |
Author (English) | Siao-Yu Jian |
Student ID | 607410619 |
Degree | Master's |
Language | Traditional Chinese |
Second language | English |
Oral defense date | 2020-07-14 |
Pages | 77 |
Committee | Advisor: 洪文斌 (horng@mail.tku.edu.tw); Member: 彭建文 (pchw8598@mail.chihlee.edu.tw); Member: 范俊海 (chunhai@mail.tku.edu.tw) |
Keywords (Chinese) | BERT; Image Captioning (影像描述); Conditional Layer Normalization |
Keywords (English) | BERT; Image Captioning; Conditional Layer Normalization |
Keywords (third language) | |
Subject classification | |
Abstract (Chinese) |
This study uses the IU X-Ray and PEIR Gross medical image datasets and image captioning techniques to build a medical image report generation system. The convolutional neural network architectures VGG and ResNet serve as image feature extraction models, and Conditional Layer Normalization lets the language model BERT (Bidirectional Encoder Representations from Transformers) treat the image features as a condition controlling text generation; beam search then improves the accuracy of the output text. Evaluated with the BLEU and Rouge-L natural language metrics, BERT combined with either VGG or ResNet outperforms models based on recurrent neural networks, showing that BERT can achieve better image captioning performance than recurrent networks. |
Abstract (English) |
In this study, we use the IU X-Ray and PEIR Gross medical image datasets and image captioning techniques to build a medical image report generation system. We use the convolutional neural network architectures VGG and ResNet as image feature extraction models, and apply Conditional Layer Normalization so that the language model BERT (Bidirectional Encoder Representations from Transformers) generates text conditioned on the image features. Finally, we use beam search to improve the accuracy of the output text. We evaluate text quality with BLEU and Rouge-L. The results show that BERT combined with either VGG or ResNet outperforms other image captioning models based on recurrent neural networks. |
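The abstract's central technique, Conditional Layer Normalization, replaces layer normalization's fixed scale and shift with values offset by a condition vector (here, the image features), so the same language model produces different text for different images. A minimal NumPy sketch of the general idea follows; it is illustrative only, not the thesis code, and all parameter names (`gamma0`, `w_gamma`, etc.) are assumptions.

```python
import numpy as np

def conditional_layer_norm(x, cond, gamma0, beta0, w_gamma, w_beta, eps=1e-6):
    """Layer normalization whose scale and shift depend on a condition vector.

    x       : (hidden,) one token's hidden state
    cond    : (cond_dim,) condition vector, e.g. an image feature
    gamma0  : (hidden,) base scale; beta0: (hidden,) base shift
    w_gamma, w_beta : (cond_dim, hidden) projections of the condition
    """
    # Standard layer-norm statistics over the hidden dimension.
    x_hat = (x - x.mean()) / np.sqrt(x.var() + eps)
    # The condition shifts the scale and bias away from their base values;
    # with a zero condition this reduces to ordinary layer normalization.
    gamma = gamma0 + cond @ w_gamma
    beta = beta0 + cond @ w_beta
    return gamma * x_hat + beta
```

With `w_gamma` and `w_beta` initialized to zero, the model starts as plain pre-trained BERT and learns the image conditioning during fine-tuning, which is the usual motivation for this design.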
Abstract (third language) | |
Table of contents |
Chinese abstract I; English abstract II; Contents III; List of figures VI; List of tables VIII
Chapter 1 Introduction 1
  1.1 Research background and motivation 1
  1.2 Research motivation and objectives 2
  1.3 Thesis organization 3
Chapter 2 Literature review 4
  2.1 Medical image reports 4
  2.2 Natural language processing 4
  2.3 Transformer 8
  2.4 Image captioning 12
  2.5 Related work on medical image captioning 20
  2.6 Google's bidirectional encoder, BERT 24
    2.6.1 Masked language model 26
    2.6.2 Next Sentence Prediction 27
  2.7 VGG network architecture 28
  2.8 Residual network (ResNet) 28
Chapter 3 Medical image report generation system 31
  3.1 System pipeline 31
    3.2.1 IU X-Ray dataset 35
    3.2.2 PEIR Gross dataset 37
    3.2.3 Image normalization 38
    3.2.4 Text normalization 39
  3.3 Conditional Layer Normalization 40
  3.4 Beam Search 41
  3.5 Fine-tuning BERT 41
Chapter 4 Experimental results 43
  4.1 Experimental equipment and environment 43
  4.2 Evaluation metrics 44
    4.2.1 BLEU 44
    4.2.2 Rouge-L 45
  4.3 Comparison models 46
  4.4 IU X-Ray results 50
  4.5 PEIR Gross results 52
Chapter 5 Conclusions and future work 53
References 54
Appendix 1: English version of the thesis 61
List of figures
  Fig. 1 Recurrent neural network architecture 5
  Fig. 2 Long short-term memory (LSTM) architecture 6
  Fig. 3 Gated recurrent networks 7
  Fig. 4 Scaled dot-product attention and multi-head attention 9
  Fig. 5 Transformer architecture 11
  Fig. 6 Image descriptions generated from grammar rules 12
  Fig. 7 R-CNN-based image captioning architecture 14
  Fig. 8 Soft vs. hard attention mechanisms 14
  Fig. 9 Generating captions from image features combining VGG and an RPN 15
  Fig. 10 Using a sentinel gate to control image attention weights 16
  Fig. 11 Using a copy mechanism to diversify output text 17
  Fig. 12 Extracting image and text features with an RPN and convolutional networks 18
  Fig. 13 Merge gate combining text and image features for caption generation 18
  Fig. 14 Generating news headlines from articles and images 19
  Fig. 15 MSCap model architecture 19
  Fig. 16 Encoder-decoder architecture for generating medical tags 20
  Fig. 17 MDNet medical image captioning model 21
  Fig. 18 Traditional attention vs. the AAS mechanism 21
  Fig. 19 HRGR-Agent model architecture 22
  Fig. 20 Co-attention and hierarchical LSTM 23
  Fig. 21 BERT applied to various natural language tasks 25
  Fig. 22 Masked language model training process 27
  Fig. 23 Vanishing gradient problem in deep networks 29
  Fig. 24 Identity mapping in residual networks 29
  Fig. 25 ResNet results on ImageNet 30
  Fig. 26 System architecture flow chart 32
  Fig. 27 BERT with an autoregressive decoder 33
  Fig. 28 ResNet-152 architecture and residual blocks 34
  Fig. 29 VGG-19 network architecture 35
  Fig. 30 Frontal and lateral chest X-rays from the IU X-Ray dataset 37
List of tables
  Table 1 BERT pre-trained models released by Google 26
  Table 2 BERT-Large pre-trained model released by Google 26
  Table 3 IU X-Ray Findings & Impression fields 37
  Table 4 PEIR Gross medical images and descriptions 38
  Table 5 Training parameters 42
  Table 6 Equipment and software environment 43
  Table 7 Total training time 43
  Table 8 Comparison models 47
  Table 9 IU X-Ray comparison results 48
  Table 10 PEIR Gross comparison results 49
  Table 11 IU X-Ray results (1) 50
  Table 12 IU X-Ray results (2) 50
  Table 13 IU X-Ray results (3) 51
  Table 14 IU X-Ray results (4) 51
  Table 15 PEIR Gross results (1) 52
  Table 16 PEIR Gross results (2) 52
  Table 17 PEIR Gross results (3) 52 |
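Section 3.4 and the abstract describe beam search, which keeps the top-k highest-probability partial sentences at each decoding step instead of committing greedily to one token. The following is a generic, framework-free sketch of that procedure under stated assumptions: `step_fn` is a hypothetical callback standing in for the BERT decoder, returning `(token, log_prob)` continuations for a prefix; it is not the thesis implementation.

```python
import math

def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=10):
    """Decode with beam search.

    step_fn(prefix) -> list of (token, log_prob) continuations of prefix.
    Returns the highest-scoring complete sequence found.
    """
    beams = [([start_token], 0.0)]  # (sequence, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        # Expand every live beam with every continuation the model offers.
        candidates = []
        for seq, score in beams:
            for tok, logp in step_fn(seq):
                candidates.append((seq + [tok], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        # Keep finished sequences aside; refill the beam with the best live ones.
        beams = []
        for seq, score in candidates:
            if seq[-1] == end_token:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
            if len(beams) == beam_width:
                break
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished beams if none terminated
    return max(finished, key=lambda c: c[1])[0]
```

A beam width of 1 reduces this to greedy decoding; larger widths trade compute for output quality, which is how the thesis uses it to raise BLEU/Rouge-L scores.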
Full-text access rights | |