Tamkang University Chueh Sheng Memorial Library (TKU Library)


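System ID: U0002-3008201802441500
Chinese title: 應用序列到序列生成模型於雙向文本改寫之研究
English title: Using the Sequence to Sequence Generative Model for Bidirectional Text Rewriting
University: Tamkang University
Department (Chinese): 資訊管理學系碩士班 (Master's Program, Department of Information Management)
Department (English): Department of Information Management
Academic year: 106 (ROC calendar; 2017–2018)
Semester: 2
Publication year: 107 (ROC calendar; 2018)
Student name (Chinese): 蔣宜衡
Student name (English): Yi-Heng Chiang
Student ID: 604630375
Degree: Master's
Language: Chinese
Oral defense date: 2018-06-02
Number of pages: 53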
Oral defense committee: Advisor - 魏世杰
Member - 蕭漢威
Member - 鄭啟斌
Member - 魏世杰
Chinese keywords: 自然語言處理; 神經機器翻譯; 自然語言生成; 深度學習; 文本改寫
English keywords: Natural Language Processing; Neural Machine Translation; Natural Language Generation; Deep Learning; Text Rewriting
Subject classification:
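Chinese abstract (translated): The ability to understand and master a language varies from person to person, but it is also shaped by historical change. In particular, Classical Chinese, the written language of the past, differs markedly from the Vernacular Chinese that people use in everyday life today. As a result, many people now struggle to understand Classical Chinese.
To bridge the comprehension gap between these two writing styles, this study takes bidirectional text rewriting between Classical and Vernacular Chinese as its topic. The corpus is processed with Natural Language Processing techniques, and a Seq2Seq (sequence-to-sequence) model is trained within a Deep Learning architecture to generate sentences in the corresponding writing style. In addition, two independent sets of word vectors (Word Vectors), one for Classical Chinese and one for Vernacular Chinese, are trained on monolingual corpora to capture the semantic relationships between words within each writing style.
Starting from the correspondence between Classical and Vernacular Chinese, this study extracts word-alignment (Word Alignment) relations from the parallel corpus of the two styles and uses them to implement a bidirectional Neural Machine Translation system. The generated sentences are then evaluated with the BLEU (Bilingual Evaluation Understudy) metric. On the test set, the word-level BLEU score is higher for rewriting Vernacular into Classical Chinese, whereas the character-level BLEU score is higher for rewriting Classical into Vernacular Chinese. In both directions, character-level rewriting scores clearly higher in BLEU than word-level rewriting.
These results suggest that the bidirectional text rewriting approach adopted here offers a direction worth exploring for applying natural language technologies to the study of the Vernacular and Classical writing styles of Chinese.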
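The abstract describes training separate Classical and Vernacular Chinese word vectors to capture semantic relatedness between words. The minimal Python sketch below shows how a similar-word ranking is computed once such vectors exist: it scores every other word by cosine similarity and sorts the results. The four-word vocabulary and its 3-dimensional vectors are hypothetical values invented for illustration, not output of the thesis's models. Libraries such as gensim offer equivalent operations (`KeyedVectors.similarity`, `KeyedVectors.most_similar`), but this record does not name the packages the thesis used.

```python
import math

# Hypothetical 3-dimensional toy vectors for four Classical Chinese words.
# Real word vectors (e.g., trained with Word2Vec) usually have hundreds of
# dimensions; these values are invented purely for illustration.
toy_vectors = {
    "吾": [0.9, 0.1, 0.0],  # "I, me"
    "余": [0.8, 0.2, 0.1],  # "I, me"
    "汝": [0.6, 0.7, 0.0],  # "you"
    "山": [0.0, 0.1, 0.9],  # "mountain"
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def most_similar(word, vectors, topn=3):
    """Rank every other word in the vocabulary by cosine similarity to `word`."""
    query = vectors[word]
    scored = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:topn]


for other, score in most_similar("吾", toy_vectors):
    print(other, round(score, 3))
```

With these toy values, 余 ranks first, then 汝, then 山. Cosine similarity is used rather than raw dot product so that vector length does not dominate the ranking.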
English abstract: Although the ability to understand and master a language varies from person to person, it is also affected by the evolution of the language itself. In particular, Classical Chinese, as a written language of the past, differs markedly from the Vernacular Chinese used in modern society. As a consequence, many Chinese readers today find Classical Chinese texts hard to understand.
To bridge the gap in understanding between the Classical and Vernacular Chinese writing styles, this work takes bidirectional text rewriting between them as its topic. A parallel corpus is collected, processed with natural language processing techniques, and used to train a sequence-to-sequence model within a deep learning architecture. The trained model generates sentences in the desired writing style. In addition, this work trains two independent sets of word vectors, one for Classical Chinese and one for Vernacular Chinese, on two separate monolingual corpora, in order to capture the semantic relatedness between words within each writing style.
Starting from the parallel corpus, this work examines the correspondence between Classical Chinese (CC) and Vernacular Chinese (VC): by extracting word alignments from the parallel corpus, it implements a bidirectional neural machine translation system. The generated sentences are then evaluated with the BLEU metric. On the test set, the word-level model achieves a higher BLEU score when rewriting VC to CC than when rewriting CC to VC, whereas the character-level model achieves a higher score when rewriting CC to VC than VC to CC. Overall, the character-level model outperforms the word-level model in both rewriting directions.
In this work, natural language technologies are applied to rewriting between the Vernacular and Classical Chinese writing styles. The results suggest that the bidirectional text rewriting method used here offers a promising direction for studying and understanding these writing styles.
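The abstract reports that character-level rewriting scores clearly higher in BLEU than word-level rewriting. The Python sketch below assumes single-reference sentence BLEU with uniform 1- to 4-gram weights, clipped n-gram counts, a brevity penalty, and no smoothing; this record does not specify the thesis's exact BLEU variant. It scores one hypothetical rewrite twice, once over hand-segmented words and once over individual characters. The sentences and the word segmentation are invented for illustration. The example shows one plausible contributing factor, not the thesis's own explanation: a single substituted word removes every word-level 4-gram match, while character tokens still share partial n-grams.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, hypothesis, max_n=4):
    """Single-reference sentence BLEU with uniform weights and no smoothing.

    Returns 0.0 whenever some n-gram order up to max_n has no clipped match.
    """
    if not hypothesis:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hypothesis, n)
        ref_counts = ngram_counts(reference, n)
        # Clip each hypothesis n-gram count by how often it occurs in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0:
            return 0.0
        log_precision_sum += math.log(matches / total) / max_n
    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hypothesis) > len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(hypothesis))
    return brevity_penalty * math.exp(log_precision_sum)


# Hypothetical reference and system output (invented, not from the thesis corpus).
reference = "學習了又按時溫習它"
hypothesis = "學習了又時常溫習它"

# Word level: a hand-made segmentation (assumed; a real system would use a segmenter).
ref_words = ["學習", "了", "又", "按時", "溫習", "它"]
hyp_words = ["學習", "了", "又", "時常", "溫習", "它"]
print(round(sentence_bleu(ref_words, hyp_words), 4))  # prints 0.0

# Character level: every Chinese character is a token.
print(round(sentence_bleu(list(reference), list(hypothesis)), 4))  # prints 0.4463
```

With these inputs the word-level score is 0.0, because no word 4-gram matches. The character-level score is (8/9 × 5/8 × 3/7 × 1/6)^(1/4) = (5/126)^(1/4) ≈ 0.446. Unsmoothed sentence BLEU collapses to zero easily, so toolkits commonly report corpus-level BLEU or apply smoothing. Scores from different BLEU variants should not be compared directly.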
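Table of contents
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation 2
1.3 Research Objectives 3
1.4 Thesis Organization 4
Chapter 2 Literature Review 5
2.1 Natural Language Processing and Generation 5
2.2 Traditional Machine Translation 6
2.2.1 Statistical Machine Translation 7
2.2.2 Rule-Based Machine Translation 8
2.2.3 Other Types of Machine Translation 9
2.3 Neural Machine Translation 9
2.4 Word Embeddings 13
Chapter 3 Methodology 15
3.1 Problem Definition 15
3.2 System Architecture 15
3.3 Preprocessing 16
3.3.1 Data Cleaning 17
3.3.2 Traditional-to-Simplified Chinese Conversion 17
3.3.3 Word Segmentation 18
3.4 Word Vectors 19
3.4.1 One-Hot Vectors 20
3.4.2 Word2Vec 21
3.5 LSTM 23
3.6 Seq2Seq 23
3.7 BLEU 25
Chapter 4 Experimental Design and Results 28
4.1 Experimental Environment 28
4.1.1 Packages Used 28
4.1.2 Corpus Sources 30
4.1.3 Text Rewriting Corpus 32
4.1.4 Word Vector Corpus 32
4.2 Experimental Design 33
4.2.1 Text Rewriting Experiments 33
4.2.2 Word Vector Experiments 34
4.3 Experimental Results 35
4.3.1 Text Rewriting Results 36
4.3.2 Word Vector Results 40
4.4 Discussion of Text Rewriting Results 40
Chapter 5 Conclusions and Future Work 44
5.1 Conclusions 44
5.2 Research Limitations 45
5.3 Future Work 46
References 47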

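List of Tables
Table 1 Example sentences from the training corpus 19
Table 2 Example source and target sequences for Seq2Seq in word-level training 24
Table 3 Execution environment 28
Table 4 Packages and the functions they implement 28
Table 5 List of corpora 30
Table 6 BLEU scores of word-level bidirectional neural machine translation on the training set 37
Table 7 BLEU scores of word-level bidirectional neural machine translation on the test set 38
Table 8 BLEU scores of character-level bidirectional neural machine translation on the training set 38
Table 9 BLEU scores of character-level bidirectional neural machine translation on the test set 39
Table 10 Example predictions on the test set 39
Table 11 Example predictions on the training set 39
Table 12 Ranked similar words from the Classical Chinese word vectors 40
Table 13 Ranked similar words from the Vernacular Chinese word vectors 40
Table 14 Cosine similarity of word pairs in the word vectors 41
Table 15 Word analogy relations in the word vectors 41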

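List of Figures
Figure 1 System architecture 15
Figure 2 Illustration of one-hot vectors 21
Figure 3 The two ways of building Word2Vec: CBOW and Skip-gram 22
Figure 4 Example source and target sequences for Seq2Seq in word-level training 24
Figure 5 Training progress of Seq2Seq neural machine translation from Classical to Vernacular Chinese 37
Figure 6 Training progress of Seq2Seq neural machine translation from Vernacular to Classical Chinese 37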
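Thesis access permissions
  • Free authorization granted for library patrons to reproduce the print copy for academic purposes; publicly available from 2023-08-30.
  • Authorization granted for online browsing and printing of the full electronic text; publicly available from 2023-08-30.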

