§ Browse Thesis Bibliographic Record
  
System ID U0002-3107202515355400
DOI 10.6846/TKU_Electronic Theses & Dissertations Service202500360
Title (Chinese) 基於領域特化的長文本深度學習模型於法律判決預測的研究—以歐洲人權法院為例
Title (English) Research on Legal Judgment Prediction Using a Domain-Specific Long-Text Deep Learning Model: A Case Study of the European Court of Human Rights
Title (Third Language)
University Tamkang University
Department (Chinese) 資訊工程學系碩士班
Department (English) Department of Computer Science and Information Engineering
Foreign Degree University
Foreign Degree College
Foreign Degree Institute
Academic Year 113 (ROC calendar, 2024-25)
Semester 2
Publication Year 114 (ROC calendar, 2025)
Author (Chinese) 劉兆崴
Author (English) Chao-Wei Liu
Student ID 612410653
Degree Master's
Language Traditional Chinese
Second Language
Defense Date 2025-07-09
Number of Pages 85
Committee Advisor - 吳孟倫 (mlwutp@gmail.com)
Committee Member - 林其誼 (chiyilin@mail.tku.edu.tw)
Committee Member - 高昶易 (edenkao@scu.edu.tw)
Keywords (Chinese) BERT
Transformer
Long Text Processing
Legal-BERT
Legal Judgment Prediction
European Court of Human Rights
Bi-GRU
Attention Mechanism
Sliding Window
Keywords (English) BERT
Transformer
Long Text Processing
Legal-BERT
Legal Judgment Prediction
European Court of Human Rights
Bi-GRU
Attention
Sliding Window
Keywords (Third Language)
Subject Classification
Abstract (Chinese)
As the number of European Court of Human Rights (ECHR) cases and judgments continues to grow, manual review has hit an efficiency bottleneck. This thesis proposes a deep learning model designed specifically for long legal texts, processing only the "Facts" section of each English judgment. Combining Legal-BERT, a sliding window with a 100-word overlap, a Bi-GRU layer, and an attention-based aggregation module, the method splits each judgment into overlapping segments, first encodes them with BERT, and then uses the Bi-GRU to chain context and the attention mechanism to re-align semantics across segments, thereby overcoming BERT's built-in 512-token input limit. On the public ECHR dataset, domain-specific pre-training raises Macro-F1 from 82.7% to 84.6%, with precision reaching 89.4%, clearly outperforming baseline models such as Base-BERT and HIER-BERT. Overlapping-window experiments further confirm that the 100-word setting simultaneously optimizes precision, recall, and Macro-F1. Ablation studies show that the Bi-GRU and attention layers effectively aggregate the outputs of the BERT encoding layer. Interpretability analysis shows that the model focuses on key terms such as "bias," closely matching judges' reasoning, improving performance while remaining transparent. Overall, without adding excessive computational cost, the method provides a solution for long-text legal judgment prediction that balances precision and recall and remains interpretable, with practical value for intelligent judicial assistance.
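To make the segmentation step concrete, the following Python sketch shows a sliding window over the "Facts" text with a 100-word overlap, as described above. The 510-word window size and the function name are illustrative assumptions, not the thesis's actual implementation.

# Minimal sketch of the sliding-window segmentation described in the abstract.
# The 100-word overlap follows the thesis; the 510-word window size and all
# names are illustrative assumptions, not the author's exact code.

def split_into_windows(text, window_size=510, overlap=100):
    """Split a judgment's "Facts" section into overlapping word windows."""
    words = text.split()
    step = window_size - overlap          # each window starts 410 words later
    windows = []
    for start in range(0, len(words), step):
        windows.append(" ".join(words[start:start + window_size]))
        if start + window_size >= len(words):
            break                         # the last window reaches the end of the text
    return windows

# Consecutive windows share 100 words, so sentences near a window boundary
# are always seen in full by at least one window.
segments = split_into_windows("word " * 1200)   # -> 3 overlapping segments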
Abstract (English)
As the number of European Court of Human Rights (ECHR) cases and judgments continues to grow, manual review efficiency has reached a bottleneck. This paper presents a deep learning model specifically designed for long legal texts, processing only the "Facts" sections of English judgments. By combining Legal-BERT, a sliding window with 100-word overlap, a Bi-GRU layer, and an attention-based aggregation module, the method segments each judgment into overlapping fragments, encodes them with BERT, and then re-aligns cross-segment semantics through Bi-GRU contextualization and attention, overcoming BERT's inherent 512-token input limit. On the public ECHR dataset, domain-specific pre-training improves Macro-F1 from 82.7% to 84.6%, with precision reaching 89.4%, significantly outperforming baseline models such as Base-BERT and HIER-BERT. Overlapping-window experiments also confirm that the 100-word setting optimizes precision, recall, and Macro-F1 simultaneously. Ablation studies further validate that the Bi-GRU and attention layers effectively aggregate BERT's encoded outputs. Interpretability analysis reveals that the model focuses on key terms such as "bias," aligning closely with judicial reasoning, improving performance while maintaining transparency. Overall, the method provides a precise, recall-balanced, and interpretable solution for long-text legal judgment prediction with minimal computational overhead, offering practical value for intelligent judicial assistance.
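As a companion sketch for the aggregation step, the PyTorch module below shows one way per-segment [CLS] vectors could be combined by a Bi-GRU and an attention-weighted sum into a single document vector for classification. The layer sizes, class name, and single-logit output are illustrative assumptions; the thesis's actual hyperparameters are those reported in its Table 4-5.

import torch
import torch.nn as nn

class BiGRUAttentionAggregator(nn.Module):
    """Aggregate per-segment [CLS] vectors into one document representation."""
    def __init__(self, bert_dim=768, hidden=256, num_labels=1):
        super().__init__()
        self.bigru = nn.GRU(bert_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)        # one relevance score per segment
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, cls_vectors):
        # cls_vectors: (batch, num_segments, bert_dim), one [CLS] per window
        h, _ = self.bigru(cls_vectors)               # (batch, num_segments, 2*hidden)
        alpha = torch.softmax(self.attn(h), dim=1)   # segment weights alpha_k
        doc = (alpha * h).sum(dim=1)                 # attention-weighted document vector
        return self.classifier(doc)                  # violation logit(s)

# Example: a batch of 2 judgments, each split into 4 overlapping segments
# whose [CLS] vectors (768-d, e.g. from Legal-BERT) were computed beforehand.
model = BiGRUAttentionAggregator()
logits = model(torch.randn(2, 4, 768))              # shape: (2, 1)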
Abstract (Third Language)
Thesis Table of Contents
Contents
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation 3
1.3 Research Objectives 4
1.4 Research Questions 5
1.5 Thesis Organization 5
Chapter 2 Literature Review 7
2.1 Legal Judgment Prediction Based on Traditional Machine Learning and Feature Engineering 7
2.1.1 Overview of Methods 7
2.1.2 Key Results and Comparison 8
2.2 Legal Judgment Prediction Based on Deep Learning 10
2.2.1 Overview of Methods 10
2.2.2 Key Results and Comparison 11
2.3 Transformer-Based Legal Judgment Prediction 13
2.3.1 Overview of Methods 13
2.3.2 Key Results and Comparison 14
2.3.3 Domain Adaptation and Model Improvements 16
2.4 Summary 17
Chapter 3 Methodology 19
3.1 System Architecture 19
3.2 Text Preprocessing 20
3.3 BERT Encoding Layer 23
3.3.1 Base-BERT and Legal-BERT 25
3.3.2 [CLS] Token Vector 26
3.4 Encoding Aggregation Layer 28
3.4.1 BiGRU Bidirectional Feature Extraction 29
3.4.2 Attention Weighting 29
3.5 Classification Layer 31
3.6 Summary 32
Chapter 4 Dataset and Experiments 34
4.1 Dataset 34
4.1.1 Introduction to the Articles of the European Court of Human Rights (ECHR) 34
4.1.2 Example of an ECHR English Judgment 36
4.1.3 The ECHR Dataset 40
4.1.4 Case Label Distribution and Article Statistics 42
4.2 Experimental Setup and Evaluation 45
4.2.1 Hardware and Environment 45
4.2.2 Experimental Goals and Procedure 45
4.2.3 Hyperparameter Settings 46
4.2.4 Evaluation Metrics 47
4.3 Experiments 49
4.3.1 Experiment 1: Comparison of Base-BERT and Legal-BERT Encoding Layers 49
4.3.2 Experiment 2: Comparison with Baseline Models from [19] 50
4.3.3 Experiment 3: Ablation Analysis of Architecture Modules 51
4.3.4 Experiment 4: Analysis of Sliding-Window Overlap 52
4.3.5 Experiment 5: Attention Mechanism and Interpretability 53
4.4 Summary 56
Chapter 5 Conclusion 58
References 60
Appendix 1 Full Text of the CASE OF GATT v. MALTA Judgment 64
Appendix 2 Raw Dataset JSON File for Case "001-100989" 80

List of Figures
Figure 2-1 Diagram of the original SVM model [8] 9
Figure 2-2 Machine learning model architecture for predicting Article 6 cases [12] 10
Figure 2-3 FastText model architecture for prison-term prediction in judicial decisions [17] 12
Figure 2-4 Diagram of the Transformer model 14
Figure 2-5 Model architecture diagram [20] 15
Figure 3-1 Overall system architecture 20
Figure 3-2 Preprocessing pipeline and sliding-window segmentation 21
Figure 3-3 BERT architecture diagram [2], showing pre-training and the self-attention mechanism 23
Figure 3-4 Proposed architecture: preprocessing + encoding layer; subsequences are tokenized and fed into BERT, which outputs the corresponding hidden representations 24
Figure 3-5 Aggregation module: segment vectors are aggregated into a document representation by the BiGRU and attention layers 28
Figure 3-6 BiGRU structure: forward and backward GRU outputs are concatenated at each step 30
Figure 3-7 Attention mechanism: segment weights α_k are computed and used in a weighted sum 31
Figure 4-1 Structure of an ECHR English judgment 39
Figure 4-2 Attention highlighting, example 1 (more highlights) 54
Figure 4-3 Attention highlighting, example 2 (more highlights) 54
Figure 4-4 Attention highlighting (fewer highlights) 55

List of Tables
Table 2-1 Corpus statistics for Legal-BERT pre-training, including total size (source: [3]) 17
Table 3-1 Corpus statistics for Legal-BERT further pre-training [3] 26
Table 4-1 Core ECHR rights and major protocol articles [25] 35
Table 4-2 Overview of ECHR dataset statistics [19] 40
Table 4-3 Distribution of violation counts per article in the training and test sets (high-frequency articles in bold) 43
Table 4-4 Distribution of the number of violated articles per case (test and training sets) 44
Table 4-5 Hyperparameter settings 47
Table 4-6 Classification performance with different BERT encoding layers 49
Table 4-7 Comparison of the proposed model with the baseline models of [19] 50
Table 4-8 Classification performance of model variants (using Legal-BERT) 51
Table 4-9 Classification performance under different overlap settings 53
References
[1] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems (NeurIPS), (Long Beach, CA), pp. 5998-6008, Curran Associates, Inc., 2017.
[2] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), (Minneapolis, Minnesota), pp. 4171-4186, Association for Computational Linguistics, 2019.
[3] I. Chalkidis, I. Androutsopoulos, and N. Aletras, “LEGAL-BERT: The muppets straight out of law school,” in Findings of the Association for Computational Linguistics: EMNLP 2020, (Online), pp. 2898-2904, Association for Computational Linguistics, 2020.
[4] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.
[5] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[6] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “RoBERTa: A robustly optimized BERT pretraining approach,” arXiv preprint arXiv:1907.11692, 2019.
[7] M. Zaheer, G. Guruganesh, A. Dubey, J. Ainslie, C. Alberti, S. Ontanon, P. Pham, A. Ravula, Q. Wang, L. Yang, and A. Ahmed, “Big Bird: Transformers for longer sequences,” in Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS ’20, (Red Hook, NY, USA), Curran Associates Inc., 2020.
[8] H. P. Sultana, N. Shrivastava, D. D. Dominic, N. Nalini, and J. M. Balajee, “Comparison of machine learning algorithms to build optimized network intrusion detection system,” Journal of Computational and Theoretical Nanoscience, vol. 16, pp. 2541-2549, May 2019.
[9] N. Aletras, D. Tsarapatsanis, D. Preoţiuc-Pietro, and V. Lampos, “Predicting judicial decisions of the European Court of Human Rights: A natural language processing perspective,” PeerJ Computer Science, vol. 2, p. e93, 2016.
[10] M. Medvedeva, M. Vols, and M. Wieling, “Judicial decisions of the European Court of Human Rights: Looking into the crystal ball,” in Proceedings of the Conference on Empirical Legal Studies in Europe 2018, p. 24, 2018.
[11] Z. Liu and H. Chen, “A predictive performance comparison of machine learning models for judicial cases,” in 2017 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1-6, 2017.
[12] S. Iniyan, N. Raghuvanshi, and P. Maurya, “Court judgment prediction for Article 6 using machine learning,” in 2024 Ninth International Conference on Science Technology Engineering and Mathematics (ICONSTEM), pp. 1-6, 2024.
[13] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” Proceedings of Workshop at ICLR, vol. 2013, pp. 1-2, January 2013.
[14] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, “Enriching word vectors with subword information,” Transactions of the association for computational linguistics, vol. 5, pp. 135-146, 2017.
[15] A. Kaur and B. Bozic, “Convolutional neural network-based automatic prediction of judgments of the European Court of Human Rights,” in AICS, pp. 458-469, 2019.
[16] S. Ahmad, M. Asghar, F. M. Alotaibi, and Y. D. Al-Otaibi, “A hybrid CNN + BiLSTM deep learning-based DSS for efficient prediction of judicial case decisions,” Expert Systems with Applications, vol. 209, July 2022.
[17] B. Chen, Y. Li, S. Zhang, H. Lian, and T. He, “A deep learning method for judicial decision support,” in 2019 IEEE 19th International Conference on Software Quality, Reliability and Security Companion (QRS-C), pp. 145-149, 2019.
[18] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, and Q. Le, “XLNet: Generalized autoregressive pretraining for language understanding,” in Proceedings of the 33rd International Conference on Neural Information Processing Systems, (Red Hook, NY, USA), Curran Associates Inc., 2019.
[19] I. Chalkidis, I. Androutsopoulos, and N. Aletras, “Neural legal judgment prediction in English,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, (Florence, Italy), pp. 4317-4323, Association for Computational Linguistics, July 2019.
[20] A. George, J. Alphonsa, F. Ashik, P. Priyanka, P. Basa, and S. Parida, “Enhancing legal decision making: WRIT case outcome prediction with LegalBERT embeddings and AdaBoost classifier,” in 2024 IEEE International Conference on Contemporary Computing and Communications (InC4), pp. 1-6, 2024.
[21] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, “Exploring the limits of transfer learning with a unified text-to-text transformer,” J. Mach. Learn. Res., vol. 21, January 2020.
[22] T. Ghosh and S. Kumar, “Evaluating transformer models for legal judgement prediction: A comparative study,” in 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), pp. 1-4, 2024.
[23] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder–decoder for statistical machine translation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), (Doha, Qatar), pp. 1724-1734, Association for Computational Linguistics, October 2014.
[24] M. Medvedeva and P. Mcbride, “Legal judgment prediction: If you are going to do it, do it right,” in Proceedings of the Natural Legal Language Processing Workshop 2023, (Singapore), pp. 73-84, Association for Computational Linguistics, December 2023.
[25] Council of Europe, European Convention on Human Rights. Strasbourg, France: European Court of Human Rights, 2021. Available: www.echr.coe.int.
[26] S. Wiegreffe and Y. Pinter, “Attention is not not explanation,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), (Hong Kong, China), pp. 11-20, Association for Computational Linguistics, November 2019.
Full-Text Usage Permissions
National Central Library
Agrees to grant the National Central Library a royalty-free license to make the bibliographic record and full-text electronic file publicly available on the Internet immediately after the authorization form is submitted
On campus
Print copy of the thesis available on campus immediately
Agrees to make the full-text electronic thesis publicly available within the campus
Electronic thesis available on campus immediately
Off campus
Does not agree to license to database vendors
Bibliographic record available off campus immediately
