Tamkang University Chueh Sheng Memorial Library (TKU Library)


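System ID: U0002-2508202015153300
Thesis title (Chinese): 結合句子為基礎的中文新聞分析與技術指標之股票趨勢預測技術
Thesis title (English): Stock Trend Prediction Techniques by Combining Sentence-based Chinese News Analysis and Technical Indicators
University: Tamkang University (淡江大學)
Department (Chinese): 資訊工程學系碩士班
Department (English): Department of Computer Science and Information Engineering (Master's Program)
Academic year: 108 (ROC calendar)
Semester: 2
Year of publication: 109 (ROC calendar; 2020)
Author (Chinese name): 陳柏曄
Author (English name): Po-Yeh Chen
Student ID: 607410122
Degree: Master's
Language: Chinese
Second language: English
Oral defense date: 2020-07-16
Number of pages: 58
Oral defense committee: Advisor: 鄭建富
Co-advisor: 陳俊豪
Committee member: 吳牧恩
Committee member: 陳朝鈞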
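Keywords (Chinese): 中文新聞探勘 (Chinese news mining); 技術分析 (technical analysis); 股票趨勢預測 (stock trend prediction); 情緒分析 (sentiment analysis); 支援向量機 (support vector machine)
Keywords (English): Chinese news data mining; technical analysis; stock trend prediction; sentiment analysis; support vector machine
Subject classification: Applied Sciences, Information Engineering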
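Chinese abstract (translated): Stock trend prediction has long been a popular topic and has attracted active discussion and research from many scholars and professionals. Existing studies show that predicting stocks with both sentiment analysis of financial news and technical indicators is more accurate than using sentiment analysis or technical indicators alone. Although many studies have proposed predicting stock trends with models built from the keywords of whole news articles, these models do not perform as well as expected on Chinese news. This thesis therefore analyzes news documents through key sentences to build the model, and combines it with technical indicators to predict stock trends. The proposed method first extracts all sentences from the news. Next, TextRank and word2vec are used to compute a score for each sentence, from which the top-k key sentences are selected. A financial sentiment dictionary is then used to produce a sentiment score for each key sentence. The key sentences with their sentiment scores, together with the technical indicators, form the classification attributes, while the opening and closing stock prices determine the trend label of each training instance. Based on the sentence sentiment scores, the data are further divided into positive and negative news datasets, which are used to build a positive-news and a negative-news trend prediction model. Finally, in the prediction phase, the outputs of these two models and the signal issued by the Bollinger Bands are combined to predict the stock trend and to trade. The experiments use nearly five years of news and stock data for one Taiwanese listed company and evaluate the method under different parameter settings. The results show that the proposed method is effective.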
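The key-sentence step described in the abstract can be illustrated with a minimal sketch. This record does not give the thesis's exact scoring formula, so the code below uses a common variant as an assumption: TextRank run on a sentence-similarity graph, where each sentence is the mean of its word vectors. The function names, the tiny `toy_embeddings` table (standing in for a trained word2vec model), and the pre-tokenized English tokens (standing in for segmented Chinese words) are all hypothetical.

```python
import math
import re


def split_sentences(text):
    """Split text on Chinese and Western sentence-ending punctuation."""
    parts = re.split(r"[。!?;!?;]", text)
    return [p.strip() for p in parts if p.strip()]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def sentence_vector(tokens, embeddings, dim):
    """Average the vectors of the tokens that have an embedding."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def textrank(sim, damping=0.85, iters=50):
    """Weighted PageRank over a symmetric similarity matrix."""
    n = len(sim)
    scores = [1.0 / n] * n
    out_weight = [sum(row) for row in sim]
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def top_k_sentences(token_lists, embeddings, dim, k):
    """Return (indices of the k highest-scoring sentences, all scores)."""
    n = len(token_lists)
    if n == 0:
        return [], []
    vecs = [sentence_vector(t, embeddings, dim) for t in token_lists]
    sim = [
        [max(cosine(vecs[i], vecs[j]), 0.0) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = textrank(sim)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return ranked[:k], scores


# Hypothetical 2-dimensional embeddings standing in for a word2vec model.
toy_embeddings = {
    "revenue": [1.0, 0.0],
    "growth": [0.9, 0.1],
    "profit": [0.8, 0.2],
    "weather": [0.0, 1.0],
}
toy_sentences = [["revenue", "growth"], ["revenue", "profit"], ["weather"]]
top, scores = top_k_sentences(toy_sentences, toy_embeddings, dim=2, k=2)
```

In this toy input the two finance-related sentences are close to each other and far from the third, so they receive the highest scores and are selected as key sentences. A real pipeline would also need a Chinese word segmenter and a trained embedding model, neither of which is specified in this record.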
English abstract: Stock trend prediction has long been a popular topic and has attracted extensive discussion and research from scholars and professionals. The existing literature shows that combining sentiment analysis of financial news with technical analysis predicts stock trends more accurately than using either one alone. Although many approaches have been proposed that build prediction models from keywords extracted from news, the literature shows that their accuracy on Chinese news leaves room for improvement. To address this problem, this thesis proposes a sentence-based stock trend prediction model used together with technical indicators. The proposed approach first divides the news into sentences. Then, using TextRank and word2vec, the top-k key sentences are generated. The sentiment scores of those key sentences are calculated with a financial lexicon. The key sentences with their sentiment scores and the technical indicators are used as classification attributes, and the opening and closing stock prices are used to determine the stock trend. Based on the sentiment scores of the sentences, the dataset is divided into positive and negative datasets, which are used to construct the positive and negative stock trend prediction models. Finally, the two prediction models and the signal indicated by the Bollinger Bands together determine the stock trend and generate trading signals. In the experiments, five years of news and stock prices for one company were used to show the effectiveness of the proposed approach under different parameter settings.
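The final prediction step combines the two model outputs with a Bollinger Bands signal. The following is a minimal sketch under stated assumptions: the band definition (moving average plus or minus a multiple of the standard deviation) is the standard one, but the signal rule (close below the lower band reads as an upward signal, close above the upper band as a downward signal) and the simple majority vote are assumptions for illustration, since this record does not give the thesis's exact rules. The names `bollinger_bands`, `band_signal`, and `vote` are hypothetical.

```python
from statistics import mean, pstdev


def bollinger_bands(closes, window=20, num_std=2.0):
    """Return (lower, middle, upper) bands from the last `window` closes."""
    if len(closes) < window:
        raise ValueError("not enough closing prices for the chosen window")
    recent = closes[-window:]
    middle = mean(recent)
    sd = pstdev(recent)  # population standard deviation of the window
    return middle - num_std * sd, middle, middle + num_std * sd


def band_signal(close, lower, upper):
    """Assumed rule: +1 below the lower band, -1 above the upper band, else 0."""
    if close < lower:
        return 1
    if close > upper:
        return -1
    return 0


def vote(pos_model_pred, neg_model_pred, band_sig):
    """Majority vote over three signals in {-1, 0, +1}; returns -1, 0, or +1."""
    total = pos_model_pred + neg_model_pred + band_sig
    return (total > 0) - (total < 0)


# Toy closing prices with a 4-day window: mean 11, standard deviation 1.
lower, middle, upper = bollinger_bands([10, 12, 10, 12], window=4, num_std=2.0)
signal = band_signal(8.5, lower, upper)
decision = vote(pos_model_pred=1, neg_model_pred=-1, band_sig=signal)
```

Here the two models disagree, so the Bollinger Bands signal acts as the tie-breaker. The multiple of the standard deviation (`num_std`) corresponds to the kind of parameter the experiments vary.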
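Table of contents: Contents
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Contributions 2
1.3 Reader's Guide 3
Chapter 2 Background and Literature Review 4
2.1 Keyword-based Document Analysis Techniques 5
2.2 Sentence-based Document Analysis Techniques 6
2.3 Technical Indicators 8
2.3.1 KD Indicator 8
2.3.2 RSI Indicator 9
2.3.3 Bollinger Bands 10
2.4 Chinese Text Preprocessing 10
2.4.1 Text Segmentation 11
2.4.2 TextRank 11
2.4.3 word2vec 12
2.5 Classifiers 13
2.5.1 Support Vector Machine 13
2.5.2 Decision Tree 13
2.5.3 Naive Bayes Classifier 13
Chapter 3 Sentence-based Stock Trend Prediction 15
3.1 Flowchart of the Method 15
3.2 Data Preprocessing 15
3.2.1 Key Sentence Extraction 16
3.2.2 Sentiment Analysis 17
3.3 Building the Classification Models 18
3.3.1 Training Phase 19
3.3.2 Testing Phase 20
3.4 Pseudocode 21
3.5 Worked Example 25
Chapter 4 Experimental Results 31
4.1 Experimental Data and Environment Setup 31
4.2 Effect of Different Training Periods on the Algorithm 34
4.3 Effect of Different Parameter Settings on the Algorithm 36
4.4 Comparison with an Existing Method 40
Chapter 5 Conclusions and Future Work 42
References 43
Appendix: English Paper 48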

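List of Figures
Figure 1 Flowchart of sentence-based stock trend prediction 15
Figure 2 Flowchart of data preprocessing 16
Figure 3 Flowchart of key sentence extraction 17
Figure 4 Flowchart of sentiment analysis 18
Figure 5 Flowchart of model building 19
Figure 6 Classification of news sentiment 19
Figure 7 Flowcharts for training the positive model and the negative model 20
Figure 8 Flowchart of the testing phase 21
Figure 9 Finding keywords with TextRank 26
Figure 10 Splitting an article into sentences at specific punctuation marks 26
Figure 11 Similarity between two words computed with word2vec 27
Figure 12 Computing the importance score of each sentence 27
Figure 13 Selecting the key sentences (keysentence) 27
Figure 14 Sentiment dictionaries, comprising a positive dictionary and a negative dictionary 28
Figure 15 Comparing KSWi against dictionary Dicto by similarity to obtain WOSi 28
Figure 16 Summing the positive and negative word counts in a sentence to obtain the sentence orientation score SOS 29
Figure 17 Attributes generated from real data as input to the two models 30
Figure 18 Voting among the positive model, the negative model, and the Bollinger Bands 30
Figure 19 Line chart of opening prices for 可成 (Catcher Technology, 2474) 32
Figure 20 Line chart of closing prices for 可成 (Catcher Technology, 2474) 32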

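List of Tables
Table 1 Trading share by investor category 2
Table 2 Literature on related techniques (sorted by year) 4
Table 3 Summary of methods in the literature on keyword-based document analysis 6
Table 4 Summary of methods in the literature on sentence-based document analysis 8
Table 5 Pseudocode for key sentence extraction 22
Table 6 Pseudocode for sentiment analysis 23
Table 7 Pseudocode for data preprocessing 24
Table 8 Pseudocode for model building 24
Table 9 Pseudocode for trend prediction 25
Table 10 Numbers of training and testing instances for different periods 34
Table 11 Experimental results for different training periods 35
Table 12 Positive and negative news classified with different keywords 36
Table 13 Experimental results for different keywords 37
Table 14 Experimental results for different key sentences 38
Table 15 Experimental results for different prediction time points 39
Table 16 Experimental results for different standard deviation multiples 40
Table 17 Experimental results of PM+NM+BBAND and GASTP 41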


References [1] B. Ge, C. He, C. Zhang and Y. Hu, “Classification Algorithm of Chinese Sentiment Orientation Based on Dictionary and LSTM,” ICBDR 2018: Proceedings of the 2nd International Conference on Big Data Research, pp. 119-126, 2018
[2] C. Song, X. Wang, P. Cheng, J. Wang and L. Li, “SACPC: A framework based on probabilistic linguistic terms for short text sentiment analysis,” Knowledge-Based Systems, 2019
[3] F. Chen and Y. Huang, “Knowledge-enhanced neural networks for sentiment analysis of Chinese reviews,” Neurocomputing, Vol. 368, pp 51-58, 2019
[4] G. Chen, L. He and K. Papangelis, “Sentimental Analysis of Chinese New Social Media for stock market information,” PRAI '19: Proceedings of the 2019 the International Conference on Pattern Recognition and Artificial Intelligence, pp. 1-6, 2019
[5] G. Wu, T. Hou and J. Lin, “Can economic news predict Taiwan stock market returns?,” Asia Pacific Management Review, Vol. 24, pp. 54-59, 2019
[6] G. Xu, Y. Meng, X. Qiu, Z. Yu and X. Wu, “Sentiment Analysis of Comment Texts Based on BiLSTM,” IEEE Access, vol 7, pp 51522-51532, 2019
[7] G. Xu, Z. Yu, H. Yao, F. Li, Y. Meng and X. Wu, “Chinese Text Sentiment Analysis Based on Extended Sentiment Dictionary,” IEEE Access, Vol. 7, pp. 43749-43762, 2019
[8] H. Yuan, Y. Wang, X. Feng and S. Sun, “Sentiment Analysis Based on Weighted Word2vec and Att-LSTM,” CSAI '18: Proceedings of the 2018 2nd International Conference on Computer Science and Artificial Intelligence, pp. 420-424, 2018
[9] J. Wu, K. Lu, S. Su and S. Wang, “Chinese Micro-Blog Sentiment Analysis Based on Multiple Sentiment Dictionaries and Semantic Rule Sets,” IEEE Access, Vol. 7, pp. 183924-183939, 2019
[10] J. Yang, R. Yang, H. Lu, C. Wang and J. Xie, “Multi-Entity Aspect-Based Sentiment Analysis with Context, Entity, Aspect Memory and Dependency Information,” ACM Transactions on Asian and Low-Resource Language Information Processing, Vol 18, No. 4, 2019
[11] J. Zhou, Y. Lu, H. Dai, H. Wang and H. Xiao, “Sentiment Analysis of Chinese Microblog Based on Stacked Bidirectional LSTM,” IEEE Access, Vol. 7, pp 38856-38866, 2019
[12] K. Sun, Y. Li, D. Deng and Y. Li, “Multi-Channel CNN Based Inner-Attention for Compound Sentence Relation Classification,” IEEE Access, Vol. 7, pp. 141801-141809, 2019
[13] P. Wang, Y. Luo, Z. Chen, L. He and Z. Zhang, “Orientation Analysis for Chinese News Based on Word Embedding and Syntax Rules,” IEEE Access, Vol. 7, pp. 159888-15898, 2019
[14] Q. Pi, Y. B. Shao, H. Long and C. Yang, “Chinese Sentence Decomposition Based on Hierarchical Word Order,” Procedia Computer Science, Vol 166, pp. 469-474, 2020
[15] S. Huang and W. Cheng, “Discovering Chinese sentence patterns for feature-based opinion summarization,” Electronic Commerce Research and Applications, Vol. 14, pp. 582-591, 2015
[16] S. Jia, Shijia E, Maozhen Li and Yang Xiang, “Chinese Open Relation Extraction and Knowledge Base Establishment,” ACM Transactions on Asian and Low-Resource Language Information Processing, No. 15, 2018
[17] S. Liu, K. Chen and B. Chen, “Enhanced Language Modeling with Proximity and Sentence Relatedness Information for Extractive Broadcast News Summarization,” ACM Transactions on Asian and Low-Resource Language Information Processing, No. 46, 2020
[18] S. Wang, J. Zhang and C. Zong, “Empirical Exploring Word-Character Relationship for Chinese Sentence Representation,” ACM Transactions on Asian and Low-Resource Language Information Processing, Vol. 17, No. 3, 2019
[19] T. Chen, R. Xu, Y. He and X. Wang, “Improving sentiment analysis via sentence type classification using BiLSTM-CRF and CNN,” Expert Systems with Applications, Vol. 72, pp. 221-231, 2017
[20] W. Che, Y. Zhao, H. Guo, Z. Su and T. Liu, “Sentence compression for aspect-based sentiment analysis,” IEEE/ACM Transactions on Audio, Speech and Language Processing, 2015
[21] W. Sun, Y. Chen, X. Wan and M. Liu, “Parsing Chinese Sentences with Grammatical Relations,” Computational Linguistics, Vol. 45, No. 1, 2019
[22] W. Wang, D. Huang and J. Cao, “Chinese Syntax Parsing Based on Sliding Match of Semantic String,” ACM Transactions on Asian and Low-Resource Language Information Processing, No. 7, 2019
[23] X. Li, P. Wu and W. Wang, “Incorporating stock prices and news sentiments for stock market prediction: A case of Hong Kong,” Information Processing & Management, 2020
[24] Y. Miao, Y. Ji and E. Peng, “Application of CNN-BiGRU Model in Chinese Short Text Sentiment Analysis,” ACAI 2019: Proceedings of the 2019 2nd International Conference on Algorithms - Computing and Artificial Intelligence, pp. 510-514, 2019
[25] Y. Shi, Y. Tang and W. Long, “Sentiment contagion analysis of interacting investors: Evidence from China’s stock forum,” Physica A: Statistical Mechanics and its Applications, Vol. 523, pp. 246-259, 2019
[26] Y. Sun, M. Fang and X. Wang, “A novel stock recommendation system using Guba sentiment analysis,” Personal and Ubiquitous Computing, Vol. 22, No. 3, 2018
[27] Y. Zhang, Y. Zhang, Y. Jiang and G. Huang, “Multi-feature-Based Subjective-Sentence Classification Method for Chinese Micro-blogs,” Chinese Journal of Electronics, Vol. 26, pp. 1111-1117, 2017
[28] B. Pang and L. Lee, “Opinion mining and sentiment analysis,” Proceedings of the Association for Computational Linguistics, pp. 417-424, 2002
[29] J. J. Murphy, “Technical analysis of the financial markets: a comprehensive guide to trading methods and applications,” New York Institute of Finance, 1991
[30] 法意研究部,法意教你Y!選股獲利秘技,法意出版,2012
[31] J. W. Wilder, “New Concepts in Technical Trading Systems,” Trend Research, 1987
[32] J. Bollinger, “Bollinger on Bollinger Bands,” McGraw-Hill Education, 2001
[33] S. Brin and L. Page, “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” Computer Networks and ISDN Systems, 1998
[34] R. Mihalcea and P. Tarau, “TextRank: Bringing Order into Texts,” Association for Computational Linguistics, Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 404-411, 2004
[35] T. Mikolov, K. Chen, G. Corrado and J. Dean, “Efficient Estimation of Word Representations in Vector Space,” Proceedings of the International Conference on Learning Representations (ICLR 2013), 2013
[36] S. Hochreiter, and J. Schmidhuber, “Long short-term memory,” Neural Computation, pp. 1735–1780, 1997.
[37] R. Caruana and A. Niculescu-Mizil, “An empirical comparison of supervised learning algorithms,” Proc. 23rd International Conference on Machine Learning. CiteSeerX, 2006.
[38] C. H. Chen, and P. Shih, “A Study on Stock Trend Prediction based on Chinese News and Technical Indicators Using Genetic Algorithms,” IEEE Congress on Evolutionary Computation (CEC), 2019
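[39] T. Loughran and B. McDonald, “When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10‐Ks,” Cognitive Computation, pp. 1167-1176.
Thesis access permissions
  • Print copy: royalty-free authorization granted for in-library readers to reproduce for academic purposes; publicly available from 2025-07-01.
  • Electronic full text: authorization granted for the browsing/printing service; publicly available from 2025-08-26.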

