系統識別號 | U0002-3006201611353000 |
---|---|
DOI | 10.6846/TKU.2016.01079 |
論文名稱(中文) | 股價序列為基礎的群組股票投資組合探勘之研究 |
論文名稱(英文) | A Study on Mining GSP based on Stock Price Series |
第三語言論文名稱 | |
校院名稱 | 淡江大學 |
系所名稱(中文) | 資訊工程學系資訊網路與通訊碩士班 |
系所名稱(英文) | Master's Program in Networking and Communications, Department of Computer Science and Information En |
外國學位學校名稱 | |
外國學位學院名稱 | |
外國學位研究所名稱 | |
學年度 | 104 |
學期 | 2 |
出版年 | 105 |
研究生(中文) | 余志宏 |
研究生(英文) | Chih-Hung Yu |
學號 | 603420224 |
學位類別 | 碩士 |
語言別 | 英文 |
第二語言別 | |
口試日期 | 2016-06-17 |
論文頁數 | 100頁 |
口試委員 |
指導教授
-
陳俊豪(chchen@mail.tku.edu.tw)
委員 - 洪宗貝 委員 - 許輝煌 |
關鍵字(中) |
資料探勘 群組基因遺傳演算法 時間序列 符號化聚合近似 分段聚合近似 重要感知點 股票投資組合優化 |
關鍵字(英) |
Data mining grouping genetic algorithm time series symbolic aggregate approximation piecewise aggregate approximation perceptually important point stock portfolio optimization |
第三語言關鍵字 | |
學科別分類 | |
中文摘要 |
由於金融市場的多變性,金融資料探勘一直是個複雜且吸引人的研究主題。以最佳化技術為基礎,至今有不少最佳化方法被提出探勘不同的股票投資組合。其中,有一方法結合群組遺傳演算法(GGA)並考量投資者主觀與客觀的需求,提出一個群組股票投資組合(GSP)探勘技術。群組股票投資組合的優點在於如果投資者不喜歡某一推薦股票,可選擇同一群組內的其它股票取代。然而,該方法在探勘過程並沒有考慮股價序列。因此,為了提高同一群組中的股票相似程度,本論文考量股價序列,提出兩個可以強化群組相似度的群組股票投資組合演算法。 第一個演算法為符號為基礎的群組投資組合探勘技術,包含:SAX-GSP與ESAX-GSP。其染色體編碼包含三個部份,分別為群組,股票和股票投資組合部分。因股價序列的維度很高,所以必須先降低股價序列維度。藉由兩個知名的技術,符號化聚合近似(SAX)和延伸的符號化聚合近似(ESAX),將每一個股價序列轉換為符號序列。由設計出序列距離指標(Series distance factor)用來評估同一群組中的符號序列相似程度,且成為適應度函數的一部分。結合了現有方法,產生兩個適應度函數用來探勘群組股票投資組合。之後,利用遺傳運算產生下一代新的染色體。 然而,當股價序列長度不同時,SAX-GSP與ESAX-GSP皆無法用來探勘群組投資組合。故第二個演算法採用重要感知點(PIP)技術來解決這個問題並命名為PIP-GSP。根據預設的重要感知點數量,所有的股價序列可利用重要感知點技術保留相同的重要感知點個數,由此便可以計算股價序列的相似度。染色體的編碼方式是和第一個演算法相同。接著,演算法使用修改過後的序列距離指標與既有的指標產生了兩個適合度函數探勘群組股票投資組合。 最後,透過真實股票資料,本論文進行一系列的實驗分析展示所提演算法的特點,包含:群組投資組合分析、群組中股價序列分析和所提方法與現有方法的投資報酬比較。 |
英文摘要 |
Stock portfolio mining is always an attractive research topic and a complicated problem due to the quickly economic changing. Based on optimization techniques, many algorithms were proposed to mine different portfolios. An approach was presented to derive group stock portfolio (GSP) considering investors' objectives and subjective requests by grouping genetic algorithms. The benefit is investors can replace any stock which they do not like with substitute stocks in groups. However, that approach did not take stock price series of stocks into consideration to mine GSP. In order to increase the degree of similarity of stocks in the same group, this thesis considers stock price series of stocks and proposes two methods for enhancing the similarity of GSP. In the first algorithm (SAX- and ESAX-GSP), three parts, including grouping, stock and stock portfolio parts, are used to encode a GSP into a chromosome. Because the dimension of the stock price series is high, it should be first reduced. By using the well-known dimension reduction techniques, named symbolic aggregate approximation (SAX) and extended symbolic aggregate approximation (ESAX), each stock price series is transformed to symbolic series. The series distance is then designed to evaluate the similarity of symbolic series in the same group, and used as a part of the fitness function. Combining the previous approach, two fitness functions are utilized to mine a GSP. Genetic operations are then executed to generate new chromosomes for next population. However, when the lengths of stock price series are different, SAX-GSP and ESAX-GSP cannot be used to derive GSP. To solve this problem, by using perceptually important point (PIP), the second algorithm, named PIP-GSP, is proposed to find a solution. According to the given desired number of important points, all series can keep the same important points by PIP, which can then be used to calculate the similarity of series. The chromosome representation is the same as that in the first algorithm. Using the modified series distance factor, the two fitness functions are also used to derive a GSP. At last, experiments were conducted on a real dataset to show the merits of the proposed approaches, including the analysis of the derived GSPs, the analysis of stock price series in groups and comparing the proposed approaches with the previous approach in terms of ROI. |
第三語言摘要 | |
論文目次 |
Contents CHAPTER 1 INTRODUCTION 1 1.1 Problem Definition and Motivation 1 1.2 Contribution 3 1.3 Reader's Guide 4 CHAPTER 2 REVIEW OF RELATED WORK 6 2.1 GGA and Grouping problems 6 2.2 Symbolic Aggregate Approximation 7 2.3 Extended Symbolic Aggregate Approximation 9 2.4 Perceptually Important Point 11 2.5 Stock Portfolio Optimization Approaches 13 CHAPTER 3 THE SAX- & ESAX-BASED APPROACH FOR MINING GSP 15 3.1 Motivation 15 3.2 Components of Proposed Approach 17 3.2.1 Chromosome Representation 17 3.2.2 Initial Population 19 3.2.3 Fitness and Selection 22 3.2.4 Genetic Operations 28 3.2.4.1 Two-Phase Crossover 28 3.2.4.2 Two-Phase Mutation and Inversion 30 3.3 Proposed Algorithm 31 3.4 An Example 35 CHAPTER 4 THE PIP-BASED APPROACH FOR MINING GSP 44 4.1 Motivation 44 4.2 Components of Proposed Approach 45 4.2.1 Chromosome Representation 45 4.2.2 Initial Population 47 4.2.3 Fitness and Selection 50 4.2.4 Genetic Operations 55 4.2.4.1 Two-Phase Crossover 55 4.2.4.2 Two-Phase Mutation and Inversion 57 4.3 Proposed Algorithm 58 4.4 An Example 61 CHAPTER 5 EXPERIMENTAL RESULTS 70 5.1 Experimental Results for Approach (I) 70 5.1.1 Data descriptions 71 5.1.2 The analysis of the derived group stock portfolios 72 5.1.3 Comparison results of the proposed and previous approaches in terms of ROI 79 5.2 Experimental Results for Approach (II) 82 5.2.1 Data descriptions 83 5.2.2 The analysis of the derived group stock portfolios 84 5.2.3 Comparison results of the proposed and previous approaches in terms of ROI 94 CHAPTER 6 CONCLUSIONS AND FUTURE WORK 97 References 99 List of Figures Figure 1: Flowchart of SAX 8 Figure 2: Flowchart of ESAX 10 Figure 3: An example for extracting important points by PIP. 12 Figure 4: The concept of calculating vertical distance by PIP. 12 Figure 5. The stock price series of groups in derived GSP by previous approach 16 Figure 6. The idea grouping result 17 Figure 7: Representation of chromosome Cq. 18 Figure 8: Example of chromosome C1. 19 Figure 9. Two time series with different length. 44 Figure 10: Representation of chromosome Cq. 46 Figure 11: Example of chromosome C1. 47 Figure 12: The stock price series of the dataset 71 Figure 13: The stock price series of groups by the previous approach. 76 Figure 14: The stock price series of groups by the proposed approach using f1. 77 Figure 15: The stock price series of groups by the proposed approach using f2. 78 Figure 16: The stock price series of groups by the Previous Approach (O) 89 Figure 17: The stock price series of groups by the Proposed Approach (f1). 90 Figure 18: The stock price series of groups by the Proposed Approach (f2). 92 List of Tables Table 1: Cash dividend yields of two companies 20 Table 2: Cash dividend (yi) of each stock 21 Table 3: Proportion of average cash dividend of each group to all groups 21 Table 4: Stocks used in this example 35 Table 5: The stock price series of 12 stocks 36 Table 6: The cash dividend of 12 stocks from 2011 to 2013. 36 Table 7: The normalized series of all stocks 38 Table 8: The twelve symbolic series of stocks 39 Table 9: The portfolio satisfactions of all chromosomes 40 Table 10: The group balances of all chromosomes 41 Table 11: Series distance of all chromosomes 41 Table 12: The fitness values of all chromosomes 42 Table 13: Cash dividend yields of two companies 48 Table 14: Cash dividend (yi) of each stock 49 Table 15: Proportion of average cash dividend of each group to all groups 49 Table 16: Stocks used in this example 62 Table 17: The cash dividend of 12 companies on 2013, 2012&2011 62 Table 18: The ten feature points from twelve stock price series 64 Table 19: The portfolio satisfactions of all chromosomes 66 Table 20: The group balances of all chromosomes 66 Table 21: The series distances of all chromosomes 67 Table 22: The fitness values of chromosomes 67 Table 23: Comparison of group stock portfolios by the Proposed Approach (f1) with the SAX 72 Table 24: Comparison of group stock portfolios by the Proposed Approach (f1) with the ESAX 73 Table 25: Comparison of group stock portfolios by the Proposed Approach (f2) with the SAX 74 Table 26: Comparison of group stock portfolios by the Proposed Approach (f2) with the ESAX 74 Table 27: The SAX and ESAX distances of the derived GSPs of previous and proposed approaches 79 Tables 28: The average returns of the proposed and previous approaches on one-year training and testing datasets 80 Tables 29: The average returns of the proposed and previous approaches on two-year training and one-year testing datasets 81 Table 30. The statistic results of the three datasets 83 Table 31: Comparison of group stock portfolios by the Proposed Approach (f1) on the first dataset (10 stocks with different lengths) 84 Table 32: Comparison of group stock portfolios by the Proposed Approach (f1) on the second dataset (20 stocks with different lengths) 85 Table 33: Comparison of group stock portfolios by the Proposed Approach (f1) on the third dataset (31 stocks with different lengths) 85 Table 34: Comparison of group stock portfolios by the Proposed Approach (f2) on the first dataset (10 stocks with different lengths) 86 Table 35: Comparison of group stock portfolios by the Proposed Approach (f2) on the second dataset (20 stocks with different lengths) 87 Table 36: Comparison of group stock portfolios by the Proposed Approach (f2) on the third dataset (31 stocks with different lengths) 87 Table 37: The series distances of the derived GSPs by the previous and proposed approaches on the three datasets 93 Tables 38: The average returns of the proposed and previous approaches on two-year training and testing datasets 94 Tables 39: The average returns of the proposed and previous approaches on two-year training and one-year testing datasets 95 |
參考文獻 |
[1] P. M. Barnaghi, A. A. Bakar and Z. A. Othman, "Enhanced symbolic aggregate approximation method for financial time series data representation," International Conference on New Trends in Information Science and Service Science and Data Mining, pp. 790 - 795, 2012. [2] V. Bevilacqua, V. Pacelli and S. Saladino, "A novel multi objective genetic algorithm for the portfolio optimization," Advanced Intelligent Computing, pp. 186-193, 2012. [3] C. H. Chen, Y. C. Hsieh and Y. C. Lee, "The YTM-based stock portfolio mining approach by genetic algorithm," The IEEE International Conference on Granular Computing, 2013. [4] C. H. Chen, C. B. Lin and C. C. Chen, "Mining group stock portfolio by using grouping genetic algorithms," The IEEE Congress on Evolutionary Computation, 2015. [5] F. L. Chung, T.C. Fu, R. Luk, and V. Ng, “Flexible time series pattern matching based on perceptually important points” International Joint Conference on Artificial Intelligence Workshop on Learning from Temporal and Spatial Data, 2001, pp. 1-7. [6] T. J. Chang, S. C. Yang and K. J. Chang, "Portfolio optimization problems in different risk measures using genetic algorithm," Expert Systems with Applications, Vol. 36, pp. 10529-10537, 2009. [7] Z. G. M. Elhachloufi, F. Hamza, "Stocks Portfolio Optimization Using Classification and Genetic Algorithms," Applied Mathematical Sciences, Vol. 6, pp. 4673-4684, 2012. [8] E. Falkenauer, “A New representation and operators for genetic algorithms applied to grouping problems,” Evolutionary Computation, Vol. 2, pp. 123-144, 1994. [9] E. Falkenauer, “A hybrid grouping genetic algorithm for bin packing," Journal of Heuristics, Vol. 2, pp.5-30, 1996. [10] D. E. Goldberg, "Genetic algorithms in search, optimization, and machine learning," Addison Wesley, 1989. [11] J. J. Grefenstette, "Optimization of control parameters for genetic algorithms," IEEE Transactions on System Man, and Cybernetics, Vol. 16, pp. 122-128, 1986.. [12] J. Gottschlich and O. Hinz, "A decision support system for stock investment recommendations using collective wisdom," Decision Support Systems, Vol. 59, pp. 52-62, 2014. [13] P. Gupta, M. K. Mehlawat and G. Mittal, "Asset portfolio optimization using support vector machines and real-coded genetic algorithm," Journal of Global Optimization, Vol. 53, pp. 297-315, 2012 [14] J. H. Holland, "Adaptation in natural and artificial systems," University of Michigan Press, 1975. [15] P F. Hassanzadeh, M. Collan and M. Modarres, "A practical approach to R&D portfolio selection using the fuzzy pay-off method," IEEE Transactions on Fuzzy Systems, Vol. 20, pp. 615-622, 2012. [16] L. R. Z. Hoklie, "Resolving multi objective stock portfolio optimization problem using genetic algorithm," International Conference on Computer and Automation Engineering, pp. 40-44, 2010. [17] R. Kumar and S. Bhattacharya, "Cooperative search using agents for cardinality constrained portfolio selection problem," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 42, pp. 1510 - 1518, 2012. [18] P. C. Lin, "Portfolio optimization and risk measurement based on non-dominated sorting genetic algorithm," Journal of Industrial and Management Optimization, Vol. 8, pp. 549-564, 2012. [19] J. Lin, E. Keogh, S. Lonardi and B. Chiu, "A symbolic representation of time series, with implication for streaming algorithms," The 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 2-11, 2003. [20] B. Lkhagva, Y. Suzuki, K. Kawagoe, "Extended SAX: Extension of symbolic aggregate approximation for financial time series data representation," Proceeding of IEICE the 17th Data Engineering Workshop, pp. 1-6, 2016. [21] H. M. Markowitz, "Harry Markowitz: Selected Works," World Scientific Publishing Company, 2009. [22] R. Mansini and M. G. Speranza, "Heuristic algorithms for the portfolio selection problem with minimum transaction lots," European Journal of Operational Research, Vol. 114, pp. 219-233, 1999. [23] T. P. Patalia and G. Kulkarni, "Design of genetic algorithm for knapsack problem to perform stock portfolio selection using financial indicators," International Conference on Computational Intelligence and Communication Networks, pp. 289-292, 2011. [24] A. D. Roy, "Safety first and the holding of assets," Econometrica, Vol. 20, pp. 431-449, Jul. 1952. [25] P. Senin and S. Malinchik, "SAX-VSM: Interpretable time series classification using SAX and vector space model," IEEE International Conference on Data Mining, pp. 1175 - 1180, 2013. [26] E. Wah, Y. Mei and B. W. Wah, "Portfolio optimization through data conditioning and aggregation," IEEE International Conference on Tools with Artificial Intelligence, pp. 253-260, 2011 |
論文全文使用權限 |
如有問題,歡迎洽詢!
圖書館數位資訊組 (02)2621-5656 轉 2487 或 來信