§ 瀏覽學位論文書目資料
  
系統識別號 U0002-3006201611353000
DOI 10.6846/TKU.2016.01079
論文名稱(中文) 股價序列為基礎的群組股票投資組合探勘之研究
論文名稱(英文) A Study on Mining GSP based on Stock Price Series
第三語言論文名稱
校院名稱 淡江大學
系所名稱(中文) 資訊工程學系資訊網路與通訊碩士班
系所名稱(英文) Master's Program in Networking and Communications, Department of Computer Science and Information En
外國學位學校名稱
外國學位學院名稱
外國學位研究所名稱
學年度 104
學期 2
出版年 105
研究生(中文) 余志宏
研究生(英文) Chih-Hung Yu
學號 603420224
學位類別 碩士
語言別 英文
第二語言別
口試日期 2016-06-17
論文頁數 100頁
口試委員 指導教授 - 陳俊豪(chchen@mail.tku.edu.tw)
委員 - 洪宗貝
委員 - 許輝煌
關鍵字(中) 資料探勘
群組基因遺傳演算法
時間序列
符號化聚合近似
分段聚合近似
重要感知點
股票投資組合優化
關鍵字(英) Data mining
grouping genetic algorithm
time series
symbolic aggregate approximation
piecewise aggregate approximation
perceptually important point
stock portfolio optimization
第三語言關鍵字
學科別分類
中文摘要
由於金融市場的多變性,金融資料探勘一直是個複雜且吸引人的研究主題。以最佳化技術為基礎,至今有不少最佳化方法被提出探勘不同的股票投資組合。其中,有一方法結合群組遺傳演算法(GGA)並考量投資者主觀與客觀的需求,提出一個群組股票投資組合(GSP)探勘技術。群組股票投資組合的優點在於如果投資者不喜歡某一推薦股票,可選擇同一群組內的其它股票取代。然而,該方法在探勘過程並沒有考慮股價序列。因此,為了提高同一群組中的股票相似程度,本論文考量股價序列,提出兩個可以強化群組相似度的群組股票投資組合演算法。
第一個演算法為符號為基礎的群組投資組合探勘技術,包含:SAX-GSP與ESAX-GSP。其染色體編碼包含三個部份,分別為群組,股票和股票投資組合部分。因股價序列的維度很高,所以必須先降低股價序列維度。藉由兩個知名的技術,符號化聚合近似(SAX)和延伸的符號化聚合近似(ESAX),將每一個股價序列轉換為符號序列。由設計出序列距離指標(Series distance factor)用來評估同一群組中的符號序列相似程度,且成為適應度函數的一部分。結合了現有方法,產生兩個適應度函數用來探勘群組股票投資組合。之後,利用遺傳運算產生下一代新的染色體。
然而,當股價序列長度不同時,SAX-GSP與ESAX-GSP皆無法用來探勘群組投資組合。故第二個演算法採用重要感知點(PIP)技術來解決這個問題並命名為PIP-GSP。根據預設的重要感知點數量,所有的股價序列可利用重要感知點技術保留相同的重要感知點個數,由此便可以計算股價序列的相似度。染色體的編碼方式是和第一個演算法相同。接著,演算法使用修改過後的序列距離指標與既有的指標產生了兩個適合度函數探勘群組股票投資組合。
最後,透過真實股票資料,本論文進行一系列的實驗分析展示所提演算法的特點,包含:群組投資組合分析、群組中股價序列分析和所提方法與現有方法的投資報酬比較。
英文摘要
Stock portfolio mining is always an attractive research topic and a complicated problem due to the quickly economic changing. Based on optimization techniques, many algorithms were proposed to mine different portfolios. An approach was presented to derive group stock portfolio (GSP) considering investors' objectives and subjective requests by grouping genetic algorithms. The benefit is investors can replace any stock which they do not like with substitute stocks in groups. However, that approach did not take stock price series of stocks into consideration to mine GSP. In order to increase the degree of similarity of stocks in the same group, this thesis considers stock price series of stocks and proposes two methods for enhancing the similarity of GSP.
In the first algorithm (SAX- and ESAX-GSP), three parts, including grouping, stock and stock portfolio parts, are used to encode a GSP into a chromosome. Because the dimension of the stock price series is high, it should be first reduced. By using the well-known dimension reduction techniques, named symbolic aggregate approximation (SAX) and extended symbolic aggregate approximation (ESAX), each stock price series is transformed to symbolic series. The series distance is then designed to evaluate the similarity of symbolic series in the same group, and used as a part of the fitness function. Combining the previous approach, two fitness functions are utilized to mine a GSP. Genetic operations are then executed to generate new chromosomes for next population.
However, when the lengths of stock price series are different, SAX-GSP and ESAX-GSP cannot be used to derive GSP. To solve this problem, by using perceptually important point (PIP), the second algorithm, named PIP-GSP, is proposed to find a solution. According to the given desired number of important points, all series can keep the same important points by PIP, which can then be used to calculate the similarity of series. The chromosome representation is the same as that in the first algorithm. Using the modified series distance factor, the two fitness functions are also used to derive a GSP. 
At last, experiments were conducted on a real dataset to show the merits of the proposed approaches, including the analysis of the derived GSPs, the analysis of stock price series in groups and comparing the proposed approaches with the previous approach in terms of ROI.
第三語言摘要
論文目次
Contents
CHAPTER 1	INTRODUCTION	1
1.1	Problem Definition and Motivation	1
1.2	Contribution	3
1.3	Reader's Guide	4
CHAPTER 2	REVIEW OF RELATED WORK	6
2.1	GGA and Grouping problems	6
2.2	Symbolic Aggregate Approximation	7
2.3	Extended Symbolic Aggregate Approximation	9
2.4	Perceptually Important Point	11
2.5	Stock Portfolio Optimization Approaches	13
CHAPTER 3	THE SAX- & ESAX-BASED APPROACH FOR MINING GSP	15
3.1	Motivation	15
3.2	Components of Proposed Approach	17
3.2.1	Chromosome Representation	17
3.2.2	Initial Population	19
3.2.3	Fitness and Selection	22
3.2.4	Genetic Operations	28
3.2.4.1	Two-Phase Crossover	28
3.2.4.2	Two-Phase Mutation and Inversion	30
3.3	Proposed Algorithm	31
3.4	An Example	35
CHAPTER 4	THE PIP-BASED APPROACH FOR MINING GSP	44
4.1	Motivation	44
4.2	Components of Proposed Approach	45
4.2.1	Chromosome Representation	45
4.2.2	Initial Population	47
4.2.3	Fitness and Selection	50
4.2.4	Genetic Operations	55
4.2.4.1	Two-Phase Crossover	55
4.2.4.2	Two-Phase Mutation and Inversion	57
4.3	Proposed Algorithm	58
4.4	An Example	61
CHAPTER 5	EXPERIMENTAL RESULTS	70
5.1	Experimental Results for Approach (I)	70
5.1.1	Data descriptions	71
5.1.2	The analysis of the derived group stock portfolios	72
5.1.3	Comparison results of the proposed and previous approaches in terms of ROI	79
5.2	Experimental Results for Approach (II)	82
5.2.1	Data descriptions	83
5.2.2	The analysis of the derived group stock portfolios	84
5.2.3	Comparison results of the proposed and previous approaches in terms of ROI	94
CHAPTER 6	CONCLUSIONS AND FUTURE WORK	97
References	99
 
List of Figures
Figure 1: Flowchart of SAX	8
Figure 2: Flowchart of ESAX	10
Figure 3: An example for extracting important points by PIP.	12
Figure 4: The concept of calculating vertical distance by PIP.	12
Figure 5. The stock price series of groups in derived GSP by previous approach	16
Figure 6. The idea grouping result	17
Figure 7: Representation of chromosome Cq.	18
Figure 8: Example of chromosome C1.	19
Figure 9. Two time series with different length.	44
Figure 10: Representation of chromosome Cq.	46
Figure 11: Example of chromosome C1.	47
Figure 12: The stock price series of the dataset	71
Figure 13: The stock price series of groups by the previous approach.	76
Figure 14: The stock price series of groups by the proposed approach using f1.	77
Figure 15: The stock price series of groups by the proposed approach using f2.	78
Figure 16: The stock price series of groups by the Previous Approach (O)	89
Figure 17: The stock price series of groups by the Proposed Approach (f1).	90
Figure 18: The stock price series of groups by the Proposed Approach (f2).	92
 
List of Tables
Table 1: Cash dividend yields of two companies	20
Table 2: Cash dividend (yi) of each stock	21
Table 3: Proportion of average cash dividend of each group to all groups	21
Table 4: Stocks used in this example	35
Table 5: The stock price series of 12 stocks	36
Table 6: The cash dividend of 12 stocks from 2011 to 2013.	36
Table 7: The normalized series of all stocks	38
Table 8: The twelve symbolic series of stocks	39
Table 9: The portfolio satisfactions of all chromosomes	40
Table 10: The group balances of all chromosomes	41
Table 11: Series distance of all chromosomes	41
Table 12: The fitness values of all chromosomes	42
Table 13: Cash dividend yields of two companies	48
Table 14: Cash dividend (yi) of each stock	49
Table 15: Proportion of average cash dividend of each group to all groups	49
Table 16: Stocks used in this example	62
Table 17: The cash dividend of 12 companies on 2013, 2012&2011	62
Table 18: The ten feature points from twelve stock price series	64
Table 19: The portfolio satisfactions of all chromosomes	66
Table 20: The group balances of all chromosomes	66
Table 21: The series distances of all chromosomes	67
Table 22: The fitness values of chromosomes	67
Table 23: Comparison of group stock portfolios by the Proposed Approach (f1) with the SAX	72
Table 24: Comparison of group stock portfolios by the Proposed Approach (f1) with the ESAX	73
Table 25: Comparison of group stock portfolios by the Proposed Approach (f2) with the SAX	74
Table 26: Comparison of group stock portfolios by the Proposed Approach (f2) with the ESAX	74
Table 27: The SAX and ESAX distances of the derived GSPs of previous and proposed approaches	79
Tables 28: The average returns of the proposed and previous approaches on one-year training and testing datasets	80
Tables 29: The average returns of the proposed and previous approaches on two-year training and one-year testing datasets	81
Table 30. The statistic results of the three datasets	83
Table 31: Comparison of group stock portfolios by the Proposed Approach (f1) on the first dataset (10 stocks with different lengths)	84
Table 32: Comparison of group stock portfolios by the Proposed Approach (f1) on the second dataset (20 stocks with different lengths)	85
Table 33: Comparison of group stock portfolios by the Proposed Approach (f1) on the third dataset (31 stocks with different lengths)	85
Table 34: Comparison of group stock portfolios by the Proposed Approach (f2) on the first dataset (10 stocks with different lengths)	86
Table 35: Comparison of group stock portfolios by the Proposed Approach (f2) on the second dataset (20 stocks with different lengths)	87
Table 36: Comparison of group stock portfolios by the Proposed Approach (f2) on the third dataset (31 stocks with different lengths)	87
Table 37: The series distances of the derived GSPs by the previous and proposed approaches on the three datasets	93
Tables 38: The average returns of the proposed and previous approaches on two-year training and testing datasets	94
Tables 39: The average returns of the proposed and previous approaches on two-year training and one-year testing datasets	95
參考文獻
[1]	P. M. Barnaghi, A. A. Bakar and Z. A. Othman, "Enhanced symbolic aggregate approximation method for financial time series data representation," International Conference on New Trends in Information Science and Service Science and Data Mining, pp. 790 - 795, 2012.
[2]	V. Bevilacqua, V. Pacelli and S. Saladino, "A novel multi objective genetic algorithm for the portfolio optimization," Advanced Intelligent Computing, pp. 186-193, 2012.
[3]	C. H. Chen, Y. C. Hsieh and Y. C. Lee, "The YTM-based stock portfolio mining approach by genetic algorithm," The IEEE International Conference on Granular Computing, 2013.
[4]	C. H. Chen, C. B. Lin and C. C. Chen, "Mining group stock portfolio by using grouping genetic algorithms," The IEEE Congress on Evolutionary Computation, 2015.
[5]	F. L. Chung, T.C. Fu, R. Luk, and V. Ng, “Flexible time series pattern matching based on perceptually important points” International Joint Conference on Artificial Intelligence Workshop on Learning from Temporal and Spatial Data, 2001, pp. 1-7.
[6]	T. J. Chang, S. C. Yang and K. J. Chang, "Portfolio optimization problems in different risk measures using genetic algorithm," Expert Systems with Applications, Vol. 36, pp. 10529-10537, 2009.
[7]	Z. G. M. Elhachloufi, F. Hamza, "Stocks Portfolio Optimization Using Classification and Genetic Algorithms," Applied Mathematical Sciences, Vol. 6, pp. 4673-4684, 2012.
[8]	E. Falkenauer, “A New representation and operators for genetic algorithms applied to grouping problems,” Evolutionary Computation, Vol. 2, pp. 123-144, 1994.
[9]	E. Falkenauer, “A hybrid grouping genetic algorithm for bin packing," Journal of Heuristics, Vol. 2, pp.5-30, 1996.
[10]	D. E. Goldberg, "Genetic algorithms in search, optimization, and machine learning," Addison Wesley, 1989.
[11]	J. J. Grefenstette, "Optimization of control parameters for genetic algorithms," IEEE Transactions on System Man, and Cybernetics, Vol. 16, pp. 122-128, 1986..
[12]	J. Gottschlich and O. Hinz, "A decision support system for stock investment recommendations using collective wisdom," Decision Support Systems, Vol. 59, pp. 52-62, 2014.
[13]	P. Gupta, M. K. Mehlawat and G. Mittal, "Asset portfolio optimization using support vector machines and real-coded genetic algorithm," Journal of Global Optimization, Vol. 53, pp. 297-315, 2012
[14]	J. H. Holland, "Adaptation in natural and artificial systems," University of Michigan Press, 1975.
[15]	P F. Hassanzadeh, M. Collan and M. Modarres, "A practical approach to R&D portfolio selection using the fuzzy pay-off method," IEEE Transactions on Fuzzy Systems, Vol. 20, pp. 615-622, 2012.
[16]	L. R. Z. Hoklie, "Resolving multi objective stock portfolio optimization problem using genetic algorithm," International Conference on Computer and Automation Engineering, pp. 40-44, 2010.
[17]	R. Kumar and S. Bhattacharya, "Cooperative search using agents for cardinality constrained portfolio selection problem," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 42, pp. 1510 - 1518, 2012.
[18]	P. C. Lin, "Portfolio optimization and risk measurement based on non-dominated sorting genetic algorithm," Journal of Industrial and Management Optimization, Vol. 8, pp. 549-564, 2012.
[19]	J. Lin, E. Keogh, S. Lonardi and B. Chiu, "A symbolic representation of time series, with implication for streaming algorithms," The 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 2-11, 2003.
[20]	B. Lkhagva, Y. Suzuki, K. Kawagoe, "Extended SAX: Extension of symbolic aggregate approximation for financial time series data representation," Proceeding of IEICE the 17th Data Engineering Workshop, pp. 1-6, 2016.
[21]	H. M. Markowitz, "Harry Markowitz: Selected Works," World Scientific Publishing Company, 2009.
[22]	R. Mansini and M. G. Speranza, "Heuristic algorithms for the portfolio selection problem with minimum transaction lots," European Journal of Operational Research, Vol. 114, pp. 219-233, 1999.
[23]	T. P. Patalia and G. Kulkarni, "Design of genetic algorithm for knapsack problem to perform stock portfolio selection using financial indicators," International Conference on Computational Intelligence and Communication Networks, pp. 289-292, 2011.
[24]	A. D. Roy, "Safety first and the holding of assets," Econometrica, Vol. 20, pp. 431-449, Jul. 1952.
[25]	P. Senin and S. Malinchik, "SAX-VSM: Interpretable time series classification using SAX and vector space model," IEEE International Conference on Data Mining, pp. 1175 - 1180, 2013.
[26]	E. Wah, Y. Mei and B. W. Wah, "Portfolio optimization through data conditioning and aggregation," IEEE International Conference on Tools with Artificial Intelligence, pp. 253-260, 2011
論文全文使用權限
校內
紙本論文於授權書繳交後5年公開
同意電子論文全文授權校園內公開
校內電子論文於授權書繳交後5年公開
校外
同意授權
校外電子論文於授權書繳交後5年公開

如有問題,歡迎洽詢!
圖書館數位資訊組 (02)2621-5656 轉 2487 或 來信