系統識別號 | U0002-0209202410002100 |
---|---|
DOI | 10.6846/tku202400736 |
論文名稱(中文) | 使用上線學習並結合雙樹堆和乘法權重更新演算法實現動態搜尋最佳化 |
論文名稱(英文) | Dynamic Search Optimization: Double Treaps and MWU Algorithm for Efficient Online Learning |
第三語言論文名稱 | |
校院名稱 | 淡江大學 |
系所名稱(中文) | 資訊工程學系碩士班 |
系所名稱(英文) | Department of Computer Science and Information Engineering |
外國學位學校名稱 | |
外國學位學院名稱 | |
外國學位研究所名稱 | |
學年度 | 112 |
學期 | 2 |
出版年 | 113 |
研究生(中文) | 盧禹丞 |
研究生(英文) | Yu-Cheng Lu |
學號 | 612410125 |
學位類別 | 碩士 |
語言別 | 英文 |
第二語言別 | |
口試日期 | 2024-07-18 |
論文頁數 | 25頁 |
口試委員 |
指導教授
-
林莊傑(josephcclin@mail.ntou.edu.tw)
口試委員 - 吳孟倫 口試委員 - 陳柏安 |
關鍵字(中) |
演算法 資料結構 樹堆 乘法權重更新演算法 上線學習 |
關鍵字(英) |
Algorithm Data Structure Treap MWU Algorithm Online Learning |
第三語言關鍵字 | |
學科別分類 | |
中文摘要 |
隨著數位資料的快速增長,搜尋演算法的效率與可擴展性變得愈發重要。傳統方法經常難以在速度與資源效率之間取得平衡,尤其是在包含不同頻率元素的大型且複雜的數據集上。本文提出了一種新的方法,通過結合隨機化與上線學習樹堆結構以及乘法權重更新(MWU)演算法來優化文字搜尋效能。其主要目標是在不論單詞熱門程度的情況下,提升搜尋速度並減少資源消耗。我們的方法利用乘法權重更新演算法在學習樹堆和隨機樹堆之間做動態選擇,確保搜尋效率。實驗結果顯示,與其他方法相比,我們的方法顯著縮短了搜尋時間。此方法能有效適應單詞頻率的變化,對於大規模數據集具有較強的適應性。總之,結合隨機化與學習樹堆以及乘法權重更新演算法,為各種應用中的搜尋效能優化提供了一個具有潛力的解決方案。 |
英文摘要 |
The rapid growth of digital data necessitates efficient and scalable search algorithms. Traditional methods often struggle with balancing speed and resource efficiency, particularly in large and complex datasets with varying element frequencies. This paper proposes a new approach to optimize word search performance by integrating randomized and learned treaps with the Multiplicative Weights Update (MWU) algorithm. The primary objective is to enhance the speed of word searches while minimizing resource usage, regardless of the words' popularity or obscurity. Our method leverages the MWU algorithm to dynamically select between the learned treap and the randomized treap, ensuring efficient searches. Experimental results demonstrate that our approach significantly reduces search times compared to other methods. The proposed method effectively adapts to the frequency of words, making it robust for diverse and large-scale datasets. In conclusion, integrating randomized and learned treaps with the MWU algorithm offers a promising solution for optimizing search performance in various applications, paving the way for future enhancements in search algorithms. |
第三語言摘要 | |
論文目次 |
Contents 1 Introduction 1 1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 2 Preliminaries 4 2.1 Treap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2 Multiplicative Weights Update (MWU) Algorithm . . . . . . . . . . . 4 2.3 Red-Black Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.4 AVL Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.5 B-Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.6 Splay Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.7 Learning Augmented . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.7.1 Learning-Augmented Binary Search Tree . . . . . . . . . . . . 8 2.7.2 Learning-Augmented k-means Clustering . . . . . . . . . . . . 8 2.7.3 Learning-Augmented Data Stream Algorithms . . . . . . . . . 8 3 Methodology 9 3.1 Na¨ıve Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.1.1 First Attempt . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.1.2 Second Attempt . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.2 The Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 3.2.1 The Entire Process . . . . . . . . . . . . . . . . . . . . . . . . 10 3.2.2 Search, Insert and Update Process . . . . . . . . . . . . . . . 11 V 3.3 Treaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 3.3.1 Random Treap . . . . . . . . . . . . . . . . . . . . . . . . . . 11 3.3.2 Learned Treap . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 3.4 Multiplicative Weights Update (MWU) Algorithm . . . . . . . . . . . 12 4 Experiment 17 4.1 Dataset and Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . 17 4.1.1 Data Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . 17 4.2 Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 4.3 Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 5 Conclusion 23 References 24 VI List of Figures 3.1 The flowchart of the entire algorithm . . . . . . . . . . . . . . . . . . 15 3.2 The flowchart of the search, insert, and update process . . . . . . . . 16 4.1 Data preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 4.2 Mix: average comparison times with randomly chosen words . . . . . 20 4.3 High: average comparison times with high-frequency words . . . . . . 20 4.4 Low: average comparison times with low-frequency words . . . . . . . 21 4.5 Adversarial: average comparison times with adversarial case . . . . . 21 |
參考文獻 |
References [1] Treap (a randomized binary search tree). https://www.geeksforgeeks.org/treapa-randomized-binary-search-tree/, 2022. [2] Implementation of searching, inserting and deleting in treap. https://www.geeksforgeeks.org/implementation-of-search-insert-and-delete-intreap/, 2023. [3] Insertion in an avl tree. https://www.geeksforgeeks.org/insertion-in-an-avltree/, 2023. [4] Introduction of b-tree. https://www.geeksforgeeks.org/introduction-of-b-tree2/, 2023. [5] Insertion in splay tree. https://www.geeksforgeeks.org/insertion-in-splay-tree/, 2024. [6] Introduction to red-black tree. https://www.geeksforgeeks.org/introduction-tored-black-tree/, 2024. [7] Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory of computing, 8(1):121– 164, 2012. [8] Erik D Demaine, John Iacono, Grigorios Koumoutsos, and Stefan Langerman. Belga b-trees. Theory of Computing Systems, 65:541–558, 2021. [9] Jon C Ergun, Zhili Feng, Sandeep Silwal, David P Woodruff, and Samson Zhou. Learning-augmented k-means clustering. arXiv preprint arXiv:2110.14094, 2021. 24 [10] Leo J Guibas and Robert Sedgewick. A dichromatic framework for balanced trees. In 19th Annual Symposium on Foundations of Computer Science (sfcs 1978), pages 8–21. IEEE, 1978. [11] Elad Hazan et al. Introduction to online convex optimization. Foundations and Trends® in Optimization, 2(3-4):157–325, 2016. [12] David P Helmbold, Robert E Schapire, Yoram Singer, and Manfred K Warmuth. On-line portfolio selection using multiplicative updates. Mathematical Finance, 8(4):325–347, 1998. [13] Chen-Yu Hsu, Piotr Indyk, Dina Katabi, and Ali Vakilian. Learning-based frequency estimation algorithms. In International Conference on Learning Representations, 2019. [14] Honghao Lin, Tian Luo, and David Woodruff. Learning augmented binary search trees. In International Conference on Machine Learning, pages 13431– 13440. PMLR, 2022. [15] Francesco Orabona. A modern introduction to online learning. arXiv preprint arXiv:1912.13213, 2019. [16] Rahul Shah, Cheng Sheng, Sharma Thankachan, and Jeffrey Vitter. Ranked document retrieval in external memory. ACM Trans. Algorithms, 19(1), mar 2023. |
論文全文使用權限 |
如有問題,歡迎洽詢!
圖書館數位資訊組 (02)2621-5656 轉 2487 或 來信