§ Browse Thesis Bibliographic Record
System ID U0002-0608202501311900
DOI 10.6846/tku202500672
Title (Chinese) 虛擬兒女聊天系統
Title (English) Virtual Child Conversational System for Elderly Companionship
Title (third language)
University Tamkang University (淡江大學)
Department (Chinese) 資訊工程學系全英語碩士班
Department (English) Master's Program, Department of Computer Science and Information Engineering (English-taught program)
Foreign degree school
Foreign degree college
Foreign degree institute
Academic year 113 (ROC calendar)
Semester 2
Publication year 114 (ROC calendar)
Author (Chinese) 蔡馥璟
Author (English) Fu-Ching Tsai
Student ID 612780063
Degree Master's
Language English
Second language
Oral defense date 2025-06-28
Number of pages 109
Committee: Advisor - 張志勇 (cychang@mail.tku.edu.tw)
Committee member - 張義雄
Committee member - 張榮貴
Keywords (Chinese) 引導式對話
話題來源選擇
偏離度控制
增強式學習
槽位填充
Dueling DQN
Keywords (English) Guided Conversation
Topic Source Selection
Deviation Degree Control
Reinforcement Learning
Slot Filling
Dueling DQN
Keywords (third language)
Subject classification
Chinese Abstract
As Taiwan enters a super-aged society, collecting health information from elders living alone both effectively and gently has become a pressing issue. Traditional closed-ended questioning often triggers defensiveness and lacks tolerance for contextual deviation. This study therefore proposes an innovative dialogue architecture named the "Virtual Child Conversational System," which improves conversational naturalness and the efficiency of health-data collection through modules for guided dialogue, deviation modeling, and reinforcement learning-based decision making.
The system design comprises four core modules: (1) Guided chat scripts: based on three topic sources (background, routines, and interests) combined with four target slots (medication, diet, sleep, and activity), the system generates multiple versions of context-aware dialogue scripts; (2) Deviation control: a three-dimensional deviation space quantifies deviation in topic source, slot target, and semantics, enabling a dead-end rescue mechanism for topic recovery; (3) Dual large language model interplay: ChatGPT and Gemini simulate child-elder dialogues to rapidly produce highly realistic training data; (4) Reinforcement learning decision making: a Dueling DQN learns a dialogue policy that selects the best script according to slot-filling status and deviation degree, improving multi-turn task completion.
Experiments on MultiWOZ 2.0/2.1, ChatGPT-Gemini simulated corpora, and off-topic perturbed variants show that the system outperforms conventional methods and ablated versions in slot-filling rate, deviation tolerance, and number of interaction turns. Overall, this study effectively combines contextual adaptability and policy flexibility in multi-turn health dialogues, offering a concrete implementation and development potential for virtual-companionship dialogue systems in elderly care.
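The deviation scoring described in the abstract can be illustrated with a minimal sketch. This is a hypothetical illustration, not the thesis implementation: it assumes precomputed sentence embeddings (the thesis derives them with SBERT) and measures deviation as one minus the cosine similarity between a reply and the current script topic.

```python
from math import sqrt

def deviation_score(reply_vec, topic_vec):
    """Deviation as 1 - cosine similarity between an elder's reply
    embedding and the current guided-script topic embedding.
    (Illustrative only; embeddings are assumed precomputed.)"""
    dot = sum(r * t for r, t in zip(reply_vec, topic_vec))
    norm = (sqrt(sum(r * r for r in reply_vec)) *
            sqrt(sum(t * t for t in topic_vec)))
    return 1.0 - dot / norm

topic = [1.0, 0.0, 0.0]      # hypothetical topic embedding
on_topic = [0.9, 0.1, 0.0]   # reply close to the topic
off_topic = [0.0, 1.0, 0.0]  # reply far from the topic

# An off-topic reply yields a larger deviation score,
# which can trigger a dead-end rescue (topic recovery) step.
assert deviation_score(on_topic, topic) < deviation_score(off_topic, topic)
```

A threshold on this score is one simple way to decide between continuing the current script and requesting a regenerated one.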
English Abstract
As Taiwan officially enters a super-aged society, collecting health information from elderly individuals—especially those living alone—has become increasingly critical. Traditional closed-ended questioning often induces resistance and lacks the flexibility to handle conversational deviations. This study proposes a novel system named the "Virtual Child Conversational System," designed to improve natural interaction and data collection through guided dialogue, deviation modeling, and reinforcement learning-based decision making.
The system architecture consists of four key components: (1) Guided Conversation Scripts: Twelve scenario-based dialogues generated from three topic sources—background, routines, and interests—combined with four target slots (medication, diet, sleep, and activity); (2) Deviation Control: A 3D deviation space quantifies topic origin, slot type, and deviation level, enabling dead-end rescue strategies; (3) Dual Large Language Model Simulation: ChatGPT and Gemini simulate multi-round elderly-child dialogues, generating diverse and realistic training samples; (4) Reinforcement Learning with Dueling DQN: Slot-filling status and deviation degree inform the selection of optimal scripts to accelerate task completion across dialogue turns.
Experiments using MultiWOZ 2.0/2.1, ChatGPT-Gemini simulated datasets, and off-topic variants demonstrate superior performance in slot filling, deviation tolerance, and dialogue efficiency compared to baseline and ablated models. Overall, the proposed system effectively integrates adaptability and strategic flexibility, offering a practical and scalable solution for elderly care through virtual companionship dialogues.
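The Dueling DQN used for script selection separates a state value from per-action advantages. The standard dueling aggregation step can be sketched as follows (a minimal pure-Python illustration of the textbook formula, not the thesis code; the actions here stand in for candidate guided-chat scripts):

```python
def dueling_q(state_value, advantages):
    """Standard dueling aggregation:
    Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    Subtracting the mean advantage keeps V and A identifiable."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]

# Hypothetical numbers: V(s) for the current slot-filling/deviation
# state, and one advantage per candidate script.
q_values = dueling_q(state_value=1.0, advantages=[0.5, -0.5, 0.0])
best_script = max(range(len(q_values)), key=q_values.__getitem__)
# → q_values == [1.5, 0.5, 1.0], so script 0 is chosen greedily.
```

In deployment the greedy choice over these Q-values corresponds to picking the script most likely to fill remaining slots while keeping deviation low.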
Third-language Abstract
Thesis Contents
Table of Contents
Table of Contents VI
List of Figures IX
List of Tables XII
Chapter 1 Introduction 1
Chapter 2 Related Work 14
2.1 Applications of Guided Dialogue Systems 14
2.2 Handling Dialogue Deviation and Topic Recovery Strategies 17
2.3 Reinforcement Learning for Multi-Slot Information Collection 18
2.4 Overview and Comparative Analysis 20
Chapter 3 Background Knowledge 22
3.1 SBERT 23
3.2 Dueling DQN 24
3.3 Wav2Lip 26
3.4 F5TTS 28
3.5 ChatGPT 29
3.6 Gemini 32
Chapter 4 System Design 35
4.1 Overall System Architecture 35
4.2 Data Collection and Preprocessing 39
4.2.1 Children's Voice and Visual Data 39
4.2.2 Elder Information 41
4.3 Script Design and Generation for Guided Conversation 42
4.3.1 Target Slots 43
4.3.2 Guided Chat Script Design 44
4.3.3 Content of Guided Chat Scripts 45
4.4 Duel of Dual Large Language Models 46
4.4.1 Role Assignment of Large Language Models 47
4.5 Reinforcement Learning Framework and Reward Design 53
4.5.1 State 54
4.5.2 Action 55
4.5.3 Reward Function 56
4.6 Model Training Process 60
4.6.1 Source of Input Samples 60
4.6.2 Strategy Learning and Loss Function Design 61
4.6.3 Parameter Update and Optimizer 63
4.7 Deployment Phase 63
4.7.1 Virtual Child Video Generation 64
4.7.2 Virtual Child Dialogue System Application 65
Chapter 5 Experimental Analysis 68
5.1 Environment and System Configuration 68
5.2 Datasets 69
5.2.1 Public Dataset 69
5.2.2 Simulated Dialogue Dataset 71
5.3 Experimental Results 72
5.3.1 Analysis of Existing Model Comparisons 73
5.3.2 Module Ablation Study 86
5.3.3 Model Stability and Learning Efficiency 91
Chapter 6 Conclusion 102
6.1 Completed Work of This Study 102
6.2 Future Work 104
References 106
List of Figures
Figure 1. Taiwan Becomes a Super-Aged Society in 2025 1
Figure 2. Top 5 Questions for Elderly People and Young People 2
Figure 3. System Architecture Diagram 7
Figure 4. Research Contributions 10
Figure 5. Architecture of SBERT (Sentence-BERT) Model [29] 24
Figure 6. Reinforcement Learning Framework 25
Figure 7. Architecture of Wav2Lip Model [32] 27
Figure 8. Architecture of F5TTS Model [33] 29
Figure 9. Training Process of GPT [43] 30
Figure 10. Application Scope of GPT Models [43] 31
Figure 11. Gemini’s Long-Context Comprehension Illustration [22] 32
Figure 12. Gemini’s Multimodal Processing Capabilities [22] 33
Figure 13. Overall System Architecture 37
Figure 14. Overall System Architecture 38
Figure 15. Audio Preprocessing Pipeline 40
Figure 16. Multi-Emotion Facial Expression Clips 40
Figure 17. Image Preprocessing Workflow 41
Figure 18. Lip-Syncing Alignment Process 41
Figure 19. Description of Four Target Slots 43
Figure 20. ChatGPT-Based Generation of 12 Script Variants 44
Figure 21. Example of Guided Conversational Script 45
Figure 22. Prompt Engineering for Script Generation 46
Figure 23. Role Allocation: ChatGPT [21], System, and Gemini [22] 47
Figure 24. Script Selection by the System 48
Figure 25. Script and Random Deviation Sent to Gemini [22] 48
Figure 26. Simulated Elderly Response Generated by Gemini [22] 49
Figure 27. Slot Filling Detection by ChatGPT [21] 50
Figure 28. Deviation Scoring by SBERT [29] 50
Figure 29. Low Deviation: Continue Dialogue with Gemini [22] 51
Figure 30. High Deviation: System Requests New Scripts from ChatGPT [21] 52
Figure 31. Continues Dialogue with Gemini [22] Using Regenerated Scripts 52
Figure 32. Example of Dialogue Record with Annotations 53
Figure 33. State Design in Reinforcement Learning Framework 55
Figure 34. Action Design and Script Selection Strategy 56
Figure 35. Input Data Format for Dueling DQN Model [20] 61
Figure 36. Scenario Overview of System Deployment Phase 64
Figure 37. Virtual Child Video Generation Process 65
Figure 38. Functional Modules of the Virtual Child Dialogue System 66
Figure 39. Sample Dialogue Interface on LINE Platform 66
Figure 40. Action Selection via Trained Dueling DQN Policy 67
Figure 41. Response Generation from the Virtual Child 67
Figure 42. Health Report Generation Example 67
Figure 43. Performance heatmap of slot-filling tasks 76
Figure 44. Evaluation line chart on the MultiWOZ 2.0 [36] dataset 80
Figure 45. Evaluation line chart on the MultiWOZ 2.1 [37] dataset 81
Figure 46. Performance on MultiWOZ 2.0 [36] with 30% off-topic utterances 84
Figure 47. Radar chart on the ChatGPT × Gemini dataset 87
Figure 48. Radar chart on the MultiWOZ 2.0 [36] dataset 88
Figure 49. Radar chart on the MultiWOZ 2.1 [37] dataset 89
Figure 50. Different ablated variants on the MultiWOZ 2.0 [36] dataset 91
Figure 51. Different variants on MultiWOZ 2.0 [36] with 30% off-topic utterances 92
Figure 52. Different ablated variants on the ChatGPT × Gemini dataset 93
Figure 53. Training curve of the full model 94
Figure 54. Stability evaluation on the MultiWOZ 2.0 [36] dataset 98
Figure 55. Stability evaluation on the MultiWOZ 2.0 [36] + off-topic dataset 99
Figure 56. Stability evaluation on the ChatGPT × Gemini dataset 100

List of Tables
Table 1. Comparative Summary of Related Work 21
Table 2. Deviation Penalty Matrix 58
Table 3. Experimental Environment of the Proposed System 68
Table 4. Performance evaluation of slot-filling tasks 76
Table 5. Performance evaluation on the MultiWOZ 2.0 [36] dataset 80
Table 6. Performance evaluation on the MultiWOZ 2.1 [37] dataset 81
Table 7. Evaluation on MultiWOZ 2.0 [36] with 30% off-topic utterances 84
Table 8. Ablation study on the ChatGPT × Gemini dataset 87
Table 9. Ablation study on the MultiWOZ 2.0 [36] dataset 88
Table 10. Ablation study on the MultiWOZ 2.1 [37] dataset 89
References
[1] A. Algherairy and M. Ahmed, “A review of dialogue systems: current trends and future directions,” Neural Computing and Applications, vol. 36, no. 12, pp. 6325–6351, 2024.
[2] W. He et al., “Unified dialog model pre-training for task-oriented dialog understanding and generation,” in Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 187–200, 2022.
[3] W. He et al., “Galaxy: A generative pre-trained model for task-oriented dialog with semi-supervised learning and explicit policy injection,” in Proceedings of the AAAI conference on artificial intelligence, pp. 10749–10757, 2022.
[4] Y. Jang, J. Lee, and K.-E. Kim, “GPT-critic: Offline reinforcement learning for end-to-end task-oriented dialogue systems,” in Proceedings of the International Conference on Learning Representations, 2022.
[5] H. Jeon and G. G. Lee, “Domain state tracking for a simplified dialogue system,” arXiv preprint arXiv:2103.06648, 2021.
[6] Z. Lin, A. Madotto, G. I. Winata, and P. Fung, “Mintl: Minimalist transfer learning for task-oriented dialogue systems,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
[7] A. Alessa and H. Al-Khalifa, “Towards designing a ChatGPT conversational companion for elderly people,” in Proceedings of the 16th International Conference on Pervasive Technologies Related to Assistive Environments, pp. 667–674, 2023.
[8] N. Gasteiger, K. Loveys, M. Law, and E. Broadbent, “Friends from the future: a scoping review of research into robots and computer agents to combat loneliness in older people,” Clinical interventions in aging, pp. 941–971, 2021.
[9] R. Higashinaka, T. Minato, H. Nishizaki, and T. Nagai, “Proceedings of the Dialogue Robot Competition 2023,” arXiv preprint arXiv:2312.14430, 2023.
[10] K. McNamara and E. Rudy, “Companionship to Address Quality of Life and Loneliness Among Older Adults With Severe Loneliness,” Innovation in Aging, vol. 6, no. Suppl 1, p. 714, 2022.
[11] E. Rudy, K. McNamara, R. Patel, and C. Sturm, “A Virtual Companionship Intervention Reduces Loneliness During the COVID-19 Pandemic,” Innovation in Aging, vol. 5, no. Suppl 1, p. 958, 2021.
[12] S. Tokunaga, K. Tamura, and M. Otake-Matsuura, “A dialogue-based system with photo and storytelling for older adults: toward daily cognitive training,” Frontiers in Robotics and AI, vol. 8, p. 644964, 2021.
[13] T. Nishio et al., “The effects of physically embodied multiple conversation robots on the elderly,” Frontiers in Robotics and AI, vol. 8, p. 633045, 2021.
[14] N. Shikha, K. Naidu, A. R. Choudhury, and N. Kayarvizhy, “Smart memory companion for elderly,” in Proceedings of the 2022 4th International Conference on Advances in Computing, Communication Control and Networking (ICAC3N), IEEE, pp. 1497–1502, 2022.
[15] A. Kiran, A. Balaram, P. Parshapu, S. L. Naik, P. Purushotham, and M. Silparaj, “AI-Enhanced Elderly Care Companion,” in Proceedings of the 2024 International Conference on Science Technology Engineering and Management (ICSTEM), IEEE, pp. 1–5, 2024.
[16] N. Matsumoto and K. Ando, “An Active Listening Dialogue Model Focused on ‘Open Questions’ Using Reinforcement Learning,” in Proceedings of the 2024 16th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI), IEEE, pp. 499–504, 2024.
[17] S. Z. Razavi, L. K. Schubert, K. Van Orden, M. R. Ali, B. Kane, and E. Hoque, “Discourse behavior of older adults interacting with a dialogue agent competent in multiple topics,” ACM Transactions on Interactive Intelligent Systems (TiiS), vol. 12, no. 2, pp. 1–21, 2022.
[18] C. Zhai and S. Wibowo, “A WGAN-based dialogue system for embedding humor, empathy, and cultural aspects in education,” IEEE Access, vol. 11, pp. 79706–79717, July 2023.
[19] Y. Zhao, M. Dastani, J. Long, Z. Wang, and S. Wang, “Rescue Conversations from Dead-ends: Efficient Exploration for Task-oriented Dialogue Policy Optimization,” Transactions of the Association for Computational Linguistics, vol. 12, pp. 1578–1596, 2024.
[20] M. Sewak, “Deep q network (dqn), double dqn, and dueling dqn: A step towards general artificial intelligence,” in Deep reinforcement learning: frontiers of artificial intelligence, Springer, pp. 95–108, 2019.
[21] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving language understanding by generative pre-training,” 2018. 
[22] G. Team et al., “Gemini: a family of highly capable multimodal models,” arXiv preprint arXiv:2312.11805, 2023.
[23] Y. Liu et al., “A review of reinforcement learning for natural language processing and applications in healthcare,” Journal of the American Medical Informatics Association, vol. 31, no. 10, pp. 2379–2393, 2024.
[24] K. Lu, S. Zhang, and X. Chen, “Goal-oriented dialogue policy learning from failures,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2019, pp. 2596–2603.
[25] Y.-C. Wu and C. E. Rasmussen, “Clipping loops for sample-efficient dialogue policy optimisation,” in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 3420–3428, 2021.
[26] T. H. Bui, M. Rajman, and M. Melichar, “Rapid dialogue prototyping methodology,” in Proceedings of the International Conference on Text, Speech and Dialogue, Springer, pp. 579–586, 2004.
[27] Y. Feng et al., “Fantastic rewards and how to tame them: A case study on reward learning for task-oriented dialogue systems,” in Proceedings of the 11th International Conference on Learning Representations (ICLR), 2024.
[28] H. Du, S. Li, M. Wu, X. Feng, Y.-F. Li, and H. Wang, “Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue,” in Proceedings of the Findings of the Association for Computational Linguistics: Empirical Methods in Natural Language Processing (EMNLP), 2024.
[29] N. Reimers and I. Gurevych, “Sentence-BERT: Sentence embeddings using Siamese BERT-networks,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 3982–3992, 2019.
[30] C. J. Watkins and P. Dayan, “Q-learning,” Machine learning, vol. 8, no. 3, pp. 279–292, 1992.
[31] V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[32] K. R. Prajwal, R. Mukhopadhyay, V. P. Namboodiri, and C. V. Jawahar, “A lip sync expert is all you need for speech to lip generation in the wild,” in Proceedings of the 28th ACM international conference on multimedia, pp. 484–492, 2020.
[33] S. E. Eskimez et al., “E2 tts: Embarrassingly easy fully non-autoregressive zero-shot tts,” in Proceedings of the 2024 IEEE Spoken Language Technology Workshop (SLT), IEEE, pp. 682–689, 2024.
[34] A. Vaswani et al., “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2017.
[35] L. Liu et al., “Improving alignment of text-to-image diffusion models with reinforcement learning from human feedback,” in Proceedings of the 31st ACM International Conference on Multimedia (ACM MM), Ottawa, ON, Canada, pp. 222–231, 2023.
[36] P. Budzianowski et al., “MultiWOZ – a large-scale multi-domain Wizard-of-Oz dataset for task-oriented dialogue modelling,” in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018.
[37] M. Eric et al., “MultiWOZ 2.1: A consolidated multi-domain dialogue dataset with state corrections and state tracking baselines,” in Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC 2020), 2020.
[38] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “Bleu: a method for automatic evaluation of machine translation,” in Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pp. 311–318, 2002.
[39] E. Hosseini-Asl, B. McCann, C.-S. Wu, S. Yavuz, and R. Socher, “A simple language model for task-oriented dialogue,” Advances in Neural Information Processing Systems, vol. 33, pp. 20179–20191, 2020.
[40] Y. Su et al., “Multi-task pre-training for plug-and-play task-oriented dialogue system,” in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL), 2022.
[41] Y. Yang, Y. Li, and X. Quan, “Ubar: Towards fully end-to-end task-oriented dialog system with gpt-2,” in Proceedings of the AAAI conference on artificial intelligence, pp. 14230–14238, 2021.
[42] H. Touvron et al., “Llama: Open and efficient foundation language models,” arXiv preprint arXiv:2302.13971, 2023.
[43] G. Yenduri et al., “GPT (Generative Pre-Trained Transformer)—A comprehensive review on enabling technologies, potential applications, emerging challenges, and future directions,” IEEE Access, vol. 12, pp. 438849–438897, Apr. 2024.
[44] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, pp. 4171–4186, 2019.
Full-Text Access Permissions
National Central Library
Agrees to grant the National Central Library, free of charge, the right to make the bibliographic record and electronic full text publicly available on the Internet from 2030-08-06 (delayed release of electronic full text)
On campus
On-campus print copy to be released on 2030-08-06
Agrees to authorize worldwide public access to the electronic full text
On-campus electronic full text to be released on 2030-08-06 (delayed release)
Off campus
Agrees to grant authorization to database vendors
Off-campus electronic full text to be released on 2030-08-06 (delayed release)

For questions, please contact the Library Digital Information Section at (02) 2621-5656 ext. 2487 or by email.