Satinder P Singh, Michael J Kearns, Diane J Litman, Marilyn 
A Walker (2000). Reinforcement learning for spoken 
dialogue systems. Neural Information Processing 
Systems, pages 956-962. 
Sarwar, G. Karypis, J. Konstan, J. Riedl (2001), Item-based 
collaborative filtering recommendation algorithms, 10th 
International Conference on World Wide Web, ACM, 
pp. 285-295. 
Sebastian Thrun and Anton Schwartz (1993). Issues in using 
function approximation for reinforcement learning. 
Connectionist Models Summer School Hillsdale, NJ. 
Lawrence Erlbaum. 
Shambour, J. Lu (2011), A hybrid trust-enhanced 
collaborative filtering recommendation approach for 
personalized government-to-business eservices, 
International Journal of Intelligent Systems, 26, 814-843. 
Shamim Nemati, Mohammad M Ghassemi, and Gari D 
Clifford (2016). Optimal medication dosing from suboptimal 
clinical examples: A deep reinforcement learning 
approach. Engineering in Medicine and Biology Society, 
pages 2978-2981. IEEE. 
Shi-Yong Chen, Yang Yu, Qing Da, Jun Tan, Hai-Kuan 
Huang, and Hai-Hong Tang (2018). Stabilizing 
reinforcement learning in dynamic environment with 
application to online recommendation. In Proceedings of 
the 24th ACM SIGKDD International Conference on 
Knowledge Discovery & Data Mining. 
Shuai Zhang, Lina Yao, Aixin Sun, Yi Tay (2019). Deep 
learning based recommender system: A survey and new 
perspectives. Computing Surveys (CSUR), 52(1):1-38. 
Smyth, P. Cotter (2000), A personalised TV listings service 
for the digital TV age, Knowledge-Based Systems. 
Su Liu, Ye Chen, Hui Huang, Liang Xiao, and Xiaojun Hei 
(2018). Towards smart educational recommendations 
with reinforcement learning in classroom. International 
Conference on Teaching, Assessment, and Learning for 
Engineering pages 1079-1084. IEEE. 
Tariq Mahmood and Francesco Ricci (2007). Learning and 
adaptivity in interactive recommender systems. 
Conference on Electronic commerce, pages 75-84. 
Thomas Degris (2015). Deep reinforcement learning in large 
discrete action spaces. arXiv:1512.07679. 
Thorsten Bohnenberger and Anthony Jameson (2001). When 
policies are better than plans: Decision theoretic planning 
of recommendation sequences. International Conference 
on intelligent user interfaces, pages 21-24. 
Thorsten Joachims, Dayne Freitag, Tom Mitchell (1997). 
Webwatcher: A tour guide for the world wide web. In 
IJCAI (1), pages 770-777. Citeseer. 
Timothy P Lillicrap, Alexander Pritzel, Jonathan J Hunt, 
Nicolas Heess, Yuval Tassa, Tom Erez, David Silver, 
Daan Wierstra (2015). Continuous control with deep 
reinforcement learning. arXiv. 
Tong Yu, Yilin Shen, Ruiyi Zhang, Xiangyu Zeng, and 
Hongxia Jin (2019). Vision-language recommendation 
via attribute augmented multimodal reinforcement 
learning. ACM International Conference on Multimedia, 
pages 39-47. 
Vladimir Vapnik (2013). The nature of statistical learning 
theory. Springer science & business media. 
Wacharawan Intayoad, Chayapol Kamyod, and Punnarumol 
Temdee (2018). Reinforcement learning for online 
learning recommendation system. In 2018 Global 
Wireless Summit (GWS), pages 167-170. IEEE. 
Yufan Zhao, Donglin Zeng, Mark A Socinski, and Michael 
R Kosorok (2011). Reinforcement learning strategies 
for clinical trials in nonsmall cell lung cancer. 
Nima Taghipour, Ahmad Kardan, Saeed Shiry Ghidary 
(2007). Usage based web recommendations: a 
reinforcement learning approach. In Proceedings of the 
2007 ACM conference on Recommender systems. 
Tejas D Kulkarni, Karthik Narasimhan, Ardavan Saeedi, and 
Josh Tenenbaum (2016). Hierarchical deep 
reinforcement learning: Integrating temporal abstraction 
and intrinsic motivation. Neural information processing 
systems, pages 3675-3683. 
Long-Ji Lin (1992). Self-improving reactive agents based on 
reinforcement learning, planning and teaching. Machine 
learning, 8(3-4):293-321.  
Yikun Xian, Zuohui Fu, S Muthukrishnan, Gerard De Melo, 
Yongfeng Zhang (2019). Reinforcement knowledge 
graph reasoning for explainable recommendation. ACM 
SIGIR Conference on Research and Development in 
Information Retrieval, pages 285-294. 
Wilson, B. Smyth, D. O’Sullivan (2003), Sparsity reduction 
in collaborative recommendation: A case-based 
approach, Journal of Pattern Recognition and Artificial 
Intelligence, 17, 863-884. 
Xiangyu Zhao, Long Xia, Dawei Yin, and Jiliang Tang 
(2019). Model-based reinforcement learning for whole-
chain recommendations. arXiv preprint 
arXiv:1902.03987. 
Xinshi Chen, Shuang Li, Hui Li, Shaohua Jiang, Yuan Qi, 
and Le Song (2019). Generative adversarial user model 
for reinforcement learning based recommendation 
system. In International Conference on Machine 
Learning, pages 1052-1061. 
Xiting Wang, Yiru Chen, Jie Yang, Le Wu, Zhengtao Wu, 
Xing Xie (2018). A reinforcement learning framework 
for explainable recommendation. Conference on Data 
Mining, pages 587-596. IEEE. 
Yongfeng Zhang, Xu Chen (2018). Explainable 
recommendation: A survey and new perspectives. 
arXiv:1804.11192. 
Yu Wang (2020). A hybrid recommendation for music based 
on reinforcement learning. In Pacific-Asia Conference on 
Knowledge Discovery and Data Mining, pages 91-103. 
Springer. 
Zachary C Lipton (2018). The mythos of model 
interpretability. Queue, 16(3):31-57. 
Zhengyao Jiang, Dixing Xu, Jinjun Liang (2017). A deep 
reinforcement learning framework for the financial 
portfolio management problem. arXiv. 
Zimdars, D. M. Chickering, C. Meek (2001). Using temporal 
data for making recommendations. In 17th Conference in 
Uncertainty in Artificial Intelligence. 
Ziyu Wang, Tom Schaul, Matteo Hessel, Hado Hasselt, Marc 
Lanctot, Nando Freitas (2016). Dueling network 
architectures for deep reinforcement learning. In 
International conference on machine learning.