Exploration Incentives in Model-Based Reinforcement Learning: Alec Koppel

  • Published May 15, 2024
  • Speaker: Alec Koppel, AI Research Lead/VP in the Multiagent Learning and Simulation Group within Artificial Intelligence Research, JP Morgan Chase & Co.
    Talk Title: Exploration Incentives in Model-Based Reinforcement Learning
    Abstract: Reinforcement Learning (RL) is a form of stochastic adaptive control in which one seeks to estimate the parameters of a controller only from data, and it has gained popularity in recent years. However, technological applications of RL are often hindered by the astronomical sample complexity demanded by their training. Model-based reinforcement learning (MBRL) is known to provide a practical, sample-efficient approach; however, its performance certificates in terms of Bayesian regret often require restrictive Gaussian assumptions, and may fail to distinguish between vastly different performance in sparse or dense reward settings. Motivated by these gaps, we propose a way to make MBRL, namely, Posterior Sampling combined with Model-Predictive Control (MPC), computationally efficient for mixture distributions based on a novel application of integral probability metrics and the kernelized Stein discrepancy. We then build upon this insight to pose a new exploration incentive called Stein Information Gain, which permits us to derive a variant of information-directed sampling (IDS) whose exploration incentive is evaluable in closed form. Bayesian and information-theoretic regret bounds of the proposed algorithms are presented. Finally, experimental validation on environments from OpenAI Gym and the DeepMind Control Suite illuminates the merits of the proposed methodologies in the sparse-reward setting.
    Bio: Alec Koppel is an AI Research Lead (Senior Scientist) at JP Morgan AI Research in the Multi-agent Learning and Simulation Group. From 2021 to 2022, he was a Research Scientist at Amazon within Supply Chain Optimization Technologies (SCOT). From 2017 to 2021, he was a Research Scientist with the U.S. Army Research Laboratory in the Computational and Information Sciences Directorate (CISD). He completed his Master's degree in Statistics and Doctorate in Electrical and Systems Engineering, both at the University of Pennsylvania (Penn), in August 2017. Before coming to Penn, he completed his Master's degree in Systems Science and Mathematics and Bachelor's degree in Mathematics, both at Washington University in St. Louis (WashU), Missouri. He is a recipient of the 2016 UPenn ESE Dept. Award for Exceptional Service, an awardee of the Science, Mathematics, and Research for Transformation (SMART) Scholarship, a co-author of a Best Paper Finalist at the 2017 IEEE Asilomar Conference on Signals, Systems, and Computers, a finalist for the ARL Honorable Scientist Award 2019, an awardee of the 2020 ARL Director's Research Award Translational Research Challenge (DIRA-TRC), a 2020 Honorable Mention from the IEEE Robotics and Automation Letters, and a mentor to the 2021 ARL Summer Symposium Best Project Awardee. His academic work focuses on approximate Bayesian inference, reinforcement learning, and decentralized optimization. He has worked on applications spanning robotics and autonomy; vendor selection and sourcing; and financial markets of various types.
  • Science & Technology
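
The kernelized Stein discrepancy (KSD) mentioned in the abstract measures how well a set of samples matches a target distribution using only the target's score function (the gradient of its log-density), so no normalizing constant is needed. Below is a minimal 1-D sketch of the standard V-statistic KSD estimator with an RBF kernel; it is an illustration of the general quantity, not the talk's implementation, and the bandwidth `h = 1.0` and the standard-normal target are illustrative choices.

```python
import math
import random

def ksd_squared(samples, score, h=1.0):
    """V-statistic estimate of the squared kernelized Stein discrepancy
    between the empirical distribution of `samples` and the target whose
    score function (d/dx log p) is `score`, using a 1-D RBF kernel
    k(x, y) = exp(-(x - y)^2 / (2 h^2))."""
    n = len(samples)
    total = 0.0
    for x in samples:
        for y in samples:
            d = x - y
            k = math.exp(-d * d / (2 * h * h))
            dky = d / (h * h) * k                     # dk/dy
            dkx = -d / (h * h) * k                    # dk/dx
            dkxy = (1.0 / (h * h) - d * d / h**4) * k  # d^2 k / dx dy
            # Stein kernel u_p(x, y), which has zero mean under the target p
            total += (score(x) * k * score(y)
                      + score(x) * dky
                      + score(y) * dkx
                      + dkxy)
    return total / (n * n)

# Target: standard normal, whose score is d/dx log p(x) = -x.
score = lambda x: -x
random.seed(0)
good = [random.gauss(0, 1) for _ in range(200)]  # drawn from the target
bad = [random.gauss(2, 1) for _ in range(200)]   # mean-shifted, off-target
print(ksd_squared(good, score) < ksd_squared(bad, score))  # True
```

Because the Stein kernel integrates to zero under the target, matched samples yield a near-zero discrepancy while mismatched samples do not; it is this closed-form, sample-based evaluability that makes KSD attractive as an exploration signal.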
