Exploration Incentives in Model-Based Reinforcement Learning: Alec Koppel

  • Published May 15, 2024
  • Speaker: Alec Koppel, AI Research Lead/VP in the Multiagent Learning and Simulation Group within Artificial Intelligence Research, JP Morgan Chase & Co.
    Talk Title: Exploration Incentives in Model-Based Reinforcement Learning
    Abstract: Reinforcement Learning (RL) is a form of stochastic adaptive control in which one seeks to estimate the parameters of a controller only from data, and it has gained popularity in recent years. However, technological applications of RL are often hindered by the astronomical sample complexity demanded by their training. Model-based reinforcement learning (MBRL) is known to provide a practical, sample-efficient approach; however, its performance certificates in terms of Bayesian regret often require restrictive Gaussian assumptions, and may fail to distinguish between vastly different performance in sparse or dense reward settings. Motivated by these gaps, we propose a way to make MBRL, namely, Posterior Sampling combined with Model-Predictive Control (MPC), computationally efficient for mixture distributions based on a novel application of integral probability metrics and the kernelized Stein discrepancy. We then build upon this insight to pose a new exploration incentive called Stein Information Gain, which permits us to derive a variant of information-directed sampling (IDS) whose exploration incentive is evaluable in closed form. Bayesian and information-theoretic regret bounds of the proposed algorithms are presented. Finally, experimental validation on environments from OpenAI Gym and the DeepMind Control Suite illuminates the merits of the proposed methodologies in the sparse-reward setting.
    Bio: Alec Koppel is an AI Research Lead (Senior Scientist) at JP Morgan AI Research in the Multi-agent Learning and Simulation Group. From 2021 to 2022, he was a Research Scientist at Amazon within Supply Chain Optimization Technologies (SCOT). From 2017 to 2021, he was a Research Scientist with the U.S. Army Research Laboratory in the Computational and Information Sciences Directorate (CISD). He completed his Master's degree in Statistics and Doctorate in Electrical and Systems Engineering, both at the University of Pennsylvania (Penn), in August 2017. Before coming to Penn, he completed his Master's degree in Systems Science and Mathematics and Bachelor's degree in Mathematics, both at Washington University in St. Louis (WashU), Missouri. He is a recipient of the 2016 UPenn ESE Dept. Award for Exceptional Service, an awardee of the Science, Mathematics, and Research for Transformation (SMART) Scholarship, a co-author of a Best Paper Finalist at the 2017 IEEE Asilomar Conference on Signals, Systems, and Computers, a finalist for the ARL Honorable Scientist Award 2019, an awardee of the 2020 ARL Director's Research Award Translational Research Challenge (DIRA-TRC), a 2020 Honorable Mention from the IEEE Robotics and Automation Letters, and a mentor to the 2021 ARL Summer Symposium Best Project Awardee. His academic work focuses on approximate Bayesian inference, reinforcement learning, and decentralized optimization. He has worked on applications spanning robotics and autonomy; vendor selection and sourcing; and financial markets of various types.
  • Science & Technology
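
The kernelized Stein discrepancy (KSD) mentioned in the abstract measures how well a set of samples matches a target distribution using only the target's score function (the gradient of its log-density), so no normalizing constant is needed. Below is a minimal 1-D sketch of the standard V-statistic KSD estimator with an RBF kernel; it is an illustration of the general quantity, not the talk's implementation, and the bandwidth `h = 1.0` and the standard-normal target are illustrative choices.

```python
import math
import random

def ksd_squared(samples, score, h=1.0):
    """V-statistic estimate of the squared kernelized Stein discrepancy
    between the empirical distribution of `samples` and the target whose
    score function (d/dx log p) is `score`, using a 1-D RBF kernel
    k(x, y) = exp(-(x - y)^2 / (2 h^2))."""
    n = len(samples)
    total = 0.0
    for x in samples:
        for y in samples:
            d = x - y
            k = math.exp(-d * d / (2 * h * h))
            dky = d / (h * h) * k                     # dk/dy
            dkx = -d / (h * h) * k                    # dk/dx
            dkxy = (1.0 / (h * h) - d * d / h**4) * k  # d^2 k / dx dy
            # Stein kernel u_p(x, y), which has zero mean under the target p
            total += (score(x) * k * score(y)
                      + score(x) * dky
                      + score(y) * dkx
                      + dkxy)
    return total / (n * n)

# Target: standard normal, whose score is d/dx log p(x) = -x.
score = lambda x: -x
random.seed(0)
good = [random.gauss(0, 1) for _ in range(200)]  # drawn from the target
bad = [random.gauss(2, 1) for _ in range(200)]   # mean-shifted, off-target
print(ksd_squared(good, score) < ksd_squared(bad, score))  # True
```

Because the Stein kernel integrates to zero under the target, matched samples yield a near-zero discrepancy while mismatched samples do not; it is this closed-form, sample-based evaluability that makes KSD attractive as an exploration signal.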
