Causal foundations for safe AGI - Tom Everitt (DeepMind)

แชร์
ฝัง
  • เผยแพร่เมื่อ 27 ม.ค. 2025
  • Slides: www.dropbox.co...
    Abstract:
    With great power comes great responsibility. Human-level+ artificial general intelligence (AGI) may become humanity’s best friend or worst enemy, depending on whether we manage to align its behavior with human interests or not. To overcome this challenge, we must identify the potential pitfalls and develop effective mitigation strategies. In this talk, I’ll argue that (Pearlian) causality offers a useful formal framework for reasoning about AI risk, and describe some of our recent work on this topic. In particular, I’ll cover causal definitions of incentives, agents, side effects, generalization, and preference manipulation, and discuss how techniques like recursion, interpretability, impact measures, incentive design, and path-specific effects can combine to address AGI risks.
    Bio:
    Tom Everitt is a senior researcher at DeepMind, leading a small team on causal approaches to AGI safety. He holds a PhD from Australian National University, where he wrote the first PhD thesis fully focused on AGI safety under the supervision of Prof. Marcus Hutter.

ความคิดเห็น •