From large pre-trained language models discovering linguistic structure towards foundation models

  • Published October 21, 2021
  • In October 2021, Chris Manning, the inaugural Thomas M. Siebel Professor in Machine Learning in the Departments of Linguistics and Computer Science at Stanford University and director of Stanford's Artificial Intelligence Laboratory (SAIL), gave a keynote presentation at Amazon's annual internal machine learning conference.
    His talk considers the following setup: an ML system can interact with an expensive oracle (the "real world") by iteratively proposing batches of candidate experiments and then obtaining a score for each experiment ("how well did it work?"). The data from all the rounds of queries and results can be used to train a proxy for the oracle, a form of world model. The world model can then be queried (much more cheaply than the oracle itself) in order to train, in silico, a generative model that proposes experiments to form the next round of queries. (A minimal sketch of this loop appears at the end of this description.)
    Systems that can do this well can be applied to interactive recommendation, discovering new drugs or new materials, controlling plants, or learning how to reason and build a causal model. They involve many interesting ML research threads, including active learning, reinforcement learning, representation learning, exploration, meta-learning, Bayesian optimization, and black-box optimization.
    What should be the training criterion for this generative model? Why not simply use Markov chain Monte Carlo (MCMC) methods to generate these samples? Is it possible to bypass the mode-mixing limitation of MCMC? How can the generative model guess where good experiments might be before having tried them? How should the world model construct a representation of its epistemic uncertainty, i.e., where it expects to predict well or poorly?
    On the path to answering these questions, Chris introduces a new and exciting deep learning framework called GFlowNets, which can amortize the very expensive work normally done by MCMC to convert an energy function into samples, and which opens the door to fascinating possibilities for probabilistic modeling, including the ability to quickly estimate marginalized probabilities and to efficiently represent distributions over sets and graphs. (One training criterion from the GFlowNet literature is sketched at the end of this description.)
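    Below is a minimal, hypothetical sketch of the outer loop described above. The function names (oracle_score, propose_batch, fit_world_model, fit_generator) are illustrative placeholders rather than APIs from the talk; the only point is to make the round-based structure concrete.

        # Hypothetical sketch of the experiment-proposal loop (names are illustrative).
        def run_rounds(oracle_score, propose_batch, fit_world_model, fit_generator,
                       n_rounds=10, batch_size=64):
            history = []                                      # all (candidate, score) pairs seen so far
            generator = None                                  # generative model that proposes experiments
            for _ in range(n_rounds):
                # On the first round generator is None, so propose_batch might
                # fall back to random candidates.
                batch = propose_batch(generator, batch_size)  # in-silico candidate experiments
                scores = [oracle_score(x) for x in batch]     # expensive "real world" queries
                history.extend(zip(batch, scores))
                world_model = fit_world_model(history)        # cheap learned proxy for the oracle
                generator = fit_generator(world_model)        # retrained against the proxy, not the oracle
            return history, generator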
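    As one concrete example of a training criterion for such a generative model, the GFlowNet literature includes the trajectory-balance objective (introduced in follow-up work, and not necessarily the criterion presented in this talk). For a trajectory \tau = (s_0 \to \dots \to s_n = x) that incrementally constructs an object x with reward R(x) = \exp(-E(x)) derived from the energy function, the objective is

        \mathcal{L}_{\mathrm{TB}}(\tau) = \Big( \log Z_\theta + \sum_t \log P_F(s_{t+1} \mid s_t; \theta) - \log R(x) - \sum_t \log P_B(s_t \mid s_{t+1}; \theta) \Big)^2

    where P_F and P_B are learned forward and backward policies and Z_\theta is a learned estimate of the partition function. At the optimum, the forward policy samples objects x with probability proportional to R(x), which is exactly the amortization of MCMC-style sampling described above.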
  • Science & Technology

Comments • 3

  • @revolutionarydefeatism • 2 years ago

    Thanks, Chris! I really appreciate the efforts to keep advances in NLP out of Big businesses!

  • @ChocolateMilkCultLeader • 2 years ago

    Would love to see more people talking about AI robustness. Results such as the One Pixel Attack showed that these huge models can be very fragile to adversarial algorithms.
