Mamba, Mamba-2 and Post-Transformer Architectures for Generative AI with Albert Gu - 693

  • Published on 17 Nov 2024

Comments • 15

  • @first-thoughtgiver-of-will2456
    @first-thoughtgiver-of-will2456 2 months ago +2

    Thanks Albert and Sam! Surprisingly insightful for someone researching the Mamba architecture right now!

  • @l.halawani
    @l.halawani a month ago

    Super happy to see you on YT! Been missing you since Alphabet scrapped Google Podcasts! Awesome content.

  • @JonathanWong-u8g
    @JonathanWong-u8g a month ago

    As of 2024, the latent diffusion paradigm has been very successful in these 'natural' modality tasks (sound, images, video), and it is now being applied to 3D spatial awareness. We've actually been in the post-transformer era for a while (1-2 years)! I am wondering where Gu's work fits in here; perhaps these Mamba models will produce better latents for extremely long-context video and spatial point-cloud data? Will stay tuned. Thanks for the talk!

    • @mephilees7866
      @mephilees7866 a month ago

      The problem with latent diffusion (something like DiT) is that it's too slow, especially with high-bandwidth data like images. Mamba will help on the encoder side, but I don't see how to benefit from it on the decoder side. I would suggest you check out VAR (Visual AutoRegressive modeling): it works by regressing the next resolution instead of generating out of noise, and it's around 20x faster with better performance.

    • @JonathanWong-u8g
      @JonathanWong-u8g a month ago

      @@mephilees7866 Excellent, thank you!
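
To make the "next resolution instead of out of noise" idea in the thread above concrete, here is a minimal coarse-to-fine sketch of VAR-style next-scale autoregression. The predict_next_scale function is a hypothetical placeholder (it simply upsamples the previous map), not the VAR authors' model; only the generation loop is the point.

```python
import numpy as np

def predict_next_scale(coarse_maps, target_hw):
    # Hypothetical stand-in for the autoregressive model: it should predict
    # the token map at resolution target_hw conditioned on all coarser maps
    # generated so far. Here we just upsample the latest map as a placeholder.
    last = coarse_maps[-1]
    reps = (target_hw[0] // last.shape[0], target_hw[1] // last.shape[1])
    return np.kron(last, np.ones(reps, dtype=last.dtype))

# Coarse-to-fine generation: each step regresses the next resolution,
# conditioned on everything generated so far (no iterative denoising).
scales = [(1, 1), (2, 2), (4, 4), (8, 8), (16, 16)]
maps = [np.zeros(scales[0])]              # start from a 1x1 token map
for hw in scales[1:]:
    maps.append(predict_next_scale(maps, hw))

print([m.shape for m in maps])            # (1,1), (2,2), (4,4), (8,8), (16,16)
```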

  • @lobovutare
    @lobovutare 3 months ago +2

    Interesting to hear that the author of Mamba feels that attention is indispensable. My initial thought was that Mamba is a full replacement for Transformers, but it seems that Gu believes attention layers are still necessary for the model to be able to reason at the level of tokens. Perhaps hybrid models like Jamba are the way to go.

    • @Noah-jz3gt
      @Noah-jz3gt 2 months ago +3

      Well, it seems like Gu tries to find theoretical relations between attention and SSMs in Mamba-2. To be honest, Mamba doesn't even look like an SSM to me anymore.
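
For anyone curious about the connection the commenter mentions: Mamba-2's SSD framing shows that a state-space recurrence with a scalar per-step decay can equivalently be written as multiplication by a lower-triangular, attention-like matrix. Below is a toy NumPy check of that equivalence (single channel, small sizes); it is a sketch of the idea, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 6, 4                       # sequence length, state size
a = rng.uniform(0.5, 1.0, T)      # scalar decay per step
B = rng.normal(size=(T, N))       # input projection per step
C = rng.normal(size=(T, N))       # output projection per step
x = rng.normal(size=T)            # scalar input channel

# Recurrent (SSM) form: h_t = a_t * h_{t-1} + B_t * x_t,  y_t = C_t . h_t
h = np.zeros(N)
y_rec = np.zeros(T)
for t in range(T):
    h = a[t] * h + B[t] * x[t]
    y_rec[t] = C[t] @ h

# "Attention-like" matrix form: y = M x with
# M[t, s] = (C_t . B_s) * prod_{k=s+1..t} a_k  for t >= s, else 0
M = np.zeros((T, T))
for t in range(T):
    for s in range(t + 1):
        M[t, s] = (C[t] @ B[s]) * np.prod(a[s + 1:t + 1])
y_mat = M @ x

print(np.allclose(y_rec, y_mat))  # True: both forms give the same output
```

The matrix form makes the analogy explicit: M plays the role of a causally masked attention matrix whose scores are C·B products discounted by the cumulative decay between positions.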

  • @wwkk4964
    @wwkk4964 4 months ago +1

    Brilliant; the tokenizer ought to be a learned parameter that coevolves in response to the task.

  • @chickenp7038
    @chickenp7038 3 months ago

    great interview

  • @minshenlin127
    @minshenlin127 13 days ago

    Hi, may I know how to add your channel to Apple Podcasts?

    • @twimlai
      @twimlai 13 days ago

      Hi. You can follow our channel here: podcasts.apple.com/us/podcast/the-twiml-ai-podcast-formerly-this-week-in-machine/id1116303051

    • @minshenlin127
      @minshenlin127 11 days ago

      @@twimlai Thank you for your reply. But I cannot visit the site; the URL seems invalid.

    • @twimlai
      @twimlai 11 days ago

      Strange. Works on my end. Try twimlai.com/podcast and look for the button on that page.

    • @minshenlin127
      @minshenlin127 a day ago

      @@twimlai Thank you very much. But it's still not working, so I'll use Spotify now 😃

  • @ps3301
    @ps3301 3 months ago

    How about vision?