【PhD Vlog】Mamba, the latest model of 2024, explained in detail: Transformer is dead, everything you want to know is here!

  • Published Dec 29, 2024

Comments • 78

  • @paidaxing754 · 8 months ago · +4

    Very well explained; I got a lot out of it.

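  • @wangxiao_ahu · 8 months ago · +2

    Thanks for sharing! Could you analyze SSM in more detail from these angles?
    1. the original SSM model;
    2. discretization of the SSM;
    3. the SSM scan mechanism;
    4. the relationship between Mamba and GPU hardware acceleration;
    5. Mamba's core advantages and distinguishing features;
    6. the various applications of Mamba.
    🤣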

    • @phdvlog2024 · 8 months ago

      That's too much work. Let's wait until this model really takes off; by then there will also be good tools you can deploy directly.

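    • @wangxiao_ahu · 8 months ago

      @phdvlog2024 We ran some tests with Vision Mamba models. They improved on some tasks, but on most tasks they couldn't beat ViT. GPU memory usage didn't drop noticeably either, which is strange. 🤣 Have you run any experiments to check this?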

    • @phdvlog2024 · 8 months ago · +1

      Vision Mamba may need special tuning, because A, B, C and D in Mamba can all be adjusted; using the original model as-is, it may not beat the older models.
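The first three items in that request (the original SSM, its discretization, and the scan) can be illustrated with a minimal sketch. It assumes a diagonal state matrix A with nonzero (negative) entries, a single scalar input channel, fixed parameters, and zero-order-hold (ZOH) discretization. The function names `discretize_zoh` and `ssm_scan` are hypothetical; this is not code from the video or from the Mamba repository.

```python
import math

def discretize_zoh(a_diag, b, delta):
    """ZOH discretization of a diagonal continuous-time SSM.

    Continuous, per state n:  dh_n/dt = a_n * h_n(t) + b_n * x(t)
    Discrete:                 h_n[t]  = abar_n * h_n[t-1] + bbar_n * x[t]
      abar_n = exp(delta * a_n)
      bbar_n = (exp(delta * a_n) - 1) / a_n * b_n   (requires a_n != 0)
    """
    abar = [math.exp(delta * a) for a in a_diag]
    bbar = [(ab - 1.0) / a * bn for ab, a, bn in zip(abar, a_diag, b)]
    return abar, bbar

def ssm_scan(xs, abar, bbar, c, d=0.0):
    """Run the discrete SSM as a sequential recurrence: y_t = C h_t + D x_t."""
    h = [0.0] * len(abar)
    ys = []
    for x in xs:
        # Each state dimension evolves independently because A is diagonal.
        h = [ab * hn + bb * x for ab, hn, bb in zip(abar, h, bbar)]
        ys.append(sum(cn * hn for cn, hn in zip(c, h)) + d * x)
    return ys

# Step response of a 2-state system: constant input 1.0 for 200 steps.
abar, bbar = discretize_zoh(a_diag=[-1.0, -0.5], b=[1.0, 1.0], delta=0.1)
ys = ssm_scan([1.0] * 200, abar, bbar, c=[0.5, 0.25])
print(ys[0], ys[-1])
```

Here A, B, C and the step size Δ (`delta`) are fixed, which is the time-invariant (S4-style) case. Mamba's selective variant makes Δ, B and C depend on the current input, so the same recurrence runs with different coefficients at every step.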

  • @kyrieirving5928 · 8 months ago · +1

    Great explanation, hahaha

  • @leeyanbin2896 · 7 months ago · +2

    Really well explained.

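  • @刘环菁 · 9 months ago · +3

    Great explanation, really easy to follow. I followed you over from Bilibili. May I ask how you collect the relevant architecture diagrams for your slides so efficiently? They feel very intuitive! Concise, clear and easy to understand 🤩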

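    • @phdvlog2024 · 9 months ago · +2

      They're the original figures from the papers. Look at the papers this one cites and the other papers that cite it, and you'll be able to collect them all.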

    • @phdvlog2024 · 9 months ago

      Please comment more, so I know how well I'm explaining things.

    • @hasesukkt · 7 months ago

      @phdvlog2024 Learned something, thanks!

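  • @user-js4hs1mo9q · 14 days ago

    Hi! I'd like to know how a first-year master's student can publish a Q2 (tier-2) journal paper. My research area is AI for meteorology, specifically short-term precipitation nowcasting. Could you give me some advice? 😄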

    • @张三-d1e · 7 days ago

      Is it a time-series task?

    • @user-js4hs1mo9q · 6 days ago

      @张三-d1e Yes.

    • @phdvlog2024 · 5 days ago

      That's not my area 😂

    • @张三-d1e · 5 days ago

      @user-js4hs1mo9q For a Q2 paper, you could try Mamba and modify it a bit.

  • @QuinnZack · 8 months ago · +1

    Your explanation is very good for people who apply these models. Would you be willing to share the slides?

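    • @phdvlog2024 · 8 months ago · +1

      That's a bit of a hassle; the slides may contain some of my personal information. Everything in them is screenshots from the papers, so there isn't really anything worth sharing.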

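  • @QuinnZack · 8 months ago · +1

    Your explanation is very good for people who apply these models. Would you be willing to share the slides? I'd like to build on them to work through the algorithm in more detail.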

    • @phdvlog2024 · 8 months ago

      Just go look at their code.

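  • @utei9502 · 5 months ago

    Thanks for the explanation. The part about how using the GPU's different memory levels affects training and inference speed was especially interesting. However, many of the technical terms in the explanation are used incorrectly, and the explanation stays on the surface, at times sounding plausible while being wrong. I suggest systematically studying the fundamentals of machine learning to make the videos more professional.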

    • @phdvlog2024 · 5 months ago · +1

      Some of it is off because when I write papers, ChatGPT polishes everything directly, so I never need to get the terms right 😂

    • @phdvlog2024 · 5 months ago · +1

      Also, the terms I write in English are correct; it's the Chinese equivalents I can't match up.

  • @williamzhou4353 · 4 months ago

    Could you share the slides? Thanks!

    • @phdvlog2024 · 4 months ago

      Just take screenshots; the slides aren't that great anyway 😂

    • @williamzhou4353 · 3 months ago

      @phdvlog2024 OK, OK, hahaha, thanks for all your hard work!

  • @JacobLiu-q7v · 4 months ago · +2

    Very well explained, a lot better than the paid courses on Bilibili.

  • @jaylenzhang4198 · 7 months ago

    This prefix sum is more like a segment tree data structure.

    • @phdvlog2024 · 7 months ago

      It is a bit like one.
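The "prefix sum" under discussion is a scan over the linear recurrence h_t = a_t·h_{t-1} + b_t. Each step is an affine map h ↦ a·h + b, and composing two such maps is associative, which is what makes a tree-shaped parallel evaluation possible. Below is a minimal sketch, assuming a scalar state and using hypothetical helper names: a recursive divide-and-conquer scan whose two halves are independent, so its shape resembles a segment tree. It is an illustration in plain Python, not the hardware-aware kernel described in the Mamba paper.

```python
def combine(left, right):
    """Compose affine maps h -> a*h + b: apply `left` first, then `right`."""
    a_l, b_l = left
    a_r, b_r = right
    return (a_l * a_r, a_r * b_l + b_r)

def tree_scan(elems):
    """Inclusive scan of `combine` over elems, splitting recursively in halves.

    The two recursive calls are independent, so on parallel hardware they
    could run at the same time; the recursion depth is O(log n).
    """
    if len(elems) <= 1:
        return list(elems)
    mid = len(elems) // 2
    left = tree_scan(elems[:mid])
    right = tree_scan(elems[mid:])
    carry = left[-1]                      # composed map from h_0 to h_mid
    return left + [combine(carry, r) for r in right]

def sequential_scan(a, b, h0=0.0):
    """Reference: evaluate h_t = a_t * h_{t-1} + b_t one step at a time."""
    h, out = h0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out

a = [2.0, 3.0, 1.0, 0.5]
b = [1.0, 1.0, 1.0, 2.0]
prefixes = tree_scan(list(zip(a, b)))
hs = [pb for _, pb in prefixes]           # with h_0 = 0, h_t is the b part
print(hs, sequential_scan(a, b))
```

This recursive version does O(n log n) total work; the Blelloch up-sweep/down-sweep scan brings the work down to O(n) while keeping logarithmic depth.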

  • @AGI.Trainer · 8 months ago · +2

    An RNN isn't like a zipper; it's more like the zipper's slider, isn't it?

    • @phdvlog2024 · 8 months ago · +2

      Right, but the RNN then produces the zipper behind it.

  • @sunnysky1193 · 6 months ago

    Good for a first introduction, but unfortunately the key parts are all glossed over; it kind of dodges the hard parts…

    • @phdvlog2024 · 6 months ago

      Because the key parts are all control theory (principles of automatic control), which can't be explained in a single video.

  • @PeijiYang-t6f · 5 months ago

    Hi, expert. If it's a state-machine-like approach, how does it solve the forgetting problem of LSTMs and traditional RNNs?

    • @phdvlog2024 · 5 months ago

      There's no way around it. LSTM already uses momentum-style updates to deal with forgetting. Just go straight to Transformers and trade memory (space) for it.

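    • @PeijiYang-t6f · 5 months ago

      @phdvlog2024 So it looks like Mamba only solves the problem of LSTMs training slowly, and its long-term memory is still much weaker than a Transformer's. Even if memory optimizations let it fit longer contexts, the actual results may not be better.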
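One way to see how Mamba's selectivity bears on forgetting is a scalar example. With A = -1, B = 1, C = 1 and ZOH discretization, the recurrence becomes h_t = e^{-Δ_t}·h_{t-1} + (1 - e^{-Δ_t})·x_t, an exponential moving average whose rate depends on the step size Δ_t. The sketch below uses a hypothetical function name and hand-picked Δ values standing in for what a trained model would compute from the input. It compares a fixed Δ with a "selective" one that is large on the token to remember and tiny afterwards. It only shows that an input-dependent Δ can hold information longer; the state is still a fixed-size summary, which is consistent with the concern above about long-term recall.

```python
import math

def selective_ema(xs, deltas, a=-1.0):
    """Scalar SSM with A=a (<0), B=1, C=1, discretized by ZOH at step delta_t.

    h_t = exp(delta_t * a) * h_{t-1} + (1 - exp(delta_t * a)) / (-a) * x_t
    Returns the final state h_T.
    """
    h = 0.0
    for x, d in zip(xs, deltas):
        decay = math.exp(d * a)          # close to 1 when d is small: keep state
        h = decay * h + (decay - 1.0) / a * x
    return h

xs = [1.0] + [0.0] * 50                  # one "important" token, then 50 blanks
fixed = selective_ema(xs, [0.5] * 51)    # same step size at every position
selective = selective_ema(xs, [5.0] + [0.001] * 50)  # write hard, then hold
print(f"fixed delta:     {fixed:.2e}")
print(f"selective delta: {selective:.3f}")
```

With the fixed step size the first token has essentially vanished from the state after 50 steps, while the selective schedule keeps most of it, because a small Δ pushes the decay factor toward 1.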

  • @fdsmolasfae · 7 months ago

    Could you compare the two technical approaches, RAG and long context?

    • @fdsmolasfae · 7 months ago

      I read the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention", and it looks promising.

    • @phdvlog2024 · 7 months ago · +1

      I'll give it a try.

  • @devinzhou4913 · 6 months ago

    Did Mamba get rid of positional encoding?

    • @phdvlog2024 · 6 months ago

      I don't think so.

  • @yunbow5630 · 6 months ago

    12:35 Where is this figure from, boss?

    • @phdvlog2024 · 6 months ago

      The original paper.

    • @yunbow5630 · 6 months ago

      @phdvlog2024 It's not in there, boss. This figure looks to me like it's from Colab.

  • @foreverwhisper · 8 months ago

    How should an ordinary programmer start learning AI this year? Are there any math concepts I need to brush up on first?

    • @phdvlog2024 · 8 months ago · +1

      No need; just ask ChatGPT. First find a systematic video course to learn from.

    • @foreverwhisper · 8 months ago

      @phdvlog2024 I browsed through your videos and decided to subscribe first 😊

    • @phdvlog2024 · 8 months ago

      I'll put out a collection later so you can get a better overview: a collection on image classification.

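  • @weiseven717 · 9 months ago · +2

    I'm actually curious how you learn so much so quickly, since the AI field is so vast. Do you constantly follow the latest developments and read them, or did you spend a dedicated period learning systematically? Could you talk about it sometime?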

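    • @phdvlog2024 · 9 months ago · +2

      Read lots of papers. When you don't understand something, either ask ChatGPT or look at the papers it cites (the prior work). Tracing things back that way makes it all fall into place.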

    • @phdvlog2024 · 9 months ago · +1

      Please comment more, so I know how well I'm explaining things.

    • @MDNQ-ud1ty · 7 months ago · +2

      Many do not actually understand as much as they pretend or present; they are just regurgitating what they read or saw, and it is easy to talk.
      But the most important thing is immersion. The more time one spends on something, the more one understands. The most work is at the front.
      AI, in fact, is not that complex. It is basic linear algebra and its fundamentals are very simple (that of curve fitting). Much of AI is really the perfection of something very simple and the use of very large compute machines that are now available.
      1. Make sure you understand math. You must understand linear algebra. It is not difficult but may seem so. Basically it is the idea of vectors/lists/tuples and their transformations (matrices). Without understanding the language of linear algebra and its core concepts you will constantly struggle. Most AI will use some concepts outside of LA in specific ways, and they may need to be learned as one sees them. Calculus is also a must. At least understand differentiation and integration, and be comfortable with partial differentiation, the chain rule, etc. The better you understand these (which comes with time and work), the easier things get.
      2. Make sure to actually do things. The best way to learn is through experience. If you just read things it may make sense but you actually don't understand it well or you will forget. If you do actual work(e.g., design your own NN's from scratch or implement algorithms and such) the more it will make sense and feel real. It may take time and you may struggle a lot but the struggle = learning = understanding. It is not magic and it is not "quick". No one is born with such knowledge. Most spend a huge amount of time in it. Familiarity makes it clear.
      3. If this is something you want to do, then start doing it as much as possible. If you want to be a "pro" then you have to act like a pro: do the things they would do, such as spending 8 hours a day working on such things (or as much time as you have, but at least an hour). E.g., you should be reading all the main papers, and reading them several times if you need to. If you do not understand something, you have to learn to understand what you do not understand and seek out learning that. Over time, maybe a year or five, you will learn so much more.
      Knowledge accumulates slowly at first. It is like building a pyramid or house. At first you have to do a lot of work and it seems slow but after some time you have the foundation... and then the walls and then the roof and then it looks like a house... and then it is adding smaller things like windows and electrical. Then lighting and furniture.
      Learning is built the same way. Ultimately you have to find your own way(you are different than everyone else).
      But the simple thing is that you have to put in the time.
      1. You have to know how to program. You can sorta learn both at once on some level. Follow tutorials. If you have to, just copy in the code you see even if you don't understand it. The very act of copying/imitating will teach you because you will remember things and it will accumulate. But you should be able to program. Python is the language most people use now; it is very good for learning and doing things precisely because so many people use it. It is not a great language but it is worth learning (there are better languages but no one uses them because no one uses them). Learn something like PyTorch or TensorFlow in Python. This means being able to build basic NN's and having an idea of how to put things together. This requires learning the API and stuff. Again, if you have no clue, just find some videos online and start typing in the stuff that they do. After you watch a few you will have some idea, and you then build on it.
      2. Don't expect things overnight. It does take time. If you are serious then the time does not matter: 5 years, 20 years, whatever. It is a lifelong process, and the field, as all do, will evolve and grow, and you will always be behind (because so many humans are contributing, there is always new stuff and you can only do so much). So ultimately you have to do it because you want to do it for yourself. Otherwise you will give up because it is too much work and not fun. The way to keep it fun is to want to use it for things you want. E.g., you have ideas you want to use it for and then work towards those goals. This way, when you get up in the morning, you are thinking about how to use it for the thing you want.
      3. You can do it if you want. These things are not complicated but they do require learning and learning takes time. Anyone has the ability to learn something but most people do not have the desire or time(due to capitalism/life). It will change your life. You can choose to learn other things(piano/music, martial arts, sociology, history, etc). Each one will change your life in a different way(after 50 years).
      4. Try to learn more than one thing. If you just focus on one thing you will know one thing. Try not to get myopic. AI isn't just about coding. It's also about life... so knowing other things about life is relevant. E.g., learning music can also help you learn about AI. Learning about biology can help you learn AI... and vice versa. What makes most "intelligent people" different from the "average person" is that intelligent people want to learn and so are always learning new things rather than doing "fun things" (e.g., fvcking, playing video games, watching movies, drinking, etc). Learning is also addictive, because you start to see how the universe works and want to see more. Of course balance is important. Too much learning can be problematic.
      Ultimately though you have to figure it out. Only you know you. It might take some time for you to figure out exactly how to build your life but as long as you are moving in the direction you want then you will get somewhere. Likely not where you initially wanted but you will, as long as you are moving forward, look back in 50 years and be amazed at how far you went. [Note I'm just describing the gradient descent algorithm... it's all the same stuff. The algorithm literally was derived from our experiences and ideas about life as humans and combined with other things(such as math) to accomplish new things(such as AI)]
      Also, when you are learning something and feel lost, that is ok... that tells you something. You should always feel lost a little. You have to learn to "ride the wave" of always feeling a little lost but not too lost. That means you are doing it right. Just slightly uncomfortable. If you feel totally lost you won't understand anything and are wasting a lot of time. It means you should go back and learn simpler things that you do not know(the lost feeling is because you do not know things).
      Everyone that is very good started out exactly the same as everyone else. When I first started programming, or actually doing anything, I had no clue and could not envision what the long term would entail. I just did it. I moved forward not knowing the destination. But after 30 years of programming in 30+ languages over a wide variety of systems and architectures, you see the world much more simply and through the programming lens. So many things look different but, in fact, are the same. It's like cats. Cats come in many sizes, colors, personalities, etc. But they are all cats. The more you interact with different cats, the more you understand what a general cat is [this is just sampling/data and how AI works too]. When I started learning math it was just because I didn't understand it and saw it as mysterious, and I started trying to figure out what the heck they were talking about. I sucked at math and wasn't interested in it at first, until I was. Time*Desire*Effort*Memory*Organization = Knowledge. Even though we are all different, we each also have different factors. Some can put in more time but have worse memories. Some have more desire but lower effort. The results are what they are. But it's all the same in the end as far as just learning. Most kids are not taught correctly, or how to learn, or the consequences of it (I was a kid that was taught very poorly, and almost everything I learned was due to just me wanting to learn it and struggling very hard to learn it. I had a lot of time, desire, and effort but a very bad memory and worse organization. But this has let me achieve quite a bit because I made up for my weaknesses using the other factors. I should have worked on my organizational skills and memory but I didn't know how early on and didn't understand the implications).
      Life, in some sense, is only complicated because we start out with basically zero knowledge and have to build up. But we generally are given enough time to amass quite a bit. What makes everyone different is some learn at a slower rate than others and so there is a spread/distribution and the people at the middle are amazed at those at the top. But, in fact, it's just that some focused more on singular things(such as MJ only learning BB but being dumb in almost everything else yet people will treat him as a god. It's no god but someone that only did BB. Anyone else that did just as much BB as him with the same luck and such will be approximately just as good). You are what you eat... and you are what you "eat"(do).
      Good luck with it. Best thing to do is to jump right in. Even if you are totally lost you will still learn something and eventually "learn to swim".

    • @jasperyoon4301 · 5 months ago

      @MDNQ-ud1ty Wonderful! Learned a lot from you. Thanks very much.

    • @michaelzap8528 · 5 months ago

      @MDNQ-ud1ty Beautifully said. Reading it brought tears to my eyes. If only I could have seen this 10 years ago.

  • @ErenNew787 · 7 months ago

    I'd like to hear your view: will this model become this year's trend at the top journals and conferences?

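    • @phdvlog2024 · 7 months ago

      It already has. It's been topping the leaderboards at all kinds of small and mid-sized conferences. I see it while reviewing: lots of submissions, about a third of them.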

    • @phdvlog2024 · 7 months ago

      Not sure about the top conferences.

    • @ErenNew787 · 7 months ago

      @phdvlog2024 OK, thanks for the reply!

    • @唐鹏-t3n · 5 months ago

      Isn't the latest one TTT? Test-time training.

    • @ErenNew787 · 5 months ago

      @唐鹏-t3n At the time of this video it was still Mamba. Besides, there are too many Transformer challengers now, and none of them work very well.

  • @yangyang1412 · 8 months ago · +1

    The grass on the grave of the last model that claimed to kill the Transformer has already grown this tall.

    • @phdvlog2024 · 8 months ago

      It keeps developing; it has now made its way into diffusion models.

  • @FangXiaoyu-fi9kw · 7 months ago

    Could you share the slides? Pretty please!

    • @phdvlog2024 · 7 months ago

      These figures are all from the original paper plus things I found online.

  • @FangXiaoyu-fi9kw · 7 months ago

    Could you share the slides?

    • @phdvlog2024 · 7 months ago

      These figures are all from the web plus the paper PDFs; just take screenshots.

  • @pakersmuch3705 · 6 months ago · +1

    Awesome!

  • @Chuhao-t1s · 6 months ago

    Who'll build a loss function for this creator? ...Me! Great explanation, concise and to the point! MSE -> 0

  • @部落课程 · 6 months ago

    Is that Huazhong University of Science and Technology...?

  • @yunbow5630 · 9 months ago

    No way. For the next few years it'll still be attention.

    • @phdvlog2024 · 9 months ago · +5

      New models have already been emerging this year; I expect this year's CVPR and NeurIPS leaderboards will be swept.

    • @yunbow5630 · 6 months ago

      @phdvlog2024 I think Mamba-2 could pull it off, ahhh

  • @frogasian8888 · 8 months ago · +1

    This looks like a hidden-gem channel, but with no name other than "PhD" it's a bit hard to promote.

    • @phdvlog2024 · 8 months ago · +1

      Maybe I'll change it later.

  • @LouisCubingChannel · 8 months ago · +1

    Wow, your voice sounds a lot like Fanglian (方脸)... @Toronto