Mistral AI goes MAMBA!!!

  • Published on Sep 5, 2024
  • Codestral Mamba (from Mistral AI), a Mamba2 language model specialised in code generation, available under an Apache 2.0 license. A minimal loading sketch follows the links below.
    🔗 Links 🔗
    Codestral Mamba from Mistral announcement - mistral.ai/new...
    Mamba Codestral on Hugging Face - huggingface.co...
    ❤️ If you want to support the channel ❤️
    Support here:
    Patreon - / 1littlecoder
    Ko-Fi - ko-fi.com/1lit...
    🧭 Follow me on 🧭
    Twitter - / 1littlecoder
    Linkedin - / amrrs
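
For anyone who wants to try the weights directly, below is a minimal loading sketch using Hugging Face transformers. The repo id and the Mamba2 support in transformers are assumptions rather than anything stated in the description, so check the Hugging Face model card linked above for the officially supported path.

```python
# Hypothetical quick-start; the repo id and transformers' Mamba2 support are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mamba-Codestral-7B-v0.1"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "def fibonacci(n: int) -> int:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```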

Comments • 36

  • @tannenbaumxy • months ago • +11

    Wow, nice to finally see Mamba being implemented in state-of-the-art LLMs. Jamba and Zamba showed the potential of this architecture. Let's see what the future brings for Mamba in open-weights LLMs.

    • @1littlecoder • months ago • +1

      Absolutely 💯

  • @michaelmccoubrey4211 • months ago • +8

    Seeing as the Mamba architecture has linear memory complexity, you might be able to provide the model with an entire code repository as context (on consumer hardware). Looking forward to refactoring my code base with a single prompt!
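
To put rough numbers on that point, here is a back-of-the-envelope comparison of a transformer's KV cache, which grows with context length, against a Mamba-style model's fixed-size recurrent state. Every dimension below is a made-up placeholder, not Codestral Mamba's actual configuration.

```python
# Toy memory comparison; every dimension here is a made-up placeholder.

def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128, dtype_bytes=2):
    # Transformer: K and V tensors per layer, each [n_kv_heads, seq_len, head_dim],
    # so cache memory grows linearly with the number of tokens kept in context.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

def ssm_state_bytes(n_layers=32, d_inner=8192, d_state=128, dtype_bytes=2):
    # Mamba-style SSM: one recurrent state per layer, [d_inner, d_state],
    # whose size does not depend on how many tokens have been read.
    return n_layers * d_inner * d_state * dtype_bytes

for tokens in (4_096, 262_144):  # a single file vs. something repo-sized
    print(f"{tokens:>7} tokens | KV cache ~{kv_cache_bytes(tokens) / 1e9:5.1f} GB"
          f" | SSM state ~{ssm_state_bytes() / 1e9:4.2f} GB")
```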

  • @henkhbit5748 • months ago • +1

    Always good that new architectures are coming out that compete with transformers, especially since long context does not have a quadratic impact on memory. Thanks for the update 👍

  • @samyogdhital • months ago

    Loved it! You are actually giving us the information as fast as possible.

  • @narasimhasaladi7 • months ago • +5

    Try to zoom in on the text when you're explaining some particular info, in editing.
    That feels good I think 🙌

    • @1littlecoder • months ago • +1

      Thank you, do you mean what I did was nice, or that I should try to zoom in more?

    • @narasimhasaladi7 • months ago • +4

      @1littlecoder You should zoom the video in a bit when you're reading a sentence and zoom out when you're done,
      so that it makes it easier for the audience.
      Thank you 🙂

    • @1littlecoder • months ago • +4

      Makes sense, thank you for the tip!

  • @unclecode • months ago

    I feel you brother, I am happy too! Two things make me really happy. First, Mamba: it's great we're not trapped by one technique dominating over others. It's worse when one concept pushes everything else aside, not just in computer science but in human history, politics, etc. It's refreshing to see we're not just using transformers. Second, it's beautiful to see a company not obsessed with benchmark rivalry. You release something to open source after primary tests and let the community improve it; isn't that amazing? It has always been like this! But recently some companies spend months trying to beat benchmarks before releasing anything, often missing the point. Sometimes I think this benchmark hype is driven by proprietary companies to distract open-source groups, because open source is the real competitor. So, these two aspects of the new model are beautiful, and I'm very happy, just like you.

  • @Nexus-zc3cb • months ago • +6

    Mamba isn't as great as you boys think... RWKV 6 is still a better arch than Mamba 2.
    Mamba 2 is just a worse architecture. In terms of the time mixing, the main difference is the use of discretisation, which even the authors mention they don't think is necessary. The other difference is that RWKV uses vector decay and Mamba 2 uses scalar decay, which has been proven to be worse and basically has to be worse. The main thing Mamba 2 has is a very good CUDA kernel, which is obviously because it's Tri Dao. But in terms of architecture (specifically the time-mixing recurrence) it's worse.
    For example, the best version of RWKV possible that's still quick and doesn't need CUDA is one using scalar data-dependent decay, i.e. basically Mamba 2. If Mamba 2 is better, then it's going to be because of something else, like hyperparams or maybe the channel mixing and convolution. Also, I would be suspicious of their MQAR results, as attention should get near 100% for any model dimension.
    I guess though maybe that's just MQAR being less reliable and more sensitive to hyperparams.
    Also, Mamba 2 is using a much larger state to compensate for the simplified design.
    Most creators don't give enough attention to RWKV 6. Hope you look into it; there is a Discord server for RWKV to get the latest updates (there is a 14B model cooking), and the latest RWKV 6 weights can be found in the RWKV-LM repository.
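
For readers unfamiliar with the scalar-vs-vector decay distinction being argued in the comment above, the toy recurrence below sketches it. It is deliberately simplified: both Mamba 2 and RWKV-6 actually use matrix-valued states, per-head structure, and learned, data-dependent parameterisations, none of which appear in this sketch.

```python
import torch

d = 4                              # toy channel count
xs = torch.randn(8, d)             # 8 steps of toy inputs

# "Scalar decay" (Mamba-2 flavoured): one decay value per step,
# shared by every channel of the state.
h = torch.zeros(d)
scalar_decays = torch.rand(8)
for a_t, x_t in zip(scalar_decays, xs):
    h = a_t * h + x_t

# "Vector decay" (RWKV-6 flavoured): a separate decay per channel per step,
# so each channel can forget at its own rate.
h = torch.zeros(d)
vector_decays = torch.rand(8, d)
for a_t, x_t in zip(vector_decays, xs):
    h = a_t * h + x_t              # elementwise multiply over channels
```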

    • @vaitesh • months ago

      That's a good, valid point.

    • @1littlecoder • months ago • +1

      I covered v5 of RWKV: th-cam.com/video/gHdRgfmAVIw/w-d-xo.htmlsi=8Lslj9FnZlZTMD7V
      I missed v6. Thanks for letting me know, I'll look into it.

    • @Nexus-zc3cb • months ago • +1

      @1littlecoder Seems like the paper you used in the video is RWKV-4. The paper for RWKV v5 and v6 is titled "Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence".

    • @Nexus-zc3cb • months ago

      @1littlecoder Note that the RWKV 6 models aren't instruct-tuned at all... They are semi-tuned: they have seen a lot of ChatGPT data in their pretraining dataset... So expect more performance once instruct-tuned.

    • @Nexus-zc3cb • months ago

      @1littlecoder A funny detail is that in the Mamba 2 paper they acknowledge v6's existence but only compare against v4... because they know it's better... All comparisons are only against v4 or v5.

  • @haileycollet4147 • months ago

    Since this isn't using Hydra (it's using vanilla Mamba 2), the order of the prompt will be important, i.e. you'll want to put the code before the question. Or even better, repeat it: code, question, code, question.
    See the paper "Just Read Twice".
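
A small illustration of that ordering; the file name, question, and prompt templates below are hypothetical, not taken from Mistral or the Just Read Twice paper.

```python
# Illustrative only: "utils.py" and the question are hypothetical.
code = open("utils.py").read()
question = "Refactor load_config to return a dataclass instead of a dict."

# Recurrent models compress context left to right, so put the bulky code
# before the question rather than after it...
prompt = f"{code}\n\nQuestion: {question}"

# ...or, following the "code, question, code, question" suggestion above,
# repeat both so the model reads the context again already knowing the question:
prompt_repeated = f"{code}\n\nQuestion: {question}\n\n{code}\n\nQuestion: {question}"
```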

  • @PankajDoharey • months ago

    Mind blowing.

  • @geekyprogrammer4831 • months ago • +1

    Waiting for its Ollama release 😅

  • @marcfruchtman9473 • months ago

    Sounds great, though I would like to see a better (HumanEval coding) benchmark. Looking forward to the testing.

  • @NLPprompter • months ago • +1

    So all I've got to do is the CLI chant: venv, ANACONDA, MAMBA... then my coding magic will go sssssshhhhhhhhaaaaaaaaaa

  • @KevinKreger • months ago

    🥰

  • @MichaelBarry-gz9xl • months ago

    Let's say it takes 30 mins to load a repository into context, can I chat with it in real time? Or would it have to reload the entire repository for each response?

  • @testales • months ago

    But can it -run Crysis- create a Snake game? :)

  • @mariusz0kreft • months ago • +2

    I am wondering when it will come out as a llamafile.

    • @1littlecoder • months ago • +1

      Waiting for the same. Btw, are you using llamafile?

    • @mirek190 • months ago

      When llama.cpp implements Mamba 2...

  • @DistortedV12 • months ago • +1

    DeepSeek v2 still eats this for lunch.

  • @KumR • months ago • +1

    Next model will be MambaZHAM ;)

    • @eleice1 • months ago

      What's dat?

    • @1littlecoder • months ago

      Mambazham means mango in Tamil (the language the OP and I know) 😆

  • @yadav-r • months ago

    What is the difference between Transformers and Mamba for an end user? Do we really need to know?

    • @mirek190 • months ago • +1

      Difference? In theory Mamba can run on bytes instead of tokens, and the short-term memory is much bigger...

    • @yadav-r • months ago

      @mirek190 Thank you 😀

  • @ravishmahajan9314 • months ago • +1

    What is the use case? I guess it's just programming.

    • @combardus9309 • months ago

      Code generation and refactoring. You can use it where you need code generated dynamically: say you want to create a software program that dynamically creates code for whatever input you send to the model; Mamba can do that better than non-fine-tuned models can. It supports all the main programming languages as well.