Comments •

  • @six1free
    @six1free 23 days ago +1

    hands down one of the best YouTube channels out there - and I'm not just saying that because you featured my question :D I really do love how thoroughly you've taken to answering it.
    .. this being the pause point... I'm going to guess that CUDA will do it all for you ("as if" - I'm sure :D)
    I am so envious of your test rig... as it is, though, I'd need a data center for power... as for adding the other cards, I'll research tensors further and rewatch this video when applicable :D - downloaded and saved to my good tutorials (very long) playlist... enjoy the well-deserved follow-through.

    • @RoboTFAI
      @RoboTFAI 21 days ago +1

      Thanks for the idea!

  • @jackflash6377
    @jackflash6377 19 days ago +1

    Outstanding!
    Glad I found this channel.
    Thank you sir.

    • @RoboTFAI
      @RoboTFAI 18 days ago

      Thanks for watching!

  • @246rs246
    @246rs246 22 days ago +2

    I'm blown away by this comprehensive answer to my question. Thumbs up and I'm looking forward to more interesting videos.

    • @RoboTFAI
      @RoboTFAI 21 days ago +1

      Awesome, thank you!

  • @kevinclark1466
    @kevinclark1466 7 days ago +1

    Great video! Looking forward to trying this…

    • @RoboTFAI
      @RoboTFAI 6 days ago

      Have fun!

  • @SphereNZ
    @SphereNZ 17 days ago

    Great video, great info, really appreciate it, thanks.

    • @RoboTFAI
      @RoboTFAI 16 days ago

      Appreciated!

  • @AkhilBehl
    @AkhilBehl 23 days ago +3

    This is absolutely awesome stuff.

    • @RoboTFAI
      @RoboTFAI 21 days ago

      Thanks!

  • @CoderJon
    @CoderJon 8 days ago

    Love your videos. I appreciate that you leave the interpretation of the results to us, but I would love a video talking about your interpretations of the data. For example: why your results for prompt tokens per second were higher with the 90/10 split. I assume it's because there is some sort of parallel processing happening in the evaluation of the prompt, but I am still new to the AI world, so I would love the education.

    • @RoboTFAI
      @RoboTFAI 8 days ago

      Much appreciated! I try to keep my mouth shut and let the data show the info. I'm definitely not an expert - just learning like everyone else. I never intended to create an actual channel: the first video was to settle a conversation with friends with hard data, the testing app is for other uses in my lab, etc. It's just turning into a place where we can all share some data and learn from it, or at least burn some of my power bill together!
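
      A possible intuition for the prompt-speed question above, as a toy sketch: prompt (prefill) tokens are all known up front, so they can be evaluated in large batches the GPU parallelizes across, while generated tokens come out strictly one at a time. All numbers below are made up for illustration, and a real batched pass over 512 tokens costs more than a single-token pass - just far less than 512x more:

      ```python
      # Toy prefill-vs-generation throughput model (illustrative numbers only).
      # Prefill amortizes one forward pass over many prompt tokens; generation
      # pays a full forward pass per generated token.

      def prefill_tps(batch_tokens: int, pass_seconds: float) -> float:
          """Tokens/s when batch_tokens are evaluated in one batched pass."""
          return batch_tokens / pass_seconds

      def generation_tps(pass_seconds: float) -> float:
          """Tokens/s when every token needs its own forward pass."""
          return 1 / pass_seconds

      # Hypothetical GPU doing one forward pass in 50 ms:
      print(prefill_tps(512, 0.05))  # batched prompt evaluation
      print(generation_tps(0.05))    # token-by-token generation
      ```

      This batching is why benchmarks report prompt tokens/s and generated tokens/s separately, and why the two can differ by orders of magnitude.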

  • @andre-le-bone-aparte
    @andre-le-bone-aparte 21 days ago +1

    Question: @03:14 - NVTOP is showing 90+ degrees (86 on the M40) Fahrenheit on each of those cards... WITHOUT any active usage?
    - That seems excessive. I'm currently running a 4x3090 setup at 79 degrees or lower in between queries.

    • @RoboTFAI
      @RoboTFAI 20 days ago +1

      The 4060s are stacked against each other on the bench node in this test (I don't recommend that - they could use space between them since they have side-facing fans, which is why I normally use a lot of PCIe extenders) and they don't run their fans unless there is a load. The M40 in this test has an active fan on all the time. Also, I live in a hot climate and it's been 85-100 degrees (75+ in the workshop, as it's not conditioned)🔥

    • @andre-le-bone-aparte
      @andre-le-bone-aparte 20 days ago +2

      @@RoboTFAI 👍- Just looking to learn ways to extend the life of these GPUs and increase performance for LLM usage when running them 10 hours a day (work day, remote work, as a code assistant)

  • @mbike314
    @mbike314 11 days ago

    Thank you for creating this valuable content. I am pleased to have discovered it. I am interested in some of the 4060s you mentioned. I sent an email.
    Please keep going with this channel!
    Wonderful stuff!

    • @RoboTFAI
      @RoboTFAI 7 days ago

      Thanks a ton! I didn't see any email - reach out to robot@robotf.ai or ping me on Reddit, etc.

    • @mbike314
      @mbike314 1 day ago

      Thank you. I did send it to the wrong address. Just resent it to the correct address.

  • @tbranch227
    @tbranch227 11 days ago

    Can you run a larger model when you span cards? Or does your model need to be able to fit on each card that you tensor-split across? And what happens to performance if you can run larger models by aggregating card RAM?

    • @RoboTFAI
      @RoboTFAI 9 days ago

      You can absolutely span a larger model between cards - these tests are actually doing that! Performance depends on the cards you are splitting between, but it will land somewhere between your lowest-end and highest-end card (if they are different models). Running multiple cards doesn't necessarily increase performance; it's really for expanding your VRAM capacity.
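
      To make the reply above concrete, here is a minimal sketch (layer counts and VRAM sizes are hypothetical) of how a llama.cpp-style tensor split might assign a model's layers across GPUs in proportion to each card's VRAM:

      ```python
      def split_layers(n_layers: int, vram_gb: list[float]) -> list[int]:
          """Assign layers to GPUs proportionally to VRAM (largest-remainder rounding)."""
          total = sum(vram_gb)
          raw = [n_layers * v / total for v in vram_gb]
          counts = [int(r) for r in raw]
          # Hand leftover layers to the GPUs with the largest fractional remainders.
          leftovers = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
          for i in leftovers[: n_layers - sum(counts)]:
              counts[i] += 1
          return counts

      # e.g. a 32-layer model across a 16 GB card and a 24 GB card:
      print(split_layers(32, [16, 24]))  # -> [13, 19]
      ```

      Each token still flows through every layer in sequence, which is why splitting adds capacity rather than speed, and why the slowest card in the chain drags down the whole model's throughput.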

  • @tsclly2377
    @tsclly2377 22 days ago

    I think loading is still an important factor, so do you use NVMe drives, like the large, high-endurance Optane 900P series, for fast loads? And FPGAs for pre-staging data (like video, pictures) reconstructed in a faster-to-use form?

    • @RoboTFAI
      @RoboTFAI 21 days ago

      I normally leave the unloaded-model test off, as it doesn't allow as much resolution in the smaller charts. I use Gen 4 NVMe M.2 drives in each of these systems (rated up to 5000/4800 MB/s... yeah, right).

    • @Zeroduckies
      @Zeroduckies 15 days ago

      Or you can get 1 TB of RAM and have a 500 GB ramdisk ^^
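
      For anyone wanting to try the ramdisk idea, a minimal Linux sketch (sizes are examples, the model path is hypothetical, and tmpfs contents vanish on reboot):

      ```shell
      # Create a 500 GB tmpfs ramdisk (requires root)
      sudo mkdir -p /mnt/ramdisk
      sudo mount -t tmpfs -o size=500G tmpfs /mnt/ramdisk

      # Stage the model there once; subsequent loads come from RAM, not disk
      cp ~/models/model.gguf /mnt/ramdisk/
      ```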

    • @tsclly2377
      @tsclly2377 13 days ago

      @@Zeroduckies With HP ML350p machines you only get up to 768 GB of DRAM, and it has to be LRDIMM. That RAM runs on three channels, which actually slows it down compared to a 2-channel 256 GB configuration because of the required interleaving and processing - it is all in the specification PDF from HP. Only with the Gen11 model do you actually get significantly faster RAM (PCIe 5.0 - HP skipped the 4.0 architecture in these machines) and larger capacity, at an astronomical increase in price.

      So when you get a 'loaded' 256 GB DRAM ML350p Gen8 in a trade for an older gamer machine with a GTX 1660 Ti and a pre-10th-gen i7 (about a $300 value), you have to look for a fast, economical memory solution, and that is where the Optane 900P cards come in (with their ~4000 MB/s bursts). You also have to compare that against the rate the GPU can actually take data in - so it is a cheap way to move data in (and out) at DRAM-comparable rates, while only occupying a PCIe x4 slot.

      That is all fine and dandy, but on dual-CPU chipsets the PCIe lanes go all over the place, and that is a major consideration: the right and left sides are controlled by different CPUs, and SLI or NVLink can be required for the OS to recognize the linked GPU cards, which is inherently required for proper function logging... as are the PCIe controllers on these machines. They are going to be slower than single-CPU motherboards specifically designed for this - like the multi-PCIe-x16 SuperMicro or Gigabyte professional models that have come out for exactly this type of application and use NVMe arrays for storage - and then you are back to the number of writes being applied to the storage.