LangChain + Falcon-40-B-Instruct, #1 Open LLM on RunPod with TGI - Easy Step-by-Step Guide

  • Published Dec 2, 2024

Comments • 46

  • @lukebarbier3487 • 1 year ago +1

    Thank you for putting this together. Very helpful!

  • @8eck • 1 year ago +1

    Thanks for sharing, subbed and liked.

  • @Leonid.Shamis • 1 year ago +1

    Thank you, very informative and very helpful video!

  • @edoardog1899 • 1 year ago +1

    Excellent, I will try it!

  • @chinmaydeshpande5046 • 6 months ago

    Great video !!!!!

  • @MagagnaJayzxui • 1 year ago

    Thank You! Very helpful! Thanks again!

  • @SingularitySurfers • 1 year ago +1

    Thank you!

  • @danielmz99 • 1 year ago

    Thanks for sharing. It would be very interesting to see a video on fine-tuning Falcon-40B for a complex task (say, something that would need Chain of Thought in ChatGPT). I haven't seen a video like that anywhere.

  • @rafaelmartinsdecastro7641 • 1 year ago

    Good stuff

  • @yurizappa • 1 year ago

    Indeed very helpful.

  • @scorpionrevenge • 1 year ago

    Why did it use up all my Google Colab disk space? Any ideas on how to keep the Colab disk free? Also, did you create an SSH tunnel for port forwarding? Can you share instructions on how to do that?

  • @abuiliazeed • 1 year ago +1

    Amazing! That's exactly what I was looking for.
    If I want to use AWS instead of RunPod, what AWS architecture/service do you recommend?

    • @experienced-dev • 1 year ago

      Begin by using the 'Deploy' button at huggingface.co/tiiuae/falcon-40b-instruct with SageMaker.
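
      The "Deploy" button generates a SageMaker snippet roughly along these lines (a sketch, not the exact emitted code; the instance type and GPU count below are illustrative assumptions, since the button emits the currently recommended values):

```python
# Environment passed to the Hugging Face TGI container on SageMaker.
# HF_MODEL_ID selects the model; SM_NUM_GPUS tells TGI how many shards to run.
hub_env = {
    "HF_MODEL_ID": "tiiuae/falcon-40b-instruct",
    "SM_NUM_GPUS": "4",  # assumption: shard across a 4-GPU instance
}

# The actual deployment needs the sagemaker SDK and an AWS execution role:
# from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri
# llm_model = HuggingFaceModel(
#     image_uri=get_huggingface_llm_image_uri("huggingface"),
#     env=hub_env,
#     role=role,  # your SageMaker execution role ARN
# )
# predictor = llm_model.deploy(
#     initial_instance_count=1,
#     instance_type="ml.g5.12xlarge",  # assumption: a 4x NVIDIA A10G instance
# )
```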

  • @hocklintai3391 • 1 year ago

    I tried out the code, but got an error when running this command: display(Markdown(chain.run({"num_trees": 100, "num_apples": 10}))). The error is: JSONDecodeError: Expecting value: line 1 column 1 (char 0)
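
    For anyone hitting this: "Expecting value: line 1 column 1 (char 0)" means the client tried to JSON-decode an empty or non-JSON response body, which typically happens while the TGI server is still downloading or loading the weights. A minimal defensive wrapper (a sketch; safe_parse is a hypothetical helper, not part of LangChain):

```python
import json

def safe_parse(raw: str):
    """Try to parse the server's raw response as JSON.

    A JSONDecodeError at "line 1 column 1 (char 0)" usually means the body
    was empty or plain text, e.g. the endpoint was not ready yet. Surfacing
    the raw text makes the real cause visible instead of a bare decode error.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        raise ValueError(f"Endpoint did not return JSON; raw response: {raw!r}")
```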

  • @avbendre • 1 year ago

    What is the context window for the LLM?

  • @khavea • 1 year ago +1

    Thanks for the details. How long did it take to download the model?

    • @experienced-dev • 1 year ago

      ~15 minutes.
      Timestamps are in the logs: 2:53 and 4:39.

  • @adriangabriel3219 • 1 year ago +1

    Could you make a video on how to train the Falcon-40b?

    • @experienced-dev • 1 year ago

      The Falcon-40B was trained on 384 A100 40GB GPUs.
      If you're referring to fine-tuning with LoRA/QLoRA, could you please suggest the dataset?
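
      For context, the LoRA idea this reply refers to can be sketched in a few lines of NumPy (dimensions here are illustrative toys, far smaller than Falcon-40B's):

```python
import numpy as np

# LoRA in a nutshell: instead of updating a frozen weight W (d_out x d_in),
# learn two small matrices B (d_out x r) and A (r x d_in) with r << d.
d_out, d_in, r, alpha = 64, 64, 8, 16
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))                 # B starts at zero, so no initial change

def lora_forward(x):
    # Effective weight is W + (alpha / r) * B @ A; the low-rank path adds
    # only d_out*r + r*d_in trainable parameters instead of d_out*d_in.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B = 0 the LoRA path is inactive, so outputs match the frozen model.
assert np.allclose(lora_forward(x), W @ x)
```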

    • @adriangabriel3219 • 1 year ago

      Hi @@experienced-dev, thanks for your clarification, that's exactly what I meant. What about the MedQuAD dataset, to see how well it picks up on a new domain?

    • @experienced-dev • 1 year ago

      @adriangabriel3219,
      Falcon instruct models were trained on Baize, which includes MedQuAD.
      github.com/project-baize/baize-chatbot#overview

  • @meworlds8216 • 1 year ago

    Good job, very nice video!

  • @Star-rd9eg • 1 year ago

    Hey man! Can you assist me with something like this for serverless? I am willing to pay

  • @8eck • 1 year ago

    I remember that Falcon models were terribly slow. Is that still a problem?

  • @scorpionrevenge • 1 year ago

    What RunPod template is it running on? Any instructions on initializing a RunPod instance?

    • @experienced-dev • 1 year ago

      In this tutorial, I am using the RunPod Python API directly, without templates.
      I would recommend joining RunPod discord discord.gg/cUpRmau42V and looking at #gpu-support
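
      The pod creation step from the video (2:37) looks roughly like this. This is a sketch: the GPU type id, disk size, and docker_args values below are illustrative assumptions, and the parameter names should be checked against the current runpod SDK docs:

```python
# Configuration for launching a TGI pod via the RunPod Python API, no template.
# TGI receives its own arguments through docker_args; --num-shard 2 splits the
# model across two GPUs (Falcon-40B does not fit on a single consumer GPU).
pod_config = dict(
    name="falcon-40b-instruct-tgi",
    image_name="ghcr.io/huggingface/text-generation-inference:latest",
    gpu_type_id="NVIDIA A100 80GB PCIe",  # assumption: any type with enough VRAM
    gpu_count=2,
    container_disk_in_gb=100,             # the weights alone are roughly 90 GB
    ports="80/http",
    docker_args="--model-id tiiuae/falcon-40b-instruct --num-shard 2",
)

# The actual call needs an API key:
# import runpod
# runpod.api_key = "YOUR_RUNPOD_API_KEY"
# pod = runpod.create_pod(**pod_config)
```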

  • @LakshaySaini-k5p • 1 year ago

    running -> display(Markdown(chain.run({"num_trees": 100, "num_apples": 10})))
    facing -> JSONDecodeError: Expecting value: line 1 column 1 (char 0)
    I performed the exact steps defined in the notebook you shared. Any idea?
    If possible, can you share your requirements.txt?

    • @experienced-dev • 1 year ago

      Have you waited for the weights to download? It usually takes roughly 15 minutes.
      Can you please share container logs from the pod?

    • @LakshaySaini-k5p • 1 year ago

      @@experienced-dev Thanks for the reply. The weights have been downloaded, but I'm facing the following issue:
      2023-06-21T09:30:25.008304234Z 2023-06-21T09:30:25.008136Z ERROR text_generation_launcher: Shard 1 failed to start:
      2023-06-21T09:30:25.008330693Z [W socket.cpp:601] [c10d] The client socket has failed to connect to [localhost]:29500 (errno: 99 - Cannot assign requested address).
      2023-06-21T09:30:25.008333203Z You are using a model of type RefinedWeb to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
      2023-06-21T09:30:25.008335393Z [E ProcessGroupNCCL.cpp:828] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, Timeout(ms)=60000) ran for 67910 milliseconds before timing out.
      2023-06-21T09:30:25.008337463Z [E ProcessGroupNCCL.cpp:455] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
      2023-06-21T09:30:25.008338723Z [E ProcessGroupNCCL.cpp:460] To avoid data inconsistency, we are taking the entire process down.
      2023-06-21T09:30:25.008340373Z terminate called after throwing an instance of 'std::runtime_error'
      2023-06-21T09:30:25.008341973Z what(): [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, Timeout(ms)=60000) ran for 67910 milliseconds before timing out.
      2023-06-21T09:30:25.008351233Z
      PS: Couldn't post a link, don't know why

    • @experienced-dev • 1 year ago

      @user-uw3zp2yh7k @shivamattoo2970
      I am trying to figure out what is wrong with RunPod
      discord.com/channels/912829806415085598/1023588055174611027/1121051392241573938

    • @LakshaySaini-k5p • 1 year ago

      @@experienced-dev I'm unable to join the server, can you please send me an invitation? lakshay.saini-jupitice

    • @experienced-dev • 1 year ago

      @@LakshaySaini-k5p discord.gg/cUpRmau42V

  • @efexzium • 1 year ago

    The inference API token feature sucks.

  • @GirijaCk-gg1ty • 1 year ago

    Can I use RunPod for free???😂

  • @kevinyang2076 • 1 year ago +1

    Using your config, but with the SECURE cloud instead, my RunPod instance went into an endless loop of ERROR text_generation_launcher: Shard 1 failed to start:
    2023-06-14T18:05:10.601398916-07:00 You are using a model of type RefinedWeb to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
    2023-06-14T18:05:10.601406039-07:00 [E ProcessGroupNCCL.cpp:828] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, Timeout(ms)=60000) ran for 68173 milliseconds before timing out.
    2023-06-14T18:05:10.601410789-07:00 [E ProcessGroupNCCL.cpp:455] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
    2023-06-14T18:05:10.601417843-07:00 [E ProcessGroupNCCL.cpp:460] To avoid data inconsistency, we are taking the entire process down.
    2023-06-14T18:05:10.601423500-07:00 terminate called after throwing an instance of 'std::runtime_error'
    2023-06-14T18:05:10.601430135-07:00 what(): [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, Timeout(ms)=60000) ran for 68173 milliseconds before timing out.
    Yours booted in ~40 seconds (4:39); I wonder if the source of the error is the timeout being set too tight, so some variations exceed the limit.

    • @experienced-dev • 1 year ago

      Initial boot took approximately 15 minutes, 2:53 - 4:39. Many steps are required to make this production-ready.

    • @kevinyang2076 • 1 year ago +1

      @@experienced-dev Could you elaborate on how you achieved this ("Many steps are required to make this production-ready")?
      If you look at the container log shown at 4:39, you can see "Shard 1 Ready in 41.39s", which is less than the timeout=60000ms. Did you manually make docker_args in runpod.create_pod() (2:37) explicit for --gpus all and a larger shared memory size (--shm-size)? When I cloned your Colab and ran it with my RunPod API key, the above error happened.
      Hope this clarifies my question!

    • @mahmoudsamir9537 • 1 year ago

      @@kevinyang2076 I have the same problem

  • @adriangabriel3219 • 1 year ago

    I have tried your configuration with RunPod and it's failing because of:
    '2023-06-29T08:25:19.177454847Z 2023-06-29T08:25:19.177364Z INFO text_generation_launcher: Starting shard 1
    2023-06-29T08:25:29.236388920Z 2023-06-29T08:25:29.236201Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
    2023-06-29T08:26:33.429395037Z 2023-06-29T08:26:33.429106Z ERROR text_generation_launcher: Shard 1 failed to start:
    2023-06-29T08:26:33.429426266Z [W socket.cpp:601] [c10d] The client socket has failed to connect to [localhost]:29500 (errno: 99 - Cannot assign requested address).
    2023-06-29T08:26:33.429431256Z You are using a model of type RefinedWeb to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
    2023-06-29T08:26:33.429435506Z [E ProcessGroupNCCL.cpp:828] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, Timeout(ms)=60000) ran for 65716 milliseconds before timing out.
    2023-06-29T08:26:33.429454866Z [E ProcessGroupNCCL.cpp:455] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
    2023-06-29T08:26:33.429457906Z [E ProcessGroupNCCL.cpp:460] To avoid data inconsistency, we are taking the entire process down.
    2023-06-29T08:26:33.429461436Z terminate called after throwing an instance of 'std::runtime_error'
    2023-06-29T08:26:33.429464846Z what(): [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, Timeout(ms)=60000) ran for 65716 milliseconds before timing out.
    2023-06-29T08:26:33.429468146Z '
    Have you encountered this error before?

    • @experienced-dev • 1 year ago

      Yes, I would recommend joining RunPod discord discord.gg/cUpRmau42V and looking at #gpu-cloud

    • @adriangabriel3219 • 1 year ago

      Great @@experienced-dev, will do. Do you have experience with improving inference time for the Falcon models?