HuggingFace GPT-J: Usage and Fine-tuning (Update in description)

  • Published on Oct 17, 2024

Comments • 67

  • @Brillibits
    @Brillibits  2 years ago +2

    Discord: discord.gg/F7pjXfVJwZ

  • @charles-spurgeon
    @charles-spurgeon 2 years ago +2

    Excellent video as usual. Thank you, Blake!

    • @Brillibits
      @Brillibits  2 years ago

      Thanks for watching!

  • @hopps4117
    @hopps4117 1 year ago +1

    Thank you so much for this video. Do you know if this repo is compatible with Windows?

    • @Brillibits
      @Brillibits  1 year ago +1

      I do not know. I recommend dual-booting with Linux to get GPU Docker support and save yourself from many headaches.

  • @1995sharaku
    @1995sharaku 2 years ago +1

    Hey! I hope you can give me an answer: what would be the minimum requirements for a system (RAM and GPU VRAM) to fine-tune GPT-J-6B? From the video I could tell that 100% of your 128 GB of RAM was used in fine-tuning. Would a system with 128 GB of RAM and a P100 or V100 (not sure how much VRAM they have) be sufficient for fine-tuning? My university has a hardware cluster available for its students with such P100 and V100 GPUs, but I'm not sure if this would be enough to fine-tune on a dataset with roughly 120 datapoints.

    • @Brillibits
      @Brillibits  2 years ago

      Hard to say what the minimum VRAM is without trying it. 24 GB works, and I doubt much less than that would.

  • @TheAIEpiphany
    @TheAIEpiphany 2 years ago

    Hey Blake what are your machine specs? :))

    • @Brillibits
      @Brillibits  2 years ago

      Currently 128 GB of DDR4 RAM, a Ryzen 5950X, two 3090s, and one 1060.

    • @TheAIEpiphany
      @TheAIEpiphany 2 years ago

      @@Brillibits Nice!! Is it self-funded or?

    • @Brillibits
      @Brillibits  2 years ago +1

      Yes, my channel and the things used for it are entirely self-funded.

    • @TheAIEpiphany
      @TheAIEpiphany 2 years ago

      @@Brillibits Thanks! Keep it up!

    • @Brillibits
      @Brillibits  2 years ago

      @@TheAIEpiphany thanks!

  • @MrRedwizard000
    @MrRedwizard000 1 year ago

    Great video, thanks.
    I don't have a video card that's that good, so I tried it on my HPE Gen9 server. I gave it 30 CPUs (4 physical Xeons, 3.6 GHz) and it took 1 minute and 4 seconds to generate a result with your test input.

    • @Brillibits
      @Brillibits  1 year ago +1

      Thanks for sharing!

    • @halo64654
      @halo64654 1 year ago

      I'll be doing it on my HPE Gen7 with two X5650s. Mine should be slower, but I'll post my times if I can get it running.

  • @codegate615
    @codegate615 1 year ago

    I'm having a lot of trouble with the packages. Has the command changed for the first line of the Jupyter notebooks?

    • @Brillibits
      @Brillibits  1 year ago

      Yes it has! Make a PR or I will try to update it when I can. Google "PyTorch getting started" and use the tool they have. CUDA 11.6 should be good.
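
For reference, the PyTorch getting-started selector generated a command along these lines for CUDA 11.6 at the time (the exact packages and versions are an assumption; always check the selector for your platform):

```shell
# Install a CUDA 11.6 build of PyTorch via the cu116 wheel index
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
```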

  • @thanhlongluong6665
    @thanhlongluong6665 2 years ago +1

    Hi, I got the error "Output 0 of DequantizeAndLinearBackward is a view and is being modified inplace. This view was created inside a custom Function (or because an input was returned as-is) and the autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is forbidden. You can fix this by cloning the output of the custom Function." when running your Colab. Please tell me how to fix it. Thank you.

    • @maywire
      @maywire 1 year ago

      You add .clone() where one of the functions returns; I think it's F.linear(...).clone() or something similar, if I recall correctly.

  • @Fourgees_4GS
    @Fourgees_4GS 2 years ago

    Are there any cloud-based platforms for training you recommend rather than buying a 3090 outright? Thanks for this really informative video.

    • @Brillibits
      @Brillibits  2 years ago +1

      Thanks for watching. I have used free Azure credits in the past to do this.

    • @ikhlask7844
      @ikhlask7844 2 years ago

      @@Brillibits How about Google Colab?

  • @davidakinmade3523
    @davidakinmade3523 2 years ago

    Quick question, Blake: can the GPT-J slim version be downloaded and used for instances of few-shot learning?

    • @Brillibits
      @Brillibits  2 years ago

      Yes. In all my previous videos with GPT-J I have been using the old method of converting the weights manually. I have a video of myself then using that model for few-shot learning.

    • @davidakinmade3523
      @davidakinmade3523 2 years ago

      @@Brillibits Thanks, man. I'm currently working on a project where I'm exploring the application of few-shot learning to generate datasets. I'm trying to set up the environment for the project, but my system is a Windows machine with 16 GB of RAM and a 2 GB NVIDIA graphics card. I'm not certain I'd be able to do the few-shot learning on my system. Or do you reckon I could manage it since it's not fine-tuning?

    • @Brillibits
      @Brillibits  2 years ago

      To just load the model and use it, you will need a significantly better system than that for GPT-J. You need roughly 32 GB of RAM with swap to run it VERY slowly on the CPU, and a GPU with 16+ GB of VRAM to run the model with any speed.
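
Those figures line up with simple back-of-the-envelope arithmetic: GPT-J has roughly 6 billion parameters, and the weights alone cost 2 bytes per parameter in fp16 or 4 bytes in fp32. A rough sketch (it ignores activations, the KV cache, and framework overhead):

```python
PARAMS = 6_000_000_000  # approximate GPT-J-6B parameter count

def weights_gb(bytes_per_param: int) -> float:
    """Memory needed for the model weights alone, in decimal gigabytes."""
    return PARAMS * bytes_per_param / 1e9

print(weights_gb(2))  # fp16: 12.0 GB -> why a 16+ GB GPU is needed
print(weights_gb(4))  # fp32: 24.0 GB -> why ~32 GB of RAM (with swap) on CPU
```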

  • @borianmoldfach8902
    @borianmoldfach8902 2 years ago

    Thanks, man. I find this stuff very interesting! I want to try this out when I have a better GPU.

    • @Brillibits
      @Brillibits  2 years ago

      I find it really interesting too! Grateful I get to work with this stuff.

  • @michaellin6155
    @michaellin6155 2 years ago

    Hi, when I'm trying the script gpt-j-6B.ipynb I got the error "RuntimeError: probability tensor contains either `inf`, `nan` or element < 0" with an RTX 3090 and 128 GB of RAM. Any reply would help. Thanks.

    • @Brillibits
      @Brillibits  2 years ago

      What part of the program?

  • @J0SHUAKANE
    @J0SHUAKANE 1 year ago

    At the very end, when I give it an input, I get the error "AssertionError: Torch not compiled with CUDA enabled". I am using a 4090 and Google searches have not helped. Everything else went super smoothly.

    • @Brillibits
      @Brillibits  1 year ago

      This may have to do with how new the RTX 4090 is and the lack of broad support in many popular applications. You can try compiling PyTorch from source (I have a video doing that for the RTX 3090 when it first came out), or try looking at my latest video, which goes over how to fine-tune these models using Docker.

  • @Vissou
    @Vissou 2 years ago +1

    I have the following error: AttributeError: module jaxlib.xla_client has no attribute get_local_backend
    Any idea how to solve this?

    • @Brillibits
      @Brillibits  2 years ago

      Where are you getting this error? Which repo? Which line?

    • @Vissou
      @Vissou 2 years ago

      @@Brillibits First error: in your branch "original_youtube", at the cell where we convert to torch weights (the 4th one), when executing the line "params = reshard(params, old_shape).squeeze(0).T", I get this AttributeError.
      Second error: for the notebook in the main branch (this actual video), I get "ImportError: cannot import name GPTJForCausalLM from transformers (unknown location)" at the cell where we simply import every library we need.
      In both notebooks I was able to install every package and library without any problem (I created the conda environment like you did).
      About the second error: I was able to make it work the first time, but then I get this ImportError every time, even when recreating the conda environment. When it was working the first time, I got an error at the next cell when executing GPTJForCausalLM.from_pretrained(…), but I forgot what it was (it may have been the same as the first error).

    • @Vissou
      @Vissou 2 years ago +2

      @@Brillibits I fixed the two errors by updating to jax 0.2.25.
      Actually, I also had to update to transformers 4.12.5 and downgrade to tensorflow 2.5.0 to run the notebook on your main branch.

  • @kopasz777
    @kopasz777 2 years ago +3

    I wish I had the hardware to fine-tune a model like this.

    • @Brillibits
      @Brillibits  2 years ago

      Yeah it is expensive. I am grateful I was in a position to make the investment.

  • @carthagely122
    @carthagely122 1 year ago +1

    Thanks

    • @Brillibits
      @Brillibits  1 year ago

      Thanks for watching!

  • @aCloudOfHaze
    @aCloudOfHaze 2 years ago

    GOLD! Thank you

  • @rOxinhoPKK
    @rOxinhoPKK 1 year ago +1

    It would be cool if you could make a tutorial on how to train GPT-J/Neo to understand other languages like Spanish or Italian with custom datasets. I've seen a few posts on it, but they were all years old and for GPT-2 only. Thanks for the videos.

    • @Brillibits
      @Brillibits  1 year ago +1

      I have a video on creating a dataset. The idea for making a dataset for other languages is likely very similar.

  • @dmytropetrovskiy2017
    @dmytropetrovskiy2017 2 years ago

    Thank you

  • @FoodVentures2023
    @FoodVentures2023 2 years ago +1

    How can I train it on a TPU? It took 5 hours on dual K80s on Azure.

    • @Brillibits
      @Brillibits  2 years ago +1

      I didn't think the K80 had enough VRAM. If and when I make money from YouTube, I will start making videos that cost money, such as renting a TPU.

  • @vmfox2152
    @vmfox2152 1 year ago

    My GPT-J says you are a vegan 😆 ...interesting. Thanks for the video. It helped me a lot 👍

    • @Brillibits
      @Brillibits  1 year ago

      Glad it was helpful!

  • @akejron1
    @akejron1 2 years ago +1

    Is the Discord channel still a thing? I wanna join.

    • @Brillibits
      @Brillibits  2 years ago

      Yes! Forgot to pin.
      Discord: discord.gg/F7pjXfVJwZ

  • @ansh6848
    @ansh6848 2 years ago

    Can we use the same tutorial for code generation with GPT-J?

    • @Brillibits
      @Brillibits  2 years ago

      Yes

    • @ansh6848
      @ansh6848 2 years ago

      @@Brillibits Thanks 👍

  • @machinepola6246
    @machinepola6246 2 years ago

    What is your system configuration?

    • @Brillibits
      @Brillibits  2 years ago +1

      Ryzen 9 5950X, 128 GB of RAM, RTX 3060 and RTX 3090. The only ones that really matter for the video are the RAM and the 3090.

  • @sallu.mandya1995
    @sallu.mandya1995 2 years ago

    How can I fine-tune GPT-J using 2 Tesla T4 GPUs?

    • @Brillibits
      @Brillibits  2 years ago

      That's 16 GB of VRAM, right? I don't think it's possible with that amount of VRAM using the methods I go over in my videos.

    • @sallu.mandya1995
      @sallu.mandya1995 2 years ago

      I have one A100. Could you please suggest a way forward?

    • @sallu.mandya1995
      @sallu.mandya1995 2 years ago

      RuntimeError: CUDA out of memory. Tried to allocate 1.88 GiB (GPU 0; 39.59 GiB total capacity; 36.94 GiB already allocated; 638.19 MiB free; 36.98 GiB reserved in total by PyTorch)
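
That OOM is what rough arithmetic predicts for plain fp32 Adam fine-tuning: weights, gradients, and the two Adam moment buffers each cost 4 bytes per parameter, which for ~6B parameters far exceeds a 40 GB A100. A sketch of the estimate (it ignores activations, which only make things worse):

```python
PARAMS = 6e9  # approximate GPT-J-6B parameter count

def adam_train_gb(params: float) -> float:
    """fp32 weights + gradients + Adam m and v buffers: 4 tensors * 4 bytes/param."""
    return params * 4 * 4 / 1e9

print(adam_train_gb(PARAMS))  # 96.0 GB -> does not fit in 40 GB without
                              # offloading, 8-bit optimizers, or sharding
```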

  • @shoebmoin10
    @shoebmoin10 1 year ago

    Can you make a video on fine-tuning using SageMaker and the Hugging Face Trainer?

    • @Brillibits
      @Brillibits  1 year ago

      Potentially. Thanks for the suggestion.

  • @halo64654
    @halo64654 1 year ago

    Can this be done on Windows?

    • @Brillibits
      @Brillibits  1 year ago

      Maybe. I have not tested.

  • @arsalanarsalan1098
    @arsalanarsalan1098 2 years ago

    Great!

    • @Brillibits
      @Brillibits  2 years ago

      Thanks for watching!