Converting a LangChain App from OpenAI to OpenSource

  • Published 12 Nov 2024

Comments • 69

  • @julian-fricker · 1 year ago +5

    This is exactly why I'm learning langchain and creating tools with data I don't care about for now. I know one day I'll flick a switch and have the ability to do all of this locally with open source tools and not worry about the security of my real data.
    This is the way!

    • @vhater2006 · 1 year ago

      Good Luck on Privacy ;)

  • @Rems766 · 1 year ago +8

    Mate, you are doing all the work I had planned, for me. Thanks a lot.

  • @jarekmor · 1 year ago +2

    Unique content and format. Practical examples. Something amazing! Don't stop making new videos.

  • @robxmccarthy · 1 year ago +10

    Thank you so much for doing all of this work!
    It would be really interesting to compare the larger models. If GPT-3.5-Turbo is based on a 176B-parameter model, it's going to be very difficult for a 13B model to stack up.
    13B models seem more appropriate for fine-tuning, where the limited parameter count can be focused on specific contexts and domains - such as these texts and a QA structure for answering questions over the text. The example QA instructions and labels could be generated using OpenAI to ask questions over the text, as in your first example.
    This is all very expensive and time-consuming though, so I think you'd really need a real-world business use case to justify the experimentation and development time required.

  • @tejaswi1995 · 1 year ago

    The video I was most waiting for on your channel 🔥

  • @clray123 · 1 year ago +3

    I think it will get interesting when people start tuning these open source models with QLoRa and some carefully designed task-specific datasets. If you browse through the chat-based datasets these models are pretrained with, there's a lot of crap in there, so no wonder the outputs are not amazing. I believe the jury is still out to what extent a smaller finetuned model could outperform a large general one on a highly specialized task. Although based on the benchmarks of the Guanaco model family, it seems that the raw model size also matters a lot.

    • @pubgkiller2903 · 1 year ago +2

      The biggest drawback is that QLoRA models take a long time to generate an answer from the context.

  • @PleaseOpenSourceAI · 1 year ago +3

    Great job, but these HF models are really large - even the 7B ones take more than 12 GB of memory, so you can't really run them on a local CUDA GPU. I'm almost at the point of trying to figure out how to use GPTQ models for these purposes. It's been a month already and it seems like no one is doing it for some reason. Do you know if there is some big, obvious roadblock on this path?
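
    One hedged option for the memory problem raised above (not something shown in the video): load the model 4-bit quantized with bitsandbytes instead of GPTQ, which usually fits a 7B model into well under 12 GB. This assumes a recent transformers, bitsandbytes and accelerate install, and the model id below is only an example.

        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline
        from langchain.llms import HuggingFacePipeline

        model_id = "TheBloke/wizardLM-7B-HF"  # example model, swap in whichever checkpoint you use

        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,                      # store weights in 4-bit NF4
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.float16,   # do the matmuls in fp16
        )

        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(
            model_id,
            quantization_config=bnb_config,
            device_map="auto",                      # place layers on the available GPU(s)
        )

        pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=256)
        llm = HuggingFacePipeline(pipeline=pipe)    # drop-in LLM for the LangChain chains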

  • @thewimo8298 · 1 year ago

    Thank you Sam! Appreciate the guide with the non-OpenAI LLMs!

  • @georgep.8478 · 1 year ago +1

    This is great. Please follow up on fine-tuning a smaller model on the text and epub.

  • @tensiondriven · 1 year ago +3

    This might be trivial, but I'd love a video on the difference between running a notebook and running a CLI vs an API. All the demos use notebooks, but to make this useful we need APIs and CLIs!

    • @theh1ve · 1 year ago +1

      I'd like to see this too. I want my model inferences running on one network machine and a GUI running on another with API calls.

  • @Borbby · 1 year ago

    Hello, thank you for the great work!
    I'm a bit confused about the tokenizer and the LLM: do they have to come from the same model, as at 11:00 in the video, or can I use a different one? Is there any difference between them?
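
    As a rule of thumb (an aside, not from the video): the tokenizer and the LLM are loaded from the same checkpoint, because each model is trained with its own vocabulary, so a mismatched tokenizer produces token ids the model's embedding table does not expect. A minimal sketch, with an example model id:

        from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

        model_id = "stabilityai/stablelm-tuned-alpha-3b"   # example checkpoint

        tokenizer = AutoTokenizer.from_pretrained(model_id)     # same repo id ...
        model = AutoModelForCausalLM.from_pretrained(model_id)  # ... as the model weights

        pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=64)
        print(pipe("The capital of France is")[0]["generated_text"])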

  • @DaTruAndi · 1 year ago +4

    Can you look into using quantized models (GPTQ 4-bit or GGML 4.1), for example with LangChain?
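
    A hedged sketch for the GGML half of that question (not shown in the video): the ctransformers backend runs 4-bit GGML files on CPU, and LangChain wraps it as CTransformers. The repo and file names below are examples and may have been renamed or moved.

        from langchain.llms import CTransformers

        llm = CTransformers(
            model="TheBloke/WizardLM-7B-uncensored-GGML",          # example GGML repo
            model_file="WizardLM-7B-uncensored.ggmlv3.q4_1.bin",   # example 4_1-quantized weights
            model_type="llama",
            config={"max_new_tokens": 256, "temperature": 0.1},
        )

        print(llm("Explain retrieval augmented generation in one sentence."))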

  • @yousif_12312 · 1 year ago +1

    Is it optimal to pass the user query to the retriever directly? Wouldn't it be better to ask the language model to decide what to search for (like using a tool)?
    Also, if 3 chunks from 1 doc were found, I wonder if it's better to order them sequentially, as they appear in the doc.

  • @ЕгорГуторов-р7я · 1 year ago +1

    Thank you for such content. Is there any possibility of doing the same without a cloud-native platform and GPU, i.e. if I want to launch something similar on-premises on a CPU?

  • @fv4466 · 1 year ago +3

    As a newcomer, I found your discussion of the differences among models and of prompt tuning extremely helpful. Your video pins down the shortcomings of current Retrieval-Augmented Language Modeling; it is very informative. Is there any good way to digest the HTML as raw input? Is it always better to convert the HTML pages to text and then follow the process described in your video? Are there any tools you recommend?
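
    One possible answer to the HTML question above, as a hedged sketch (the loader choice and file path are examples, not what the video used): LangChain's document loaders can ingest HTML directly and strip the markup for you.

        from langchain.document_loaders import BSHTMLLoader              # BeautifulSoup-based loader
        from langchain.text_splitter import RecursiveCharacterTextSplitter

        loader = BSHTMLLoader("docs/page.html")   # keeps the visible text, drops the tags
        docs = loader.load()

        splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
        chunks = splitter.split_documents(docs)   # same chunking step as with plain .txt files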

  • @reinerheiner1148 · 1 year ago +4

    I've really wondered how open-source models would perform with LangChain vs GPT-3.5-Turbo, so thanks for making that video. I suspected that the open-source models would probably not perform as well, but I did not think it would be that bad. Could you maybe provide us with a list of the LLMs you tried that didn't work out, so we can cross them off our list of models to try with LangChain?
    In any case, thanks for making this notebook; it'll make it so much easier for me to mess around with open-source models and LangChain!

  • @acortis · 1 year ago +1

    This was very helpful! Thanks so much for doing these videos. May I suggest that you do a video on the things that are needed to fine-tune some of the LLMs with a specific goal in mind? Not sure that this is something that can be done on a Colab, but knowing the steps and the required resources would be very helpful. Thanks again!

    • @samwitteveenai · 1 year ago +1

      I will certainly make some more fine-tuning vids. Are there any good examples of what you mean by "having a specific goal in mind"?

    • @acortis · 1 year ago +1

      @@samwitteveenai I saw your video on fine-tuning with PEFT on the English quotes, and I thought the final result was a bit hit-and-miss. I was wondering what specific types of datasets would be needed for, say, reasoning or data extraction (a la SQuAD v2). Overall, I have the sense that LLMs are trying to train on too much data (why in the world we are trying to get exact arithmetic is beyond me!). I think it would be more efficient if there were a more specific model dedicated to learning English grammar, and then smaller, topic-specific models. Just my gut feeling.

    • @samwitteveenai · 1 year ago +1

      @@acortis This is something I am working on a lot. The PEFT result was partially due to me not training it very long; it was just to give people something they could use to learn on. Reasoning is a task that normally requires bigger models etc. for few-shot tasks. I am currently training models around 3B for very specific types of tasks around ReAct and PAL. I totally agree about the arithmetic etc.; what I am interested in, though, is models that can do the PAL tasks. I have a video on that from about 2 months ago. I will make some more fine-tuning content. I want to show QLoRA and some other cool stuff in PEFT as well.

  • @darshitmehta3768 · 1 year ago

    Hello Sam, thank you for this amazing video.
    I am facing the same issue with open-source models as in the video: they make up answers themselves when the data is not present in the PDF or the Chroma DB. Do you have any idea how we can get OpenAI-like behaviour out of open-source models, and which model we could use for that?

  • @henkhbit5748 · 1 year ago +2

    Great video, love the comparison with open source. Would be nice if you could show how to fine-tune a small open-source model with your own instruct dataset.
    BTW: how do you add new embeddings to an existing Chroma DB? db.add(....)?
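
    On the Chroma question above, a hedged sketch (paths and the embedding model are examples): the LangChain wrapper exposes add_texts / add_documents rather than a plain add.

        from langchain.vectorstores import Chroma
        from langchain.embeddings import HuggingFaceEmbeddings

        embeddings = HuggingFaceEmbeddings(model_name="intfloat/e5-base-v2")   # example model

        # reload an index that was previously persisted to disk
        db = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)

        # append new chunks to the existing collection
        db.add_texts(
            ["a brand new paragraph to index"],
            metadatas=[{"source": "new_doc"}],
        )
        db.persist()   # write the updated index back to disk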

  • @creativeuser9086 · 1 year ago +1

    Fine-tuning is hard. But RLHF is what takes the model to the next level and puts it on par with the top commercial models. Wanna try to do it?

    • @samwitteveenai · 1 year ago +2

      RLHF isn't the panacea that most people make it out to be. I have tried it for some things. I will make a video about it at some point.

    • @creativeuser9086 · 1 year ago

      @@samwitteveenai I guess RLHF is hard to implement and is still in research territory.

  • @pranjuls-dt1sp · 1 year ago +1

    Excellent stuff!!🔥🔥 Just curious to know: is there a way to extract unstructured information such as invoice data, receipt labels, medical bill descriptions, etc. using open-source LLMs? For example, LangChain + Wizard/Vicuna to perform such NLP tasks?

    • @samwitteveenai · 1 year ago +1

      You can try the Unstructured package or something like an open-source OCR model.

  • @rudy9546 · 1 year ago +1

    Top tier content

  • @creativeuser9086 · 1 year ago +1

    Could you please point me to a video you've done about how the embedding model works? Specifically, I want to know how it transforms a whole chunk of data (a paragraph) into 1 embedding vector (instead of multiple vectors per token).
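
    A short aside on the question above (not tied to a specific video): sentence-embedding models run the transformer over all tokens and then pool the per-token vectors, typically by mean pooling, so each input text comes back as a single vector. A minimal sketch with an example model:

        from sentence_transformers import SentenceTransformer

        model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")   # example model

        paragraph = (
            "LangChain retrieves chunks of your documents and stuffs the most "
            "relevant ones into the prompt of the LLM."
        )
        vector = model.encode(paragraph)   # token vectors are pooled inside encode()
        print(vector.shape)                # (384,): one vector for the whole paragraph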

  • @DaTruAndi · 1 year ago +2

    Wouldn't it make more sense to chunk tokenized sequences instead of the untokenized text? You don't know the token length of each chunk, but maybe you should.
    Also, for special sequences like ### Assistant, would they be represented as special tokens? If so, handling them in token space, e.g. as additional stop tokens for the next answer, may make sense.

    • @samwitteveenai · 1 year ago

      Yes, but honestly most of the time it doesn't matter that much. The token-based way is a perfectly valid way to do it, but here I was trying to keep it simple.
      You can use fancier ways for things like interviews. I have one project with a set of docs that are financial interviews, where I took the time to write a custom splitter for question/answer chunks, and it certainly helps.
      Another challenge with custom open-source models is the different tokenizers. E.g. the original LLaMA models have a 32k-vocab tokenizer, but the fully open-source ones are using 50k+, etc. We want to make the indexes once but test them on multiple models, so in cases like this token-based indexing doesn't always help that much.
      Often the key thing is to have a good overlap size, and that should be tested.
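
      For reference, a hedged sketch of the token-counted splitting discussed above (an option, not necessarily what the video used); the tokenizer checkpoint is an example:

          from transformers import AutoTokenizer
          from langchain.text_splitter import CharacterTextSplitter

          tokenizer = AutoTokenizer.from_pretrained("TheBloke/wizardLM-7B-HF")   # example model

          splitter = CharacterTextSplitter.from_huggingface_tokenizer(
              tokenizer,
              chunk_size=400,     # 400 tokens per chunk, as counted by THIS tokenizer
              chunk_overlap=50,   # overlap size is usually worth tuning
          )

          raw_text = "...your document text here..."
          chunks = splitter.split_text(raw_text)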

  • @cdgaeteM · 1 year ago +1

    Thanks, Sam; your channel is great! I have developed a couple of APIs. Gorilla seems to be very interesting. I would love to hear your opinion through a video. Best!

    • @samwitteveenai · 1 year ago +3

      Yes, Gorilla does seem interesting. I read the abstract a few days ago and need to go back and check it out properly. Thanks for reminding me!

  • @dhruvilshah9881 · 1 year ago

    Hi, Sam. Thank you for all the videos - I have been with you from the first video and have learned so much from these tutorials. Can you create a video on fine-tuning LLaMA/Alpaca/Vertex AI (text-bison) or any other feasible LLM for retrieval purposes? Retrieval purposes could be: 1) asking something about private data (in GBs/TBs) in a local repository; 2) extracting some specific information from the local data.

    • @samwitteveenai · 1 year ago

      Thanks for being around from the start :D. I want to get back to showing more fine-tuning, especially now that the truly open LLaMA models are out. I try to show things people can run in Colab, so I probably won't do TBs of data. Do you have any suggested datasets I could use?

  • @bobchelios9961 · 1 year ago

    I would love some information on the RAG models you mentioned near the end.

  • @user-wr4yl7tx3w · 1 year ago +1

    Which LLMs are instruct embeddings compatible with? Is it a common standard?

    • @samwitteveenai · 1 year ago +2

      They will work with any LLM you use for the conversational part. Embedding models are independent of the conversation LLM; they are only used for retrieval.
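
      To illustrate the point (a hedged sketch with example model names, not code from the video): the embedding model only builds and searches the index, and any LLM can be dropped in on top.

          from langchain.embeddings import HuggingFaceInstructEmbeddings
          from langchain.vectorstores import Chroma
          from langchain.chains import RetrievalQA
          from langchain.llms import OpenAI

          embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
          db = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)

          # swap this line for a HuggingFacePipeline model; the retriever does not change
          llm = OpenAI(temperature=0)

          qa = RetrievalQA.from_chain_type(
              llm=llm,
              chain_type="stuff",
              retriever=db.as_retriever(search_kwargs={"k": 3}),
          )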

  • @ygshkmr123 · 1 year ago +1

    Hey Sam, do you have any idea how I can reduce inference time on an open-source LLM?

    • @samwitteveenai · 1 year ago

      Multiple GPUs, quantization, Flash Attention and other hacks. I am thinking about doing a video on this. Any particular model you are using?
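
      As a hedged illustration of the quantization/multi-GPU options above (the model name is an example): 8-bit loading roughly halves memory versus fp16, and device_map="auto" shards the layers across whatever GPUs are visible.

          from transformers import AutoModelForCausalLM, AutoTokenizer

          model_id = "TheBloke/stable-vicuna-13B-HF"   # example checkpoint

          tokenizer = AutoTokenizer.from_pretrained(model_id)
          model = AutoModelForCausalLM.from_pretrained(
              model_id,
              load_in_8bit=True,    # int8 weights via bitsandbytes (needs accelerate)
              device_map="auto",    # spread layers over all available GPUs
          )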

  • @kumargaurav2170 · 1 year ago

    Amid all the hype, the kind of understanding of what the user is actually looking for is currently best delivered by the OpenAI and PaLM APIs.

    • @samwitteveenai · 1 year ago +2

      Totally agree. Lots of people are looking for open-source models, and they can work for certain uses, but GPT-3/4, PaLM Bison/Unicorn and Claude are the ones that work best for this kind of thing.

  • @vhater2006 · 1 year ago

    Hello, thank you for sharing. So if I want to use LangChain and HF, I just open a pipeline; finally, I get it. Why not use the big models from HF in your example, a 40B or 65B, to get "better" results?

    • @samwitteveenai · 1 year ago

      Mostly because people won't have the GPUs to serve them. Also, HF doesn't serve most of the big models for free on their API.

  • @123arskas · 1 year ago

    Hey Sam, awesome work. I wanted to ask you something:
    1- Suppose we have a lot of call transcripts from multiple agents
    2- I want to summarize the transcripts of a month (let's say January)
    3- The call transcripts can number from 5 to 600 in a month for a single agent
    4- I want to use GPT-3.5 models, not the other GPT models.
    How would I use LangChain to deal with that amount of data using async programming? I want the number of tokens and the number of requests to the OpenAI API to stay below the recommended limits so nothing crashes. Is there any place where I can learn to do this sort of task?

    • @samwitteveenai · 1 year ago +1

      Take a look at the summarization vids I made, especially the map_reduce stuff: that does lots of small summaries, which you can then roll up into summaries of summaries, etc.
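
      A hedged sketch of that map_reduce idea (the file path and chunk sizes are made up, and this is not code from the video):

          from langchain.chat_models import ChatOpenAI
          from langchain.chains.summarize import load_summarize_chain
          from langchain.text_splitter import RecursiveCharacterTextSplitter
          from langchain.docstore.document import Document

          llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

          transcript = open("transcripts/agent_01_january.txt").read()   # example path
          splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=200)
          docs = [Document(page_content=c) for c in splitter.split_text(transcript)]

          # "map": summarize each chunk separately; "reduce": merge the partial summaries
          chain = load_summarize_chain(llm, chain_type="map_reduce")
          monthly_summary = chain.run(docs)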

    • @123arskas · 1 year ago

      @@samwitteveenai Thank you

  • @rakeshpurohit3190 · 1 year ago

    Will this be able to give insights into the given doc, like writing pattern, tone, language, etc.?

    • @samwitteveenai · 1 year ago

      It will pick those up from the docs, and you can also set them in the prompts.

  • @creativeuser9086 · 1 year ago +1

    Can you try it with Falcon-40B?

  • @HimanshuSingh-ov5gw · 1 year ago

    How much time would this e5 embedding model take to embed large files, or a larger number of files, like 1,500 text files?

    • @samwitteveenai · 1 year ago

      1,500 isn't that large; on a decent GPU you are probably looking at tens of minutes at most, and probably a lot less depending on the length of each file. Of course, once they are indexed, just save the embeddings to reuse in the future.
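
      A hedged sketch of that "index once, reload later" pattern (the model name and paths are examples):

          from langchain.docstore.document import Document
          from langchain.embeddings import HuggingFaceEmbeddings
          from langchain.vectorstores import Chroma

          embeddings = HuggingFaceEmbeddings(model_name="intfloat/e5-base-v2")   # example model
          chunks = [Document(page_content="example chunk")]   # stand-in for your 1,500 split files

          # first run: embed everything once and write the index to disk
          db = Chroma.from_documents(chunks, embeddings, persist_directory="./chroma_db")
          db.persist()

          # later runs: skip embedding entirely and just reload the saved index
          db = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)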

    • @HimanshuSingh-ov5gw · 1 year ago

      @@samwitteveenai Thanks! Btw your videos are very helpful!

  • @adriangabriel3219 · 1 year ago

    What dataset would you use for fine-tuning?

    • @samwitteveenai · 1 year ago

      Depends on the task. Mostly I use internal datasets for fine-tuning.

  • @user-wr4yl7tx3w · 1 year ago

    Have you tried the Falcon LLM model?

    • @samwitteveenai · 1 year ago +1

      Yes, Falcon-7B was the original model I wanted to make the video with, but it didn't work well.

  • @alexdantart · 1 year ago

    Please tell me your Colab environment... even on Colab Pro I get:
    OutOfMemoryError: CUDA out of memory. Tried to allocate 288.00 MiB (GPU 0; 15.77 GiB total capacity; 14.08 GiB
    already allocated; 100.12 MiB free; 14.41 GiB reserved in total by PyTorch) If reserved memory is >> allocated
    memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and
    PYTORCH_CUDA_ALLOC_CONF

    • @samwitteveenai · 1 year ago

      I usually use an A100. You will need Colab Pro+ to run it on Colab.
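
      A hedged aside on that error (not advice from the video): the allocator setting the message suggests only helps with fragmentation, so on a 16 GB GPU the realistic fixes are a smaller model or quantized loading, but the setting itself looks like this:

          import os

          # must be set before torch initializes CUDA
          os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"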

  • @pubgkiller2903 · 1 year ago +2

    Thanks Sam, it's great. Would you please implement the same concept with Falcon?

    • @samwitteveenai · 1 year ago +2

      I did try to do the video with Falcon-7B, but the outputs weren't that good at all.

    • @pubgkiller2903 · 1 year ago

      @@samwitteveenai One question: can these big models like Falcon, StableVicuna, etc. work on a Windows laptop in a Jupyter Notebook, or do they require a Unix system?

    • @fv4466 · 1 year ago

      @@samwitteveenai Wow! I thought it was highly praised.

  • @andrijanmoldovan · 1 year ago +1

    Would this work with the "TheBloke/guanaco-33B-GPTQ" 4-bit GPTQ model for GPU inference (or another GPTQ model)?

    • @samwitteveenai · 1 year ago +1

      Possibly, but it would need different loading code, etc.