Microsoft's Visual ChatGPT using LangChain

  • Published 2 Oct 2024

Comments • 21

  • @ItsRyanStudios
    @ItsRyanStudios 1 year ago +3

    This is awesome
    GPT4 multimodal capabilities without GPT4👍
    Also great to see so many open source models being put to use. Shows how important open source is in enabling independent groups to build more complex systems.

  • @stalinamirtharaj1353
    @stalinamirtharaj1353 1 year ago

    How is this different from the DALL·E model offered by OpenAI? Using OpenAI would be very straightforward, wouldn't it?

  • @venkatesanr9455
    @venkatesanr9455 1 year ago +2

    Thanks for the video. I believe it combines features like Stable Diffusion, LangChain, and LLMs together, but there is no handling of OCR-related documents such as images or PDFs... that's my thought process. If I am wrong, please correct me. Thanks.

    • @samwitteveenai
      @samwitteveenai  1 year ago

      Yes, you are right, this doesn't do OCR out of the box, but you could add a call to something like Google OCR or TrOCR (huggingface.co/docs/transformers/model_doc/trocr) in a similar way to what is done in Visual ChatGPT. I wouldn't suggest using it like this for documents though; in a case like that it's better to just OCR first and then use a vector store, semantic search, etc.
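
      For reference, a rough sketch of how a TrOCR call could be wrapped as an extra LangChain tool, in the same style as the tools in Visual ChatGPT. This is not the project's actual code: it assumes the transformers, pillow and langchain packages, and the checkpoint name, the ocr_image helper and the tool description are illustrative choices only.

      from PIL import Image
      from transformers import TrOCRProcessor, VisionEncoderDecoderModel
      from langchain.agents import Tool

      # Example checkpoint; other TrOCR checkpoints on the Hugging Face Hub work the same way.
      processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")
      model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-printed")

      def ocr_image(image_path: str) -> str:
          """Run TrOCR over one image file and return the recognised text."""
          image = Image.open(image_path).convert("RGB")
          pixel_values = processor(images=image, return_tensors="pt").pixel_values
          generated_ids = model.generate(pixel_values)
          return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

      # Registering it as a Tool lets an agent pick it when a prompt asks about text in an image.
      ocr_tool = Tool(
          name="Read Text From Image",
          func=ocr_image,
          description="Useful for extracting printed text from an image. Input is the image file path.",
      )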

  • @johntanchongmin
    @johntanchongmin 1 year ago

    Very detailed run-through. Seems like LangChain is the brain of the decision-making, since the Visual ChatGPT code does not explicitly call those tools.

  • @bandui4021
    @bandui4021 1 year ago +1

    LangChain decides which tool to use. But how does it decide which tool to take? Thank you for explaining these concepts in an easy-to-understand way!

    • @samwitteveenai
      @samwitteveenai  1 year ago +3

      It does this with a prompting strategy called ReAct. A number of people have asked about this, so I will make a video on it over the next week or so.
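
      For reference, a minimal sketch of a ReAct-style agent in LangChain (the zero-shot-react-description agent), assuming the classic langchain package from around the time of the video and an OpenAI API key; the word_count tool is just a stand-in for the image tools used in Visual ChatGPT.

      from langchain.agents import AgentType, Tool, initialize_agent
      from langchain.llms import OpenAI

      def word_count(text: str) -> str:
          """Stand-in tool: counts the words in the input string."""
          return str(len(text.split()))

      tools = [
          Tool(
              name="Word Counter",
              func=word_count,
              description="Useful for counting how many words are in a piece of text.",
          )
      ]

      llm = OpenAI(temperature=0)  # needs OPENAI_API_KEY in the environment

      # The LLM is shown each tool's description and, at every step, emits a
      # Thought / Action / Action Input trace that LangChain parses to pick a tool.
      agent = initialize_agent(
          tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
      )

      # verbose=True prints the Thought -> Action -> Observation loop, i.e. how it decides.
      agent.run("How many words are in the sentence 'LangChain picks tools via ReAct'?")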

  • @tarasankarbanerjee
    @tarasankarbanerjee 11 months ago

    It's awesome!! Thanks for making such great videos.. 👍👍

  • @lloydsloan1349
    @lloydsloan1349 1 year ago +1

    I love this tool! I'm trying to integrate it with some of my projects. I've also been interested in how to implement longer memory for ChatGPT in my own coding projects. I noticed the ConversationBufferMemory around 11:05. Can you tell me more about it?

    • @samwitteveenai
      @samwitteveenai  1 year ago +3

      Hi, I have a whole video on types of memory and how to use them here: th-cam.com/video/X550Zbz_ROE/w-d-xo.html. Hope this helps.
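
      For reference, a minimal sketch of ConversationBufferMemory as used in that era of LangChain: it simply keeps the full chat history and replays it into the prompt on every call. An OpenAI API key is assumed.

      from langchain.chains import ConversationChain
      from langchain.llms import OpenAI
      from langchain.memory import ConversationBufferMemory

      llm = OpenAI(temperature=0)  # needs OPENAI_API_KEY in the environment
      memory = ConversationBufferMemory()
      conversation = ConversationChain(llm=llm, memory=memory, verbose=True)

      conversation.predict(input="Hi, my name is Lloyd.")
      # The second call only works because the buffer replays the first exchange.
      print(conversation.predict(input="What is my name?"))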

  • @viorelteodorescu
    @viorelteodorescu 1 year ago

    Very good and very interesting. Keep it up!

  • @nark4837
    @nark4837 1 year ago

    Would you expect this is how GPT-4 actually does it, or not? Because if so, I would still say we are very far off from AGI, since it is just a bunch of interconnected (but separate) models.

    • @samwitteveenai
      @samwitteveenai  1 year ago

      No, they are most likely tokenizing the image and passing that in with a special token that represents an image, similar to what the recent PaLM-E model from Google does. That said, we are still a long, long way off from full AGI.

    • @compteprivefr
      @compteprivefr 1 year ago +2

      That's what the brain is: a bunch of separate "models". Visual processing in the brain is completely different from language processing, so I'd say we are exactly on the right track.

    • @nark4837
      @nark4837 1 year ago +1

      @@compteprivefr I sort of see your point, but I believe there was some research in which different sensory organs were connected to different cortices in the brain, and the animal quickly learned to hear through its visual cortex and such, implying the algorithms are essentially the same. In our case, though, sequence modelling and CNNs obviously involve completely different algorithms.

  • @kevinehsani3358
    @kevinehsani3358 1 year ago

    You made a great comment here that had me confused for a while: "The language model decides whether it needs a tool or not." I guess for now we may need agents, but won't agents be obsolete once language models are trained further and no longer need to be told what tools they need?

    • @samwitteveenai
      @samwitteveenai  1 year ago +1

      I think tools are always good, as they can get up-to-date info where the models can't be totally up to date. Also, model weights are not good for storing facts, which is one reason we see a lot of hallucinations.

    • @kevinehsani3358
      @kevinehsani3358 1 year ago

      @@samwitteveenai Weights are created from facts or data, and if we guide the model in steps, hallucination decreases. I understand you can't store every combination; after all, the game of Go alone has 10 to the power of 167 combinations. Of course one needs tools. But my thought is that the models can be fine-tuned as new tools come along, so do we really need agents in the long run? As I understand it, an agent is just the declaration of which tools to use. Of course tools are necessary, especially for mathematical calculations, structured data, etc.

  • @akshara8812
    @akshara8812 1 year ago

    Hi, can you please help me with how to create a public link?

    • @samwitteveenai
      @samwitteveenai  1 year ago

      Do you mean serving this in the cloud?

    • @akshara8812
      @akshara8812 1 year ago

      @@samwitteveenai Thank you for your reply. I was able to create a public link by passing share=True as an argument to the launch method in the visual_chatgpt.py file.
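
      For anyone else looking for this, a minimal sketch of the Gradio side; "demo" here is just a placeholder for the Blocks app that visual_chatgpt.py actually builds, and the server name and port are example values.

      import gradio as gr

      with gr.Blocks() as demo:
          gr.Markdown("Placeholder UI standing in for the real Visual ChatGPT app")

      # share=True asks Gradio to create a temporary public *.gradio.live link
      # in addition to the local URL.
      demo.launch(server_name="0.0.0.0", server_port=7860, share=True)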