Your diagrams + explanations are really helpful. I usually zone out when people explain things using diagrams, but the way you do it is very easy to follow and understand, and I'm sure others feel the same as well.
That's awesome to hear. Thank you for sharing that.
Your explanation, aided by that diagram up front, made it extremely easy to understand what is otherwise a complex topic for newbies like myself. I am learning a lot from your videos. A big thank you for all your efforts.
Awesome, thanks for letting me know!
Diagrams are super useful, great videos overall. Please keep them coming!
I anticipate you will become very popular soon. Keep up this good work and you will reach an audience of hundreds of thousands.
That would be cool! I will continue to put energy into this space
I’m going to use LangChain to look at all your videos and tell me which ones I should really pay attention to based on what I’m trying to do 💥
Nice!
Amazing video! I am just discovering the Langchain +OpenAI and your videos are just superb.
Nice! Thank you
Is it possible to pass additional instructions to the summary method?
Have you found a way? I mean, I need to summarize the video in another language instead of English.
Greg, thank you for all the videos you have made, they've all been super helpful! I hope you get everything you want back in life!
Your diagrams are cool. Your explanations are cool and the content is kick-ass. Do more videos, brother.
One of the easiest tutorials to follow, thx!
Is there a way to save the YouTube link as a variable to put in Streamlit?
Great tutorial!
great video and nicely explained !!
love your videos
So I have a question: can we also extract the category of a video from the YouTube transcript?
Your videos are amazing! Looking forward to seeing how you could integrate gpt-index as well as LangChain!
Thanks for the comment. What is your use case for the two tools? I like to have an example to work through instead of just an overview
@@DataIndependent Honestly I'm not sure gpt-index would be the best way to go, but my use case is that I have a large number of documents that I need to store in the cloud and update weekly so they're accessible to a web app. Looking around online, I thought I could use gpt-index as long-term memory and use LangChain to connect the model. Like a way to Q&A your personal journal stored online
MFM great podcast! If I wasn’t subscribed I am now!
I am going to start using "instantialize" unironically. Good word
Tomato tomato ha :) Like any good forward thinking developer I just snagged instantialize.com
Extremely useful! Thanks a lot!
Glad it was helpful! Anything else you want to see?
This is cool, because OpenAI models are great at sentiment analysis. You could write a script that automatically fetches trending videos on a specific topic (a specific industry/market, depending on your needs) and performs sentiment analysis on the 500 highest-performing videos. Just filter YouTube by topic, uploaded: Today, sort by: views.
Then do sentiment analysis on each transcript and assign a score. (You would have to do some real work on designing a scoring system, though. That's what determines the value of this whole thing.)
And you end up with daily summaries of what people are saying about some product, market, political figures, whatever you like.
Daily summaries, along with the sentiment analysis scoring system, are turned into statistics, charts, weekly summaries, monthly ones, etc.
You'd need a good setup for circumventing the token limit when interpreting transcripts, but that can be done.
Yeah I like that idea. No need to stop at YouTube videos either. There is likely a lot of good data on Reddit/Twitter as well.
@@DataIndependent Yes. For Twitter, you could probably target the main news accounts/influencers for a niche
This is great! Could you do a video about connecting LangChain to embeddings / semantic search? I've been eyeing what you can do with Pinecone, but I don't know where to start.
Ya sounds great. Could you give me an example problem statement or exercise you'd like to walk through? Ex: "I want to search XYZ"
I second this request!
@@DataIndependent Yeah, let's say I saved a few 200-300 page self-help book PDFs to my Google Drive. I'd like to be able to do Q&A where it does semantic search through embeddings to find the best k results, and then feeds those results into the prompt context before sending it to the LLM.
Nice thank you. That’ll be a fun example to do. I’ll give it a go tomorrow
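In the meantime, for anyone who wants to try it, here's a minimal sketch of that flow (assuming classic LangChain with `pypdf` and `faiss-cpu` installed; the file path, chunk sizes, and question are placeholders, and FAISS stands in for Pinecone, which would slot in the same way through its own class):

```python
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# load the book and split it into chunks small enough to embed
docs = PyPDFLoader("self_help_book.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# embed each chunk and index the vectors for semantic search
db = FAISS.from_documents(chunks, OpenAIEmbeddings())

# at question time: retrieve the best k chunks and stuff them into the prompt context
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 4}),
)
print(qa.run("What does the book say about building habits?"))
```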
Forgive me if this is a noob question, but I went to try this out myself by importing your YouTube loader file into my Jupyter Notebook, and I keep running into "AttributeError: type object 'YoutubeLoader' has no attribute 'from_youtube_url'".
Any idea on what I could be doing wrong?
Cheers 🙏
Is there a way to get longer and more detailed output?
You can change the prompt or use custom prompts and ask for more information
Hi, this is a really helpful video. I want to ask you: when I try to access a YouTube video which is too long, I get an empty list as the result, meaning I don't have any text to split. Have you come across this before?
Hey Karin!
Hm, I haven't run into that. But rather than the problem being that it is too long, it sounds like there isn't a transcript for it (not all videos have one).
Can you see the transcript in the YouTube UI?
Great! How do I upload my personal link with audio? What is the method?
What do you mean by your personal link?
@@DataIndependent I have a link from when I'm teaching, but it's not from YouTube. Is it possible to put it into this YoutubeLoader(url)?
I'm at a loss. There seems to be no easy way to move from an existing list of preprocessed strings into anything that can chunk those strings. All the loaders assume a user will need to load from a document of whatever description. What if that isn't the case? What if a list is ready to go?
Sorry, I don't understand fully. Where is your list of strings? In a text doc?
I ran into this. If you already have a string variable of text, say LargeText, change that line from .split_documents to texts = text_splitter.split_text(LargeText)
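Spelled out as a minimal sketch (assuming classic LangChain; `large_text` is a placeholder for your already-loaded string):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

large_text = "..."  # your preprocessed string; no document loader needed
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

# split_text works directly on a raw string and returns a list of strings
texts = text_splitter.split_text(large_text)

# if a downstream chain expects Document objects, create_documents wraps them
docs = text_splitter.create_documents([large_text])
```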
It would appear that something has changed. I'm trying to use the YoutubeLoader module but I get an SSL error: `urllib.error.URLError: `
Hello there, Greg! I really appreciated your video! Imagine I have a playlist with a bunch of URLs. How would you handle this scenario? Initially, I would extract these URLs from the YouTube playlist using `from pytube import Playlist`. Now, to obtain the transcript for each of them, I attempted the method you showcased in the video (Multiple Videos) but faced issues. Do you have any suggestions or thoughts on this?
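In case it helps anyone else, one way that loop could look (a sketch assuming classic LangChain and pytube; the playlist URL is a placeholder):

```python
from pytube import Playlist
from langchain.document_loaders import YoutubeLoader

playlist = Playlist("https://www.youtube.com/playlist?list=YOUR_PLAYLIST_ID")

docs = []
for url in playlist.video_urls:
    try:
        # load() pulls the transcript for one video as a list of Documents
        docs.extend(YoutubeLoader.from_youtube_url(url).load())
    except Exception as e:
        # not every video has a transcript; skip it rather than fail the batch
        print(f"skipping {url}: {e}")
```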
I am interested to know what happens/what you should do when the map_reduce combine step is also too long. For example, what if all the concatenated summaries are greater than 4096 tokens, the max limit? Maybe there could be a map_reduce_recursive that automatically solves this problem for you.
Omg never mind! Your next video, about querying a book with Pinecone, covers the case when you have many documents. It looks like the method is to find similar documents first instead of map_reduce summarizing all of them!
Nice! Glad that worked out
Great videos! And greetings from Pedro Pascal's ancestors' land! Chile 🇨🇱
Thank you! Greetings!
Great video! Can you please check if this code is still working? I think something might have changed on Google's side, because I can't load any video transcriptions anymore...
Hey, a dumb question.
If I can simply call the OpenAI APIs, what is the benefit of using LangChain? Internally LangChain is also calling the OpenAI APIs.
Would taking the LangChain path not increase the latency of the application?
Check out my latest video on the 7 core concepts of LangChain. In it I cover most of what it can do today. Tons of software built to make common tasks easy.
Yes it would increase the latency, but that is unavoidable for more sophisticated tasks at the moment.
Good video! I am curious, is there any way we can train our AI so it can answer in a professional way like ChatGPT does?
You'll need to use a custom prompt and tell it to speak in a different tone; examples of the tone you're looking for are good to include as well
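Something along these lines (a sketch with classic LangChain; the tone instructions and example answer are made up, tune them to your case):

```python
from langchain import OpenAI, PromptTemplate, LLMChain

# the prompt sets the tone up front and shows one example of the desired style
prompt = PromptTemplate(
    input_variables=["question"],
    template=(
        "You are a professional consultant. Answer formally and precisely.\n"
        "Example of the desired tone: 'Certainly. Based on the available data, "
        "I would recommend the following approach.'\n\n"
        "Question: {question}\nAnswer:"
    ),
)

chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
print(chain.run("How should I structure a weekly status report?"))
```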
Great, great!
Hi, these videos are very helpful. I have a question though: it seems there is some overlap between what you can do with LangChain and llama-index (gpt-index). In what scenario would you leverage both libraries?
I'm getting this question a lot and happy to do a video on it. Thanks for asking.
Cool!
Is there a way to change the summary prompt? Like detailed summary instead of concise?
Yep, you'll need to edit the prompt that is being used. Ex: Concise > Detailed
Here is the documentation on how to do that
langchain.readthedocs.io/en/latest/modules/indexes/chain_examples/summarize.html?highlight=load_summarize_chain#the-stuff-chain:~:text=Ukrainian%2DAmerican%20citizens.%22%7D-,Custom%20Prompts,-You%20can%20also
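For anyone who doesn't want to dig through the docs, a minimal sketch of a custom prompt for the summarize chain (assuming classic LangChain; the wording of the template is just an example):

```python
from langchain import OpenAI, PromptTemplate
from langchain.chains.summarize import load_summarize_chain

# ask for a detailed summary instead of the default concise one
prompt = PromptTemplate(
    input_variables=["text"],
    template="Write a detailed, multi-paragraph summary of the following:\n\n{text}\n\nDETAILED SUMMARY:",
)

llm = OpenAI(temperature=0)
chain = load_summarize_chain(
    llm,
    chain_type="map_reduce",
    map_prompt=prompt,      # used on each chunk
    combine_prompt=prompt,  # used on the combined chunk summaries
)
```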
Nice tutorial! Is it possible to use another language for the transcript and also modify the prompt?
```python
loader = YoutubeLoader.from_youtube_url("https://www.youtube.com/watch?v=QujoO8CLGMw", add_video_info=True, language="de")
chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True, map_prompt=prompt, combine_prompt=prompt)
```
Yep, that is exactly how to modify the prompt. How did it go for you?
Excellent videos. I loved this one, but when I try running it, I am receiving multiple errors. Does anyone have fully working code?
I'm having the same issue. I've heard that Google has shut down some loaders, such as pytube... :(
That's extremely cool! Wondering if you can compare the performance of Flan-T5 vs GPT on a LangChain pipeline next time ❤💪
Great suggestion! I'll add this to the list
The results of your LangChain summary of summaries are 👏
A small question: say you have a formula, for example the quadratic formula or some other specific formula, in your document. Can I ask it a question and have it solve for the answer?
You would likely need to isolate that piece of information and then ask it to solve it. There are math tools but I haven't used them a ton.
Awesome videos, focused on solving business problems 🙂
1. The ChatGPT playground follows the prompts the way it should; however, with the ChatGPT API using the same model, the same prompts, and the same settings, the response is rarely in the requested format.
2. Can feeding in Excel tabular data, using the method you demonstrated or another method, train ChatGPT to predict a column?
Just found your channel a few hours ago, awesome videos and thank you for making them.
Thanks for the kind words. What do you mean to predict on a column?
@@DataIndependent Thank you for responding. Imagine you have a table with these columns: price, car-years, car-make, car-autoglass, with thousands of rows. Can you use that table to train ChatGPT to predict a price given the car-years, car-make, and car-autoglass?
@@aiautoglasscrm ah nice I see. For that you’d want to use a different ML model. Likely a regression based on those attributes.
There are a bunch out there to choose from. Maybe even some Kaggle exercises as examples
@@DataIndependent Thank you!!
Hi Greg, great video again! Already "liked". Wondering if there is a translation module in LangChain, as some YouTube videos are in a different language. And two more requests for the YouTube functions: first, can I just get the full transcript? And second, can I set a time range on the extract, like from 1 minute into the video until the 4th minute? Thanks mate, sorry for pushing the limits on this, but with those, there are real uses.
Hey! For translation, I haven't seen first class support for this from LangChain yet.
For full transcriptions, yep you can, it is an output of that data loader which should work for you.
I don't understand the question about min 1-4
@@DataIndependent Oh, about the 1-4 min: when a video is like 10 minutes long, I just want to summarize from the first minute to the 4th and leave out the rest. Just wondering if that can be done
@@cgtinc4868 Nice - when I got the transcript I didn't see timestamps, but they may be hiding in there somewhere. You could do it then; it's just a simple matter of cropping the transcript.
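The timestamps are there if you drop down to youtube-transcript-api (the library the loader uses under the hood, as far as I can tell). A sketch, with the video ID and the 1-4 minute window as placeholders:

```python
from youtube_transcript_api import YouTubeTranscriptApi

# each entry looks like {"text": "...", "start": 63.2, "duration": 4.1}
entries = YouTubeTranscriptApi.get_transcript("QsYGlZkevEg")

# keep only the text spoken between minute 1 and minute 4, then summarize that
section = " ".join(e["text"] for e in entries if 60 <= e["start"] <= 240)
```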
Very nice demo, thank you very much.
I am wondering if anyone else is running into the error "AttributeError: type object 'YoutubeLoader' has no attribute 'from_youtube_url'" when running loader = YoutubeLoader.from_youtube_url("https://www.youtube.com/watch?v=QsYGlZkevEg", add_video_info=True), despite the prerequisites being installed?
Thanks again OP, well presented tutorial.
Nice! I've seen some updates come through for LangChain and specifically that loader. Make sure you're on the most recent version
Were you able to solve this? I installed the latest version but still face the same issue.
@@waqasobeidy8318 The loader seems to have been updated to use the official Google Cloud API. v0.0.105 still seems to work. I'm not against using the API, but anything that asks for my credit card I tend to avoid at all costs
@@d279020 Yep I agree. The function works fine on the older version like you suggested. Thanks.
@waqas which older version are you using?
You look like Ryan Gosling
Thank you for sharing this, super helpful. Wondering if you ran into the below issue with the API being unable to retrieve the transcript? Sharing a sample below:
Could not retrieve a transcript for the video https://www.youtube.com/watch?v=eVX0QrvjA5M! This is most likely caused by:
Subtitles are disabled for this video
Thanks in advance!!
Interesting, no I haven't seen that issue. Though it's not scalable, there are a lot of sites that will get a transcript for you from the audio. Or you could use Whisper
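If you go the Whisper route, the open-source package makes it a couple of lines (a sketch; the audio file is a placeholder you'd first pull down with pytube or similar):

```python
import whisper  # pip install openai-whisper

model = whisper.load_model("base")  # larger models are slower but more accurate
result = model.transcribe("downloaded_audio.mp3")
print(result["text"])
```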
Yes, the diagrams are super cool.
I was wondering how to do subsections of a video, e.g. https://www.youtube.com/watch?v=QsYGlZkevEg - is there a way to then get the summary of a section of a video?
You totally could. You just need to feed that subsection into your summarizer.
There isn’t an easy out of the box way to do it though
@@DataIndependent Thank you. Maybe there is a different API in YoutubeLoader for this? Or does one have to dig around or guesstimate the spot in the text stream?
And it's saying "YoutubeLoader has no attribute from_youtube_url"
Try upgrading LangChain, and if it still doesn't work, check the code in the documentation