Great work you are doing here, mate. Love the structure of your videos and Colab examples.
I stopped looking in this area due to the explosion of repetitive tools, and now use only your channel to find out what's new. Keep up the awesome videos!
You’re a godsend. You’ve really helped me understand and utilize the power of these approaches and the packages. Appreciate it!
Hey Sam,
I changed the example:

examples=[
    (
        "My first fav meal was at a restaurant called Burnt Ends in Singapore where there top dish italian cuisine was lasagne.",
        {"name": "Burnt Ends", "location": "Singapore", "style": "cuisine", "top_dish": "lasagne"},
    )
],

and the results were much better.
Cheers,
Andy
Awesome! Thank you for taking the time to pass on some of your wisdom and knowledge.
Keep ‘em coming 🎉
Sam, I subscribe to a ton of AI/LLM channels and you are a top-notch resource, thank you. I just need to try this out on the weekends 😆
Thank you Sam
Great video as usual. Thanks, Sam, for the great work!
Great video as usual. Thanks for your hard work. You should do one about Microsoft Guidance! I find the template-driven format pretty natural and ideal.
Yeah, I want to do one about that, and Guardrails, etc.
Thanks Sam.
I actually used it to extract items that require some math reasoning (e.g., a total cost that must be computed from the numbers in the text), which is then left to the LLM's accuracy. It got the text fields right most of the time but didn't do all that well on the numbers. Any suggestions on how this could be implemented?
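One approach to the numbers problem above: have the LLM extract only the raw numeric fields and do the arithmetic in ordinary code afterwards, where it is exact. A minimal stdlib sketch (the "items"/"cost" field names are hypothetical, not actual Kor output):

```python
# Sketch: let the LLM extract raw numeric strings, then do the math in code.
# The schema/field names ("items", "cost") are hypothetical examples.

def compute_total(extraction: dict) -> float:
    """Sum the per-item costs from an extraction result instead of
    asking the LLM to add them up itself."""
    total = 0.0
    for item in extraction.get("items", []):
        # Strip currency symbols/commas the model may have copied verbatim.
        raw = str(item.get("cost", "0")).replace("$", "").replace(",", "")
        total += float(raw)
    return total

extraction = {"items": [{"name": "chair", "cost": "$120.50"},
                        {"name": "desk", "cost": "1,040"}]}
print(compute_total(extraction))  # 1160.5
```

The idea is that the model only has to copy numbers it saw in the text, which it is much better at than performing arithmetic.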
I am trying to do the same. I can't seem to understand what I need to search for.
Are there any other open-source models that can extract information, apart from OpenAI's?
It's just great! One idea: let's assume the doc to be extracted is a human-made one, so one validator could be an agent that asks the doc maker (the human) for help, based on the extracted dataframe.
Amazing. Why can't we use small open-source models to test and improve the responses for these tasks? That would be real value, instead of plugging everything into GPT-3 or 4.
Does the output we get using Kor depend on the operating system we are using?
Great Work! Thank You!
Have a look at Guanaco, which was trained using the new QLoRA approach. It might be interesting for you and your audience.
Hi, thank you so much for the wonderful video. Is it fine to use a company's confidential information for information extraction using LangChain? I mean, does LangChain itself have privacy concerns for that kind of usage?
No, it is just software that runs on your setup; it's where the tools and LLMs are hosted that causes the privacy issues, etc.
@samwitteveenai Because it uses LangChain, can Kor work with local models like Llama/Ollama? (This might be a stupid question, but it is not clear to me from the video.)
There are better ways to do this now, and good open models. Do you have a specific use case?
Hi Sam, thanks for this video! Do you know how to use vectorstores with Kor? Kor generates long prompts, and when I add a text it usually exceeds the token limit on OpenAI.
When using pure LangChain, I can easily use a text splitter and a vectorstore to grab the relevant chunks of text, but I find it difficult to replicate that with Kor. Any idea how to get around it? Thank you! Franek
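One common workaround for the token-limit question above is to run the extraction chain once per chunk and merge the per-chunk results. A stdlib sketch of that pattern (chunk_text is a simple stand-in for LangChain's text splitters; the per-chunk call to the Kor chain is assumed, not shown):

```python
# Sketch: run extraction per chunk to stay under the token limit, then merge.
# chunk_text is a simple character-based stand-in for LangChain's splitters.

def chunk_text(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into fixed-size chunks with optional overlap."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def merge_records(per_chunk: list[list[dict]]) -> list[dict]:
    """Concatenate per-chunk extractions, dropping exact duplicates
    (entities that straddle a chunk boundary may appear twice)."""
    seen, merged = set(), []
    for records in per_chunk:
        for rec in records:
            key = tuple(sorted(rec.items()))
            if key not in seen:
                seen.add(key)
                merged.append(rec)
    return merged

chunks = chunk_text("a" * 25, size=10, overlap=2)
print(len(chunks))  # 4
```

Each chunk would then be passed through the chain (something like chain.run(text=chunk)["data"], per the deprecation note elsewhere in this thread) and the resulting record lists merged.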
Thanks a lot
How is your real-life example not nested? How is that possible?
Can we use the extracted text as input to the LLM?
And what about not using OpenAI, and instead using a nice pretrained model in your own language?
Thanks for the awesome videos! 👏
What's interesting about this one is that it seems to work well in my limited testing, but the author himself claims the implementation is "half-baked" and prone to error. They recommend people try alternative libraries like Promptify and MiniChain to achieve the same output - could you do a video on either/both of those?
Yeah, I should make a benchmark comparing it to the alternatives. I think the author of Kor is very honest, and many of the issues are to do with the quality of the LLM rather than the package.
@@samwitteveenai That would be awesome!
Can you please tell me: instead of giving text data, is there any other way I can give embedding vectors as input to the LLM with the approach you discussed in this video?
I am not sure why you would want to do that; can you explain?
@@samwitteveenai I can't give all of the raw text to the OpenAI API because the text is so long (more than 200k characters), so I need to split the text into chunks and do the embedding.
Wow, that’s really cool. Is Kor the only game in town for doing this currently?
No, there are a couple of other ways as well, so I might make a video about them at some point.
Great work! How long can the text be? The same number of tokens that ChatGPT allows?
It's not really limited to a sentence; it can be anything you can 'stuff' into one pass of the LLM. I generally do a few paragraphs at a time.
14:23 probably pd.json_normalize(json_data) would work out of the box here
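For context, a minimal example of what that pandas call does, assuming Kor's output is a list of (possibly nested) dicts:

```python
import pandas as pd

# Nested extraction-style records, similar in shape to Kor's JSON output.
json_data = [
    {"name": "Alice", "address": {"city": "Boston", "state": "MA"}},
    {"name": "Bob", "address": {"city": "New York", "state": "NY"}},
]

# json_normalize flattens nested keys into dotted column names.
df = pd.json_normalize(json_data)
print(list(df.columns))  # ['name', 'address.city', 'address.state']
```

So nested objects become flat columns without any manual unpacking, which is why it often works out of the box here.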
Thank you!
I cannot get past the StringIO error in Kor library. can anyone help me with this?
Do you know if it's possible to feed multiple text chunks into the pipeline, like you can do with the LangChain QA chain?
Yeah, that should be doable. It will really operate on any input.
Incredible. The million-dollar question 😊: how do you "teach" ChatGPT the schema just once and then validate infinite texts, without having to spend tokens putting the schema into the prompt, and without having to fine-tune the model?
It is all put in via ICL (in-context learning).
@@samwitteveenai Thanks for the tip. Do you have any recommended material so I can do this in a no-code way?
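To make the in-context-learning answer concrete: without fine-tuning there is no way to persist the schema inside the model, so tools like Kor simply re-render the schema and examples into every prompt. The schema is written once in code, but it still costs tokens on every call. A stdlib sketch of that pattern (the prompt wording is illustrative, not Kor's actual template):

```python
# Sketch: the schema lives once in code but is re-sent in every prompt.
# The prompt wording below is illustrative, not Kor's real template.

SCHEMA_DESCRIPTION = "person: {first_name: str, last_name: str}"
EXAMPLES = [
    ("Alice Doe went home.", '[{"first_name": "Alice", "last_name": "Doe"}]'),
]

def build_prompt(text: str) -> str:
    """Render the fixed schema + few-shot examples around the new input."""
    lines = [f"Extract records matching this schema: {SCHEMA_DESCRIPTION}"]
    for inp, out in EXAMPLES:
        lines.append(f"Input: {inp}\nOutput: {out}")
    lines.append(f"Input: {text}\nOutput:")
    return "\n".join(lines)

prompt = build_prompt("Bob Smith left town.")
print(prompt.count("Input:"))  # 2
```

So "teaching it once" for free isn't possible with ICL alone; the only ways to avoid the per-call schema tokens are fine-tuning or (on some providers) prompt caching.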
Hi Sam, good video 👍 Can you make a video on how to run PrivateGPT on your local machine and Colab? 🙂👍
Hey Sam, chain.predict_and_parse has been deprecated; please change it to:

def printOutput(output):
    print(json.dumps(output, sort_keys=True, indent=3))

output2 = chain.run(text="Alice Doe moved from New York to Boston, MA while Bob Smith did the opposite")["data"]
printOutput(output2)

Regards,
Andy
Hi, is it working with LinkedIn?
What do you want to do with Linkedin?
Guys, is anybody facing the await issue? How do you solve it?
Hi, have you solved that issue? I am facing the same problem. Please reply.
Hey Sam! Great video!
I'm looking to do NER with an LLM using the transformers library. Do you know how to create my LLM in my code, without it being OpenAI's LLM?

from kor import create_extraction_chain, Object, Text

schema = Object(
    id="person",
    description="Personal info about a person",
    attributes=[
        Text(
            id="first_name",
            description="The first name of a person",
            examples=[],
            # many=True,
        ),
    ],
    examples=[
        ("Alice and Bob are friends",
         [{"first_name": "Alice"}, {"first_name": "Bob"}]),
    ],
)

llm = "?"
chain = create_extraction_chain(llm, schema)

text = "My name is Bobby and my sister is Rachel. My brother is Joe."
output = chain.invoke(text)
print(output)
For a basic NER model you can just use something like a fine-tuned DistilRoBERTa, etc.
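As a rough illustration of that reply: a fine-tuned token classifier like DistilRoBERTa emits per-token BIO tags, which you then group into entities yourself. A stdlib sketch of that grouping step (the tagged tokens below are hand-written stand-ins for real model output):

```python
# Sketch: grouping BIO-tagged tokens into entities -- the post-processing a
# fine-tuned token classifier (e.g. DistilRoBERTa) needs for NER.
# The tagged tokens below are hand-written stand-ins for real model output.

def group_entities(tagged: list[tuple[str, str]]) -> list[dict]:
    """Merge B-/I- tagged tokens into complete entity spans."""
    entities, current = [], None
    for token, tag in tagged:
        if tag.startswith("B-"):               # beginning of a new entity
            if current:
                entities.append(current)
            current = {"type": tag[2:], "text": token}
        elif tag.startswith("I-") and current:  # continuation of the entity
            current["text"] += " " + token
        else:                                   # "O" tag closes any open entity
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return entities

tagged = [("Alice", "B-PER"), ("and", "O"), ("Bob", "B-PER"),
          ("Smith", "I-PER"), ("are", "O"), ("friends", "O")]
print(group_entities(tagged))
```

In practice the transformers pipeline API can do this aggregation for you, but this shows what the model's raw output looks like versus the extracted entities.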
I have a PDF with table data. What is the best way to extract that and store it as vectors for proper retrieval? The standard text splitter is not accurate, since it stores the table as one continuous text. Cheers!
I think that after the release of the OpenAI Functions agent, Kor is useless.
Can you please make a new video about building a tool or a transformer agent that can take audio and dub it into another language with Whisper or NLLB-200, and have a talking avatar say it with SadTalker, for free? Thank you very much.