Thanks for this video, it is very simple and great.
Your videos are of high quality and cover quite a range of topics, but I wonder why you have relatively few subscribers. My personal take is that you lay a very good foundation that is easy to understand, then dive right into coding, which is very practical. I feel there's something missing in between.
Hi, thanks for such an informative video. What about the scenario where we extract numeric features from our datasets, such as sentiment scores? How can we feed them into a transformer, especially T5 or ALBERT, without doing masking?
So we give a question as input from the prompt, then our model picks up a random context from our dataset and gives a random answer... (if we didn't fine-tune the model)
So how can these answers be graded? Can you please tell me how to grade them out of 10?
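One common way to grade extractive answers is exact match plus token-level F1 against a reference answer, in the style of SQuAD evaluation. The sketch below is a simplified version of that idea, not the video's method; the normalization rules and the `score_out_of_10` helper (which just rescales F1 to a 0-10 range) are assumptions for illustration.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Counter intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def score_out_of_10(prediction, reference):
    # Assumption: a grade out of 10 is simply the F1 score rescaled.
    return round(10 * token_f1(prediction, reference), 1)
```

F1 gives partial credit when the predicted span overlaps the reference, which exact match alone does not.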
Could you please share the Kaggle link for this dataset?
What if, for a custom dataset, the answer to a question comes from multiple sections of the paragraph? I believe the dataset here has only one answer per question for a given context, but how do you handle multiple start indices for a question?
That would be a different technique if I am understanding correctly -- Multiple Choice Question Answering is a hot topic!
Thanks for making this video. Learnt a lot.
Follow-up question: can the question answering be more of a chat format, where you can build on questions with follow-ups?
Let's say I embed the text and create vectors from it. When a question is asked, it is converted to a vector, and cosine similarity is then used to fetch the response. Can it be done with any of the models this way? Could you please make a video or share feedback if possible? Thanks.
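The retrieval step described here (embed the passages, embed the question, pick the closest passage by cosine similarity) can be sketched as follows. The `embed` function is a toy bag-of-words stand-in, not a real embedding model; in practice you would swap in vectors from whichever model you use. All names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, passages):
    """Return the passage whose vector is closest to the question vector."""
    q_vec = embed(question)
    return max(passages, key=lambda p: cosine_similarity(q_vec, embed(p)))


passages = [
    "the model was fine tuned on the squad dataset",
    "colab offers a free gpu runtime",
    "pandas can read json files",
]
best = retrieve("which dataset was the model fine tuned on", passages)
print(best)
```

The retrieved passage could then be passed as the context to an extractive QA model, which is one way to combine the two ideas.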
What do you mean by build questions? Maybe something like forming a digital identity based on the questions you ask? You can definitely do that. In fact, the new buzz, ChatGPT, can do just that.
@@SpencerPaoHere Apologies for the lack of clarity on my side.
When I said follow-up questions, I meant something very similar to how a chatbot works: it takes all the context and answers as in a conversation, rather than returning a response to one question and then starting over again.
I'm trying to understand how to make it conversational in an open-source way (because the OpenAI ones cost money :(
@@sriramkrishna6853 I see! I'd recommend this site for more info: www.geeksforgeeks.org/chatbots-using-python-and-rasa/
I think what you are asking about is Conversational AI (Natural Language Understanding). This is an entire sub-industry, but there are many resources out there!
I definitely recommend diving deeper into ChatGPT because that definitely answers your question.
Hi! Thanks. About the data format: I read the link, and it mainly explains that the data has to be in the form of JSON, lists, or dictionaries. Does that mean that if I have a pandas DataFrame with the columns question, answer, answer_start, and answer_end, it won't work?
If you are using the pandas library, there should be a read_json(). So, you should be fine!
And from what you just described, if your features are structured then you should be okay.
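As a rough sketch of going from flat columns to nested records, the snippet below maps each row to a SQuAD-style dictionary. It assumes the tuning code expects the layout used by the Hugging Face `squad` dataset (an `answers` dict holding `text` and `answer_start` lists) and that a `context` column is also available; both are assumptions, and `row_to_squad` is a hypothetical helper. With pandas, `df.to_dict("records")` produces a list of row dictionaries like `rows` below.

```python
def row_to_squad(row, context_column="context"):
    """Map one flat row (a dict of column values) to a SQuAD-style record."""
    return {
        "question": row["question"],
        "context": row[context_column],
        "answers": {
            # Lists, so a question can carry more than one answer span.
            "text": [row["answer"]],
            "answer_start": [int(row["answer_start"])],
        },
    }


# With pandas, `rows = df.to_dict("records")` would yield a list like this.
rows = [
    {
        "question": "Where is the Eiffel Tower?",
        "context": "The Eiffel Tower is in Paris.",
        "answer": "Paris",
        "answer_start": 23,
        "answer_end": 28,
    },
]

records = [row_to_squad(r) for r in rows]

# Sanity check: the character offsets should point at the answer text.
for row in rows:
    span = row["context"][row["answer_start"]:row["answer_end"]]
    assert span == row["answer"]
```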
Hi, do you have any video on how to perform MCQ (one question with 4 answers)? Or could you please provide a good link for performing the MCQ task?
I unfortunately do not; however, you can think of MCQ as just multiple entries: a dictionary where each key maps to a list of strings. You then provide the answers when tuning the QA model. I think that's what you mean? Otherwise, there are a few articles available that I'm happy to share.
How can I reduce the dataset size to make the training time shorter?
You can reduce the dataset size by:
1. Sampling: Use random or stratified sampling to pick a smaller, representative subset.
2. Data cleaning: Remove duplicate or irrelevant data to streamline your dataset.
3. Preprocessing: Focus on questions and contexts with higher relevance to the task.
Let me know if you’d like me to explain any of these in detail!
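The first two steps above can be sketched with the standard library alone. This is a minimal illustration, assuming each example is a dict with `question` and `context` keys and that `title` is a reasonable field to stratify on; these field names and both helper functions are hypothetical.

```python
import random
from collections import defaultdict


def drop_duplicates(examples):
    """Keep the first occurrence of each (question, context) pair."""
    seen = set()
    unique = []
    for ex in examples:
        key = (ex["question"], ex["context"])
        if key not in seen:
            seen.add(key)
            unique.append(ex)
    return unique


def stratified_sample(examples, fraction, key, seed=0):
    """Sample roughly `fraction` of the examples from each group of `key`."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    groups = defaultdict(list)
    for ex in examples:
        groups[ex[key]].append(ex)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


examples = [
    {"title": "A", "question": f"q{i}", "context": "c"} for i in range(60)
] + [
    {"title": "B", "question": f"q{i}", "context": "c"} for i in range(60, 100)
]
examples.append(dict(examples[0]))  # an exact duplicate

unique = drop_duplicates(examples)
subset = stratified_sample(unique, fraction=0.1, key="title")
print(len(examples), len(unique), len(subset))
```

Stratifying keeps the proportions of each group in the smaller subset, which plain random sampling only does approximately.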
Stuck on this kind of project tbh I'm dying
I'm confused, have you not just fine-tuned a squad model with squad data?
Hmm. What do you mean? The dataset in use can be "replaced" with your dataset of choice.
@@SpencerPaoHere Sir, do you know how we can convert a custom question-answer dataset to this format? My dataset only has two columns.
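A possible approach, sketched under a strong assumption: extractive QA needs a context passage, so each question-answer pair must first be paired with a passage that contains the answer verbatim. Given that, the start index can be found with a plain string search. The `to_squad_record` helper and the record layout are hypothetical, not the notebook's code.

```python
def to_squad_record(question, answer, context):
    """Build a SQuAD-style record, or return None if the answer is not
    found verbatim in the context."""
    start = context.find(answer)  # index of the first occurrence, or -1
    if start == -1:
        return None
    return {
        "question": question,
        "context": context,
        "answers": {"text": [answer], "answer_start": [start]},
    }


pairs = [
    ("Who wrote Hamlet?", "Shakespeare"),
    ("When was it written?", "1850"),  # not present in the context below
]
context = "Hamlet is a tragedy written by Shakespeare around 1600."

records = [to_squad_record(q, a, context) for q, a in pairs]
usable = [r for r in records if r is not None]
print(len(usable))
```

Pairs whose answer cannot be located are skipped here; they would need a different context or manual review.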
What's the use of the model in a question answering system if the dataset already contains an answer column? A simple search would also work for SQuAD, so there would be no need to fine-tune a model for that. Correct me if I'm wrong about the SQuAD dataset.
At a high level, the QA model is basically a search function that attempts to find the relationship between a given question and a given answer. In practice, you are going to have many "questions", and a QA model uses the weights learned from its training data to judge which answer is a good one for your question. The beauty is that you do not need to already have a predefined answer. The QA model learns from previous question-answer pairs, so you can ask new questions (not previously defined) and perhaps get a good answer.
Sir, I am getting an error during tuning. Please help. Do I have to change the runtime type to GPU in Colab?
It really depends on what the error is! What's the error that appears?
@@SpencerPaoHere Also, the disk storage is full. What should I do next? Please help, sir.
@@loading757 Have you considered upgrading to get more storage? An alternative would be to use external storage (another cloud or a personal computer) and do the computations in chunks.
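Chunking here means processing a fixed number of records at a time so the full dataset never has to sit in memory or on local disk at once. A minimal sketch, where `iter_chunks` is a hypothetical helper and the range stands in for a stream of records read from external storage:

```python
from itertools import islice


def iter_chunks(iterable, chunk_size):
    """Yield successive lists of at most `chunk_size` items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Stand-in for records streamed from a file or remote store.
records = range(7)

total = 0
for chunk in iter_chunks(records, chunk_size=3):
    # Only one chunk is held in memory at a time.
    total += sum(chunk)
print(total)
```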
@@SpencerPaoHere If I want to reduce the dataset size, how do I do it?
@@loading757 you can sample from your dataset! (for example: randomly select N observations from your dataset)
Can I get the Google Colab notebook link for this?
It's on my github: github.com/SpencerPao/Natural-Language-Processing/blob/main/Question%20Answering%20Modeling/Question_Answering_Modeling_colab.ipynb
Try clicking on the "Open in Colab" button.
No module named 'keras.saving.hdf5_format'. How do I solve it? Please help!