Great video, sir. I was looking for something exactly like this for my client POC and fortunately landed on your videos. Thanks a ton for your efforts.
Glad to hear that... keep learning and growing.
Excellent tutorial. Could you please make a tutorial on how to feed in a document, extract answers from it, and then deploy it? Thank you in advance ❤
Great video Sonu. Thanks for sharing 🙏
My pleasure 😊
A very nice video.
A small suggestion - maybe at the end of the video, you should also show how to terminate all 3 things - notebook, model, and endpoint - so that people don't incur a lot of cost.
Keep up the good work!
Noted
If I do all of the things shown in the video, will it cost me for 24 hours? If yes, then how can I save cost by only triggering it when sending a request and then terminating it?
Can anyone please say how much it will cost me to do all this, or is it free???
@@sravantipris3544 If you use the initial free credits, then it would not cost you any money. However, make sure to disable all the services immediately, or else it could go to USD 600+.
Great video, brother. I was looking exactly for this stuff and luckily landed on your channel. Keep up the good work!
Really love to see this. I will definitely follow this video and get this done today!!
Thanks Vivek for your kind words.....
I think I almost missed where you mentioned "AWS SageMaker DLCs"; maybe you could put more emphasis on DLCs.
Great tutorial, thanks. The only bit I got lost with is creating the policy to let Lambda call the SageMaker endpoint. GPT-4 helped :)
Glad to hear that ..... Thanks!
@@AIAnytime It would have been better if you had explained how to create that policy.
Great video. I need some information about cloud deployment of an LLM model: if I deploy an LLM on AWS SageMaker and use it via API Gateway and Lambda, how will Amazon charge me? Is the charge 24/7, or only per API hit? It would be very helpful if anyone could share their insights.
What permissions do we need to add to the IAM role? We are getting an "Access Denied exception when calling InvokeEndpoint" in the Lambda function.
Was the API Gateway used for anything? Thank you for the video again! Very useful!
Great content and excellent tutorial! thank you
Glad it was helpful!
Excellent tutorial
Thank you! Cheers!
I have 2 doubts about the video.
1) At 48:48 you created some IAM policies like AWSLambdaBasicExecutionRole-30e... and AmazonSageMaker-ExecutionPolicy...
How did you do that?!
2) At 46:40, what is that "path": "/example"? Can you please explain?!
1. You can create policies in IAM. Search for IAM in the search box, open it, look for Policies in the left-hand sidebar, and add policies there.
2. The path typically relates to the URL path of the incoming HTTP request, specifically when working with API Gateway. Usually you configure API Gateway after the Lambda function, which is why it appears there, but you can ignore it. You can just define queryStringParameters, e.g. param1=query or similar, depending on how you write your Lambda code. See the handler sketch below.
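A minimal sketch of such a handler (the endpoint name and the "query" parameter are placeholders, not the video's exact code; adjust to your setup):

```python
import json
import boto3

# Placeholder: replace with your actual SageMaker endpoint name
ENDPOINT_NAME = "your-sagemaker-endpoint-name"
runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    # API Gateway (or a Lambda function URL) populates this from ?query=...
    query_params = event["queryStringParameters"]
    prompt = query_params["query"]

    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt}),
    )
    result = json.loads(response["Body"].read().decode("utf-8"))
    return {"statusCode": 200, "body": json.dumps(result)}
```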
@@AIAnytime Thank you!!
Very instructive video. I would like to know if it is possible to upload the model directly to AWS without going through HF. Thank you in advance.
Absolutely, you can do that. It will be a bit of a manual deployment: you push the model weights to S3 and deploy via SageMaker using a script (see the sketch below). The better way is to deploy through a DLC, I mean the Deep Learning Container images.
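A rough sketch of that manual route (the bucket path, DLC versions, and instance type below are assumptions; adjust to your setup):

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# Placeholder S3 path: point this at the model.tar.gz you pushed yourself
huggingface_model = HuggingFaceModel(
    model_data="s3://your-bucket/model/model.tar.gz",
    role=role,
    transformers_version="4.28",
    pytorch_version="2.0",
    py_version="py310",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

print(predictor.predict({"inputs": "Hello, SageMaker!"}))
```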
How much is the monthly cost of keeping the service up?
It could go up to $500 or more. You need to terminate the resources if you don't want to incur this cost.
Thanks, got to know how to increase the output length using hyperparameters...
Glad it helped
I fine-tuned TinyLlama on my own dataset. Can I deploy my fine-tuned model with the steps you mentioned in this video?
Absolutely....
Very informative video... I have a query: there will be some provision to disable the notebook and endpoint when not in use, right??
Yes, correct, Ashwani. You can control it completely: stop the notebook, delete the endpoint, etc.! You can also set budget limits. A teardown sketch is below.
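A minimal teardown sketch with boto3 (all the resource names are placeholders):

```python
import boto3

sm = boto3.client("sagemaker")

# The endpoint is what bills by the hour, so delete it first
sm.delete_endpoint(EndpointName="your-endpoint-name")
sm.delete_endpoint_config(EndpointConfigName="your-endpoint-config-name")
sm.delete_model(ModelName="your-model-name")

# Stop (or delete) the notebook instance as well
sm.stop_notebook_instance(NotebookInstanceName="your-notebook-name")
```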
An error shows when choosing the 70B model; how do I fix it? The error reads: jumpstart-dft-meta-textgenerationneuron-llama-2-70b-f
Something went wrong
We encountered an error while preparing to deploy your endpoint. You can get more details below.
operation deployAsync failed: handler error
How are you querying the model in JupyterLab prior to ever deploying the model? I am confused by that. Amazing video, I just want some clarification if possible. In addition, why is the instance type configured in the deployment code different from the T5.2xlarge you configured in SageMaker?
He first configured the notebook instance used to run the Jupyter Notebook code; later in the video he configures the predictor (i.e., the inference endpoint) that hosts the model and can be called from AWS Lambda. They are separate resources, so they can use different instance types.
Hi, thanks for the valuable video. My doubts are:
1. How did you handle the Lambda error related to the IAM policy? Is it specifically for accessing SageMaker endpoints here?
2. For getting an API response, do we not need any Flask or FastAPI implementation?
Can you guide me on this? Waiting for your responses and videos...
Hi Venkatesan, you have to attach the policies in IAM for Lambda, S3, SageMaker, etc. For getting an API response, you can deploy a microservice as well. I created a function URL from Lambda that I can use in any of my apps through a backend like FastAPI, Flask, Streamlit, etc., as in the sketch below.
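For instance, a sketch of calling the Lambda function URL from any Python backend (the URL and the "query" parameter name are placeholders):

```python
import requests

# Placeholder: the function URL Lambda generates for you
FUNCTION_URL = "https://<url-id>.lambda-url.<region>.on.aws/"

resp = requests.get(FUNCTION_URL, params={"query": "What is Amazon SageMaker?"})
resp.raise_for_status()
print(resp.json())
```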
@@AIAnytime Thanks for the kind response. Is Lambda mandatory here, or can I use the inference endpoint from AWS SageMaker directly in FastAPI?...
I am getting an error here:
model = llm_pipeline()
generated_text = model(input_prompt)
print(generated_text)
ValueError: The following `model_kwargs` are not used by the model: ['return_full_text'] (note: typos in the generate arguments will also show up in this list)
I am also getting an error here.
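Likely cause, as a guess from the error alone: LaMini-T5 runs under the text2text-generation pipeline, and return_full_text is only accepted by causal text-generation pipelines, so dropping it should clear the ValueError. A sketch, assuming the LaMini checkpoint used in the video:

```python
from transformers import pipeline

checkpoint = "MBZUAI/LaMini-T5-738M"

# No return_full_text here: T5-style text2text pipelines don't accept it
llm = pipeline(
    "text2text-generation",
    model=checkpoint,
    max_length=256,
    do_sample=True,
    temperature=0.3,
)

print(llm("What is AWS SageMaker?")[0]["generated_text"])
```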
I learned a lot here, thanks a lot! 🙌
Glad it was helpful!
How does it compare with hosting on a cheaper cloud provider or GPU such as Lambda Labs?
It depends... AWS is the primary cloud provider. If you are working in IT, you will probably work with AWS, Azure, or GCP. AWS provides different ways of deploying these models, such as one-click deployment using a DLC, and the hourly or pay-as-you-go rate is quite affordable. But yes, you have to select your options based on many things: data protection, privacy, governance, scaling, etc.
Thanks a lot brother! Means a lot
No problem
Great video. I followed along until 48:46. Please go into depth on the policies error and how you fixed it. I have no experience with AWS and got the same error, but you skipped over why the error occurred and detailed instructions on how to solve it.
Hi, I'm getting the below error when creating an endpoint. Can anyone help, please? Error Message: "UnexpectedStatusException: Error hosting endpoint huggingface-pytorch-tgi-inference-2023-09-04-16-49-09-918: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint."
Can you check if you are using the right model? Do you need to authenticate with the Hugging Face model repo? Please look at the logs in CloudWatch in the AWS console.
checkpoint = "MBZUAI/LaMini-T5-738M"
I am stuck on policy creation. Can anybody help, or does anyone have a guide on how to create that policy?
How to fix the permissions error:
Go to IAM > Access management > Policies, create a new policy granting access for S3, Lambda, and SageMaker, and save it. After that, link the policy to your SageMaker role: go to Access management > Policies, select the policy you created, go to "Entities attached", attach your SageMaker role, and save. Problem fixed. See the sketch below.
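A minimal sketch of that policy done in code instead of the console (the policy and role names are placeholders; scope the Resource down to your endpoint ARN in real use):

```python
import json
import boto3

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sagemaker:InvokeEndpoint",
            "Resource": "*",  # illustration only; restrict to your endpoint ARN
        }
    ],
}

iam = boto3.client("iam")
created = iam.create_policy(
    PolicyName="InvokeSageMakerEndpoint",
    PolicyDocument=json.dumps(policy_document),
)

# Attach it to the Lambda execution role (placeholder role name)
iam.attach_role_policy(
    RoleName="your-lambda-execution-role",
    PolicyArn=created["Policy"]["Arn"],
)
```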
Great. Thanks for the detailed steps.
Great video ❤
Glad you liked it!!
Hi! I am trying to deploy Llama 2 in SageMaker. Not sure how to use the HF tokens. The endpoint is failing, saying that the repo is gated.
Maybe you have to log in to the Hugging Face Hub using your access token. Just do a login from a notebook cell in SageMaker; then you can deploy. FYI, a DLC for the official Llama 2 is still not available for deployment; you can deploy manually or from JumpStart. For example:
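A two-line sketch of that notebook login (note you also need to have accepted the Llama 2 license on the model page for the token to work):

```python
from huggingface_hub import notebook_login

notebook_login()  # paste your HF access token when prompted
```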
That's a great video, thanks!
You're welcome!
@@AIAnytime Can you deploy a Hugging Face model on Azure, please 🙏
Very soon. I am working on it.
Thank you!
Thank you so much for the support.
Unable to understand clearly because of the video quality; please provide a higher-quality video.
Sure... Thanks for the feedback
How can you fine-tune this model with your own data?
1. Prepare the data in Alpaca format.
2. Spin up a machine like a g5.2xlarge or above.
3. Fine-tune using PEFT and QLoRA.
A rough sketch follows.
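A rough QLoRA sketch under those steps. The TinyLlama checkpoint name, LoRA target modules, and hyperparameters are assumptions, and the actual training loop (Trainer or trl's SFTTrainer over your Alpaca-format data) is omitted:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed checkpoint

# Load the base model in 4-bit (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters (the trainable part)
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumed targets
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# ...then train with transformers.Trainer or trl's SFTTrainer
```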
Thanks for sharing, Sonu!!!!!
My pleasure!
just great .....
Thank you!
Bro I am not able to add the policy, can you help?
Why are you not able to attach policies? Can you open an issue on the GitHub repo of this video and put in some screenshots so I can help you debug?
Where is the code?
How to fix "Internal Server Error" ?
Can you paste the complete error trace?
Where is the code repo on your GitHub?
This should be
When will you begin to look like your Avatar photo? 😝
Haha... I usually keep it like that. Let's see 🔜
Is this serverless?
Yes it is.....
Hi, thanks for the video, it teaches a lot. I just want to know: what is the ideal notebook instance to load and deploy the StarCoder 15B model? At first I tried with an ml.g4dn.xlarge instance, but I got an "out of memory" error.
How about on Google Cloud?
Very soon.....
I've got this error; how do I solve it?
Test Event Name
generateTestResponse
Response
{
"errorMessage": "'queryStringParameters'",
"errorType": "KeyError",
"requestId": "4196dde2-b2e7-4863-afa7-f2a67129021b",
"stackTrace": [
" File \"/var/task/lambda_function.py\", line 10, in lambda_handler
query_params = event['queryStringParameters']
"
]
}
I'm getting the same error. Did you find anything? @AIAnytime can you please check?
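Likely cause: the default Lambda console test event has no "queryStringParameters" key; it is only populated when the request comes through API Gateway or a function URL with ?query=... attached. Either add it to your test JSON, e.g. {"queryStringParameters": {"query": "hello"}}, or read it defensively, as in this sketch:

```python
def lambda_handler(event, context):
    # .get() avoids the KeyError when the key is absent or null
    query_params = event.get("queryStringParameters") or {}
    prompt = query_params.get("query", "")
    if not prompt:
        return {"statusCode": 400, "body": "missing ?query= parameter"}
    # ...invoke the SageMaker endpoint with `prompt` as before...
    return {"statusCode": 200, "body": prompt}
```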