IMPORTANT BILLING NOTE: Make sure you follow the SageMaker clean-up process so you don't incur big charges on your AWS account. This link will help: docs.aws.amazon.com/sagemaker/latest/dg/ex1-cleanup.html Deleting the domain and users is not enough.
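For anyone who prefers to script the clean-up, here is a minimal boto3 sketch of the endpoint tear-down that guide describes (the resource names are hypothetical placeholders, and the linked doc remains the authoritative checklist):

import boto3

sm = boto3.client('sagemaker')

# Endpoints are billed for as long as they exist, so delete them first.
# The names below are hypothetical placeholders for your own resources.
sm.delete_endpoint(EndpointName='my-llama2-endpoint')
sm.delete_endpoint_config(EndpointConfigName='my-llama2-endpoint-config')
sm.delete_model(ModelName='my-llama2-model')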
fkin garbage never loads
Two hours after clicking "create domain" and it's still loading.
This is lame. An LLM response without streaming?
@@samirkumar7788 You can set up streaming if you want.
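For anyone curious, a rough sketch of how streaming looks with boto3, assuming the endpoint's container supports response streaming (the endpoint name and payload shape are placeholders; the exact payload depends on the model version):

import boto3
import json

runtime = boto3.client('sagemaker-runtime')

# Assumes the deployed container supports response streaming.
response = runtime.invoke_endpoint_with_response_stream(
    EndpointName='my-llama2-endpoint',  # hypothetical placeholder
    ContentType='application/json',
    Body=json.dumps({"inputs": "write me a tweet about conductors"}),
)

# The body is an event stream of PayloadPart chunks.
for event in response['Body']:
    if 'PayloadPart' in event:
        print(event['PayloadPart']['Bytes'].decode('utf-8'), end='')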
Thanks, Rob. That was a very helpful tutorial.
For some reason, I have been receiving "Hello from Lambda!"
That is the default 200 response from the Lambda function. You need to replace the default code in the Lambda function with the code I shared. Search for "Hello from Lambda" in the function code in AWS Lambda and you will see where it's coming from.
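For reference, the stock Python handler that AWS generates for a new function looks like this, which is where that message comes from:

import json

def lambda_handler(event, context):
    # AWS's default template: always returns the same 200 response,
    # regardless of the incoming event.
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }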
thanks a lot for this helpful demo!! appreciate all your efforts and notes!
Great work as always Rob!
High praise from the Doctor. Thanks chief!! x
Thanks for this tutorial. One important thing we would like to know is: how much is it monthly on AWS? It's kind of difficult to understand their pricing model.
This is amazing. Thank you. Exactly what I was looking for.
Thank you Rob for the tutorial, I really appreciate how you put it, so clear and precise.
I am a student and need to deploy an LLM on AWS for a uni project. I want to try doing it from my own AWS account with the free tier, and I'm trying to understand how much it would cost if I only need to deploy it and check a few inputs (I'm talking about Llama-2-7b-chat); I don't want to end up with a 400 USD bill. Do you think just the deployment process would cost much?
I'd avoid SageMaker if you want to do it cheaply. Take a look at an API from Replicate, or use Ollama to run on a local machine. SageMaker might be overkill for your project, and the spending can get dangerous quickly.
Thanks! One question: if I use the JumpStart models from AWS, for example Llama 2, and start fine-tuning, is it then my own fine-tuned model that I can use for my own use cases?
This is the best one so far. But can you please make one showing how I can run these models on my Windows machine?
Easily explained, man, thanks. My specific requirement is that I want to train the model periodically with my own data. How can I achieve this? Can you create a video around this? Thank you.
This is probably best done with RAG. You would not fine-tune the model, but have it access all your data at query time. Check out LangChain's RAG tooling for this; a bare-bones sketch of the idea follows.
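To sketch the RAG idea in a few lines (embed() is a hypothetical placeholder for whichever embedding model you pick; this shows the retrieval step only, not a full pipeline):

import numpy as np

def embed(text):
    # Hypothetical placeholder: return an embedding vector for `text`
    # from whatever embedding model you choose.
    raise NotImplementedError

def retrieve(question, chunks, chunk_vectors, k=3):
    # Rank your document chunks by cosine similarity to the question.
    q = embed(question)
    scores = [np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))
              for v in chunk_vectors]
    top = np.argsort(scores)[-k:][::-1]
    return [chunks[i] for i in top]

def build_prompt(question, chunks, chunk_vectors):
    # Stuff the retrieved context into the prompt instead of fine-tuning.
    context = "\n\n".join(retrieve(question, chunks, chunk_vectors))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

Periodically re-embedding new documents then takes the place of the periodic retraining asked about above.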
Thanks for this tutorial. Can we do all of this using Amazon SageMaker Studio Lab? Can we find a guide for that somewhere? Thanks 🙂
Excellent! Exactly what I was looking for. Thanks.
Great video. I need some information about cloud deployment of an LLM: if I deploy a model on AWS SageMaker and use it via API Gateway and Lambda, how will Amazon charge me? Is the charge 24/7 or only per API hit? It would be very helpful if anyone could share their insights.
thank you very much for the tutorial, this was really helpful
Glad you got something out of it!
Thanks!
Great tutorial, keep up the good work!
Hey, thank you for the video. However, Llama 2 is not available in my SageMaker, so I guess your video is obsolete.
I'm still learning, but why would I use SageMaker over Bedrock? When do I pick one over the other, specifically for deploying LLMs?
Hi, great video btw. I would like to ask if it's possible to make concurrent requests using SageMaker. Will that drastically increase the cost, or is the cost based on hours of usage?
I am new to AWS and Llamas, and this provided a great first insight. One question though: once we create a Domain, does it mean we will get charged immediately? I do not remember setting up an account and providing credit card info, so I am a bit confused about how this actually operates. Can you point me to some documentation which explains this? Thank you!
Can you train the AI using your own dataset, and can you integrate it into an existing website?
You can train it, but that's generally computationally intense. It's easier to use a good LLM and then pair it with LangChain or do RAG.
Thank you for your presentation. I clicked the Subscribe button, although I didn't delve into the video content. During your talk, I recall you mentioning the open-source LLM and discussing AWS pricing. This led me to prioritize a cost-effective solution that allows for scalability. Have you considered running an Ollama model locally and setting up a tunnel with a port endpoint for a public URL? I appreciate any feedback you can provide. 😊
Thanks for the comment! Yes, now I would go the local route as you mentioned. Unless you carefully manage it on AWS, it's going to be very expensive. What I have not explored yet is concurrency issues and scaling, but there seem to be lots of infrastructure startups like Replicate that are solving these issues.
That’s cool!! Thanks
This does not work anymore. They changed the endpoint structure.
It works for me. I skipped the Jupyter notebook part as they have provided a UI to test the Llama output.
Should we wait for a tutorial on how to fine-tune Llama 2 on our own data and deploy it on AWS SageMaker?
I am interested in this as well
Did you find anything about this?
Outstanding! But what's the point of spending lots of money on SageMaker instead of directly using Groq's API with Llama? Thanks!
This video is months old. Now I would just use Groq or Replicate.
Thx, I like your style!
Thanks Rob
Amazing video. How can we make this scalable? What if we need more capacity on the server side, any ideas?
You can try a larger instance as the simplest way to scale but be wary of the pricing and make sure to set some limits in your billing. Outside of that I'm not an expert on scaling these models so the team at AWS may be helpful
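If a single bigger instance isn't enough, SageMaker real-time endpoints can also auto-scale on invocation volume. A rough boto3 sketch, with hypothetical endpoint/variant names and a deliberately low instance cap so a traffic spike can't run up the bill (verify against current AWS docs before relying on this):

import boto3

autoscaling = boto3.client('application-autoscaling')

# 'my-llama2-endpoint' and 'AllTraffic' are hypothetical placeholders.
resource_id = 'endpoint/my-llama2-endpoint/variant/AllTraffic'

autoscaling.register_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    MinCapacity=1,
    MaxCapacity=2,  # low cap so scaling can't silently inflate costs
)

autoscaling.put_scaling_policy(
    PolicyName='llama2-invocations-scaling',
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 100.0,  # target invocations per instance per minute
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance'
        },
    },
)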
Nice tutorial!
When deploying textgeneration-llama-2-7b, I am getting the error below. Any idea on this?
Something went wrong
We encountered an error while preparing to deploy your endpoint. You can get more details below.
operation deployAsync failed: handler error
I keep getting the following error:
{
  "errorMessage": "'body'",
  "errorType": "KeyError",
  "requestId": "8ed205b0-2401-4093-bd1f-96d83fead0f0",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 14, in lambda_handler\n    Body=event['body'],\n"
  ]
}
It does not like the Body=event['body']
What is going on?
To handle this error, you should ensure that the 'body' key exists before trying to access it. You can do this with a conditional check or by using the .get() method, which returns None if the key is not found.
thank you so much!
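As a sketch, that guard could look like this at the top of the handler (the 400 response shape is just an example, not the code from the video):

import json

def lambda_handler(event, context):
    # 'body' is only present when the function is invoked through
    # API Gateway; a direct console test event won't include it.
    body = event.get('body')
    if body is None:
        return {
            'statusCode': 400,
            'body': json.dumps("Request is missing a 'body' field.")
        }
    # ... pass `body` on to the SageMaker endpoint as before ...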
Thanks for the tutorial! Can you scale down SageMaker to zero when it's not used?
Unfortunately not that I can see, but I'm not a SageMaker expert. Most documentation I found mentioned deleting the instance, which I think is a bit of a pain in the butt. I'll be investigating other options so I'll come back with some updates soon.
Quick question: if I want to use it for an app used by a few hundred or thousand users, should I keep SageMaker up and running 24x7? Because I'll need it to get the AI-generated responses, right?
Yes, it would need to be on constantly, but I'd make sure to heavily optimise for high usage, otherwise your costs will be very high.
Hello, please help me.
How can I pass CustomAttributes='accept_eula=true' in the header code? Help me, brother, please!
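Not the author, but for reference: CustomAttributes is a parameter on the invoke_endpoint call itself, not an HTTP header you build by hand. A minimal boto3 sketch, with a hypothetical endpoint name and a payload shape that may differ by model version:

import boto3
import json

runtime = boto3.client('sagemaker-runtime')

payload = {
    # Payload shape used by the JumpStart Llama 2 chat models; check your
    # model's docs, as the exact format varies by version.
    "inputs": [[{"role": "user", "content": "write me a tweet about conductors"}]],
    "parameters": {"max_new_tokens": 256},
}

response = runtime.invoke_endpoint(
    EndpointName='my-llama2-endpoint',    # hypothetical placeholder
    ContentType='application/json',
    Body=json.dumps(payload),
    CustomAttributes='accept_eula=true',  # Llama 2 requires accepting the EULA
)
result = json.loads(response['Body'].read())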
Thanks for the video, but there is a jump back at minute 4, and the image quality is very low. Could you share the code of the Lambda function with us? I cannot read it from the video.
No problem. I actually put the code in the description of the video; if you look just below the video you will see a link to the code snippets. Let me know if you find it okay. You can also try setting the video resolution to 1080p to see it more clearly.
Thank you for your support. I have a problem when I deploy the model to an endpoint. I get this message:
Failed to deploy endpoint
The account-level service limit 'ml.inf2.xlarge for endpoint usage' is 0 Instances, with current utilization of 0 Instances and a request delta of 1 Instances. Please use AWS Service Quotas to request an increase for this quota. If AWS Service Quotas is not available, contact AWS support to request an increase for this quota.
Do you have any idea how I can solve that, please? I have a free account. Is this the problem?
It's not free, man. A quota increase request means you have to pay.
Thanks for this great tutorial!
I'd like to change the content value for "user" but keep getting an "Internal Server Error" message whenever the content is not "write me a tweet about conductors".
I'm not the author of the video, but let me help you: the reason you are getting the Internal Server Error is most likely the Lambda timeout. The default timeout is 3 seconds, and the model may take more time than that to generate the response. To increase the timeout, go to the Lambda function console, open the Configuration tab, go to "General configuration", click Edit, and increase the timeout (e.g. to 30 seconds).
Thank you 👍@@takecert
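If you'd rather script that change than click through the console, a one-liner with boto3 (the function name is a hypothetical placeholder):

import boto3

# Raise the Lambda timeout from the 3-second default to 30 seconds.
boto3.client('lambda').update_function_configuration(
    FunctionName='my-llama2-proxy',  # hypothetical placeholder
    Timeout=30,
)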
Rob, I have a doubt: why use this when we can use models on demand, without paying for an instance? With base models we pay only for tokens.
I agree. When I put this tutorial up there were limited ways to run LLMs locally. There are cheaper options to deploy now, or look at services like Replicate.
@@RobShocks Loved your response and your concern/gratitude in addressing me. I am dying to see a tutorial on getting input from an S3/CloudFront-hosted website to work with Bedrock using Lambda@Edge or CloudFront Functions. I feel it's the fastest and cheapest way to do this.
Thanks💞
Great content
Thanks a million. I'm on for a chat some day in town if you want to grab a coffee.
I'm using Llama 2 to build an AI chatbot based on a custom knowledge base (an approx. 100-page PDF) with AWS SageMaker as the backend. I expect around 2000 users/month and 5-7 simultaneous interactions. Each user will ask about 5 questions. What AWS instance size would be optimal for this usage? Also, can you tell me the projected monthly cost given these metrics? Any advice is welcome.
Hey, are you done with your project? I have a similar kind of thing to do.
Would depend on how many parameters (i.e. Llama 7B vs 70B).
You reply to all the auto-generated comments 😂. Real comments are waiting for your response 😏.
Not sure I understand what you mean? Are you seeing auto-generated comments somewhere?
I've followed every step, but it still doesn't work. Even the payload for making the request is different. The only variation in my configuration is the region (us-east-1), but that should not affect the request payload.
Why don't I see Llama in the list of SageMaker models?
Change region.
@@fabiomartinelli64 Changed mine to California (US), didn't work.
@@diederik6975 I ended up requesting access from Amazon to enable it.
Thanks @rob, I need some small help. Can you please let me know what the recommended instance is (e.g. ml.g4dn.xlarge) if I want to build a chatbot that should handle 1000 concurrent requests? Can you please suggest the SageMaker configuration? I'm already good with serverless, but I'm just worried about the SageMaker part.