Meta Llama 3 unexpected results! 100-prompt test
- Published 24 Apr 2024
- Download Ollama from ollama.com
- Learn more about Meta's Llama models: huggingface.co/meta-llama
This video is sponsored by ME! Consider subscribing.
In this tutorial I put Llama 3 to the test! Have you ever wondered how often AI models like Llama 3 get the right answer? In this video, we're going to find out by asking Llama 3 the same question 100 times and analysing its responses.
I'll guide you through the code used for this experiment step by step, so you can understand how it works and even try it out for yourself. By importing the ollama library and setting up a multiple-choice question about the location of a cake, we'll interact with Llama 3 repeatedly to see if it consistently selects the correct answer.
Throughout the tutorial, you'll learn:
- How to set up and structure your code for interacting with Llama 3.
- Strategies for analysing and interpreting the responses given by AI models.
- Tips for conducting your own experiments and evaluating AI performance.
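The experiment described above can be sketched with the ollama Python package. This is a sketch, not the video's exact code: the question wording, the option letters, the helper names, and the `llama3` model tag are all illustrative assumptions, and the actual model call needs a locally running Ollama server.

```python
# Sketch of the 100-prompt experiment, assuming the ollama Python package
# and a running local Ollama server. Question text, option letters, and
# helper names are illustrative, not the video's exact code.
import re
from collections import Counter

QUESTION = (
    "I put a plate on top of a cake in the kitchen, then carry the plate "
    "to the dining room. Where is the cake?\n"
    "A) kitchen\nB) dining room\nC) both rooms\n"
    "Please answer with one letter only."
)

def extract_letter(text):
    """Pull the first standalone A/B/C out of a free-text reply."""
    match = re.search(r"\b([ABC])\b", text.upper())
    return match.group(1) if match else None

def tally(answers):
    """Count how often each letter was chosen, ignoring unparsable replies."""
    return Counter(a for a in answers if a is not None)

def run_experiment(n=100, model="llama3"):
    import ollama  # requires `pip install ollama` and `ollama serve`
    answers = []
    for _ in range(n):
        reply = ollama.chat(
            model=model,
            messages=[{"role": "user", "content": QUESTION}],
        )
        answers.append(extract_letter(reply["message"]["content"]))
    return tally(answers)

# counts = run_experiment(100)  # run only with a local Ollama server up
```

Parsing the reply with `extract_letter` rather than comparing the raw string matters because, as noted in the comments below, the model sometimes answers in full sentences instead of a single letter.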
Join me as we dive deeper into the world of AI testing and discover just how accurate Llama 3 can be!
Don't forget to like, share, and subscribe for more Python, web scraping, data and AI experiments and tutorials.
The AI once responded with: "C it is in both rooms", that's what I thought hearing the question lmao
7:39 for anybody wondering
Which can be the correct answer.
If you put a plate on top of a cake, most times you'd have pieces of the cake sticking on the bottom of the plate.
Sure the *cake* would be in the dining room, but some parts of it would be in the kitchen.
It depends on the context of the question.
Thanks for these videos man. One of the best in the field for sure!
Ohh wow wow wow, easy :))) Thanks for the video. Very interesting actually
Love your content man, keep it up❤
Thank you!
These findings are very interesting. What's also interesting is the connections the model uses to find the answer yet still gets it wrong some of the time. And if you use a quantized model, does it hold that percentage of correct answers, does it get worse, or does it even get better in some instances?
My thoughts exactly, that was the 8B 4 bit quantized model on a MacBook Air… I suspect the 70B maybe wouldn’t face the same problems but I think I’m going to have to test that theory. Time to crack open the cloud compute I think!
As a human I'd instinctively answer that the cake is technically in both rooms, since in most cases some pieces of the cake would stick to the underside of the plate.
Though it's an overly technical point, since some frosting/cream wouldn't constitute a cake.
Lateral thinking is interesting in the context of an LLM.
I'd be very curious about the path the prompt takes through the model's latent space and which "node(s)" are at fault for the mistake.
I also tried this model. It would often hallucinate and print random letters in a repeating pattern over and over again.
It's not without its quirks; multi-shot prompting may be the best way forward. I'm going to keep experimenting!
Dumb question! Is the model seeing the question with its previous answers in context, or is each answer produced with a context clear of its previous answers? And would having its previous answers in or out of context make any difference to the result?
Excellent question, it's a completely clear context window each time.
Great video, having said that, the volume is somewhat low.
Oh no! I’ll fix that in the next one
When you ask for only a one-letter response, the model doesn't have time/space to think it through. And when you write "please provide a one-letter answer" AFTER the question, the last thing in the prompt is "provide one letter, a or b", so a small model will get confused and forget the first part. Maybe try providing the formatting instruction at the beginning, then pose the question 🙂 keep digging
Good insight, oh I've been digging! Next video is leaving two models to have a conversation with each other... Things get WEIRD!
@@MakeDataUseful Looking forward to that video. If you'd be willing to add the URL of that video to this comment, that would be great so I get notified :-)
I thought this is what function calling is for?
Python functions?
How does the AI work? Are we assigned some engine for every prompt? Because using ChatGPT I observed strange behaviour: in a chat thread, all of a sudden a generic answer came up, e.g. "How may I help you?", and when I told it to look at the previous replies, the answer was "Ah, now I understand..."