Gemini 1.5 Pro: UNLIKE Any Other AI (Fully Tested)

Matthew Berman

55,178 views

Published on May 15, 2024
Gemini 1.5 Pro has 2m token context, vision, video input, and more. Here's my full test!
Join My Newsletter for Regular AI Updates 👇🏼
www.matthewberman.com
Need AI Consulting? 📈
forwardfuture.ai/
My Links 🔗
👉🏻 Subscribe: / @matthew_berman
👉🏻 Twitter: / matthewberman
👉🏻 Discord: / discord
👉🏻 Patreon: / matthewberman
👉🏻 Instagram: / matthewberman_ai
👉🏻 Threads: www.threads.net/@matthewberma...
Media/Sponsorship Inquiries ✅
bit.ly/44TC45V
Links:
aistudio.google.com/
Science & Technology

Comments • 528

@JustinArut months ago ⁺¹⁶⁸
I don't think we need to worry about Google achieving AGI.
@southcoastinventors6583 months ago
I think Google AI is trying to emulate politicians' intelligence
@lobos009 months ago ⁺⁵
😂
@hotbit7327 months ago ⁺⁷
I like the joke, on a serious note though...
I'm not so sure. It might be that, due to the HEAVY censorship, the model was so brutally lobotomized that it seems this bad.
An example of this is the flag raised while searching for the password. It probably stopped the snake code for the same 'safety' reasons.
@hydrohasspoken6227 months ago
If it is lobotomized, it is dumb.
If it is not lobotomized and this is their best, it is dumb.
It is Google baby. A super trillion sluggish company.
@kritikusi-666 months ago ⁺¹
None of them will achieve it.
@ivideogameboss months ago ⁺²¹¹
Every time I get hyped about a new A.I. model release, Matthew brings me back down to earth
@dontdeletehistory months ago ⁺⁵
facts
@matthewstarek5257 months ago ⁺¹⁶
Part of the let down is bc he doesn't phrase the questions in a logical way. Like the marble and cup question. it's obvious that nearly every model thinks the cup has a lid, like a cup you'd get from a fast food restaurant. I specified that the cup has no lid, has an open top, and the models have no problem
@Discovery_Nuggets months ago ⁺⁴
Don't get hyped on Google AI products. They proved that they are not really good at it
@MilitantHitchhiker months ago
@@matthewstarek5257 The model should be able to infer that, but it can't, because comprehension isn't one step. The context of knowing a cup exists should imply all aspects of what makes a cup, including whether it has a lid or not.
@793matt months ago
Not sure why it looks like it's running like garbage on his system. I've been using 1.5 Pro for a while and it works better than GPT4 most times.
@thereal_JMT_ months ago ⁺²⁰⁹
Not only does it hallucinate like every other model, it goes a step further and starts gaslighting 😂
@Dygit months ago ⁺¹⁴
I hate the way it responds like that
@Cross-CutFilms months ago ⁺¹
Can you share your prompt? Probably not
@bug5654 months ago ⁺⁷
Definitely paid attention when training on Google internal data then.
@zerohcrows months ago ⁺²
All models gaslight, that isn't something unique to Gemini
@MikeWoot65 months ago ⁺¹¹
Google doubling down on lies?! I'm shocked, I cannot believe this
@mitchell10394 months ago ⁺¹³¹
The larger context window doesn't add much value when the model can't be trusted to answer basic things correctly. It seems pretty useless, unfortunately.
@aigrowthguys months ago
I agree. They just want to brag about having a 1 million or a 2 million token window. All they really mean is that you can dump a bunch of stuff into there and press enter. It clearly doesn't mean they will promise to sift through everything properly.
@nikitapatel6820 months ago ⁺²
What basic thing was it not able to do? As far as the snake game is concerned, I don't know why it didn't work when he tried, but it is working for me, and the game worked better than the OpenAI one.
@michealwilliams472 months ago ⁺¹²
@@nikitapatel6820 Did you.. watch the video? It got almost all of the reasoning questions wrong.
@bosthebozo5273 months ago
Yep, I could care less usually about the context length. Just some jargon Google could add to feel relevant.
@Brenden-Harrison months ago ⁺³
@@nikitapatel6820 it could not in 1 shot find the password in a context length of 1/10th what it's supposed to have accuracy in.
it could not find the frame 18 minutes into the video to describe the scene, or the scene in the beginning with the play button. It could not make 10 sentences ending with the word apple, which is really sad tbh. It's failing tests that AI models from months ago could solve, like the ball in box or basket one, where it says both people will be surprised.
@TronikXR months ago ⁺¹⁰³
Google Gemini is The Internet Explorer of the AIs
@NOTNOTJON months ago ⁺¹
What a burn!
@almasysephirot4996 months ago ⁺²
@@NOTNOTJON The way I laughed reading OP expressed what you verbalized.
@temp911Luke months ago ⁺¹¹¹
Google's AI models being rubbish again? Shocker : )
@southcoastinventors6583 months ago ⁺³
Desperate to be relevant again is the only explanation that makes any kind of sense
@footballuniverse6522 months ago ⁺⁹
the fact that a 2 trillion dollar company is having the same issue as your regular tech company trying to catch up to competition feels somewhat refreshing :D
@793matt months ago ⁺¹
Not sure why it looks like it's running like garbage on his system. I've been using 1.5 Pro for a while and it works better than GPT4 most times.
@hydrohasspoken6227 months ago
@@793matt , GPT4 is the superior product.
@khanra17 months ago
@@hydrohasspoken6227
😂 Just turn off all the safety sliders and see the magic.
Forget about superiority, you can't even give a large codebase as context to ChatGPT.
I'm working with Gemini on a large codebase & it's a gem✌️.
Maybe dumber than ChatGPT, but good enough and faaaaaar superior in usability.
Google sucks at UI/UX, this is an example, also Material 3 == 💩
@marko_z_bogdanca months ago ⁺²¹
It cannot create a snake game because eating something is potentially offensive. Also, making the snake dead by throwing it into the wall is violence.
@xorqwerty8276 months ago ⁺⁵
Micro aggressions
@hydrohasspoken6227 months ago ⁺⁶
😂
@moamber1 months ago ⁺¹
User: Make snake in python.
Translation somewhere deep in LLM brain: Hey, babe, you like snakes? Wanna eat my python?
@VolodymyrPankov 29 days ago
Ahah
@justinwescott8125 months ago ⁺⁵²
You said you wanted to see if it was censored, and then you LEFT THE CENSORS ON.
@andrefriedelnyc months ago ⁺¹
I've seen your over-posts for so long now that I just began ASSUMING that you have any technical wherewithal other than the ability to review every aspect of AI development, and for each new pixel created, you'll have to make a post "ULTIMATE AI Model Ultra 2.0 = REAL and feels *almost* human" - I valued your content when it seemed fresh - If you were a jukebox, you'd be stuck on repeat..
@attilakovacs6496 months ago ⁺¹
@@andrefriedelnyc You want new questions for each testing video? That would defeat the purpose.
@platotle2106 months ago ⁺¹⁰
LoL so annoying. That's the reason snake wouldn't get written. I don't like Gemini but you'd think an AI YouTuber pretending to be an expert on the subject would at least have the intuition to know this.
@moamber1 months ago
@@attilakovacs6496 Quite the opposite. Ever heard of a synthetic benchmark? And in the age of AI, creating new questions is not a problem. Especially when you are testing a different level of AI each time. And if it's too difficult to even ask a new and challenging question... Don't pollute YouTube with new "content". There must be some self-moderation for production quality.
@dr.mikeybee months ago ⁺³⁰
It amazes me that Google would do so badly.
@hydrohasspoken6227 months ago
I mean. It is the same company whose AI was giving female popes and black Nazis, no?
@malamstafakhoshnaw6992 months ago
They are not open source. SHOCKING LOL
@stultuses months ago ⁺¹⁸
Unless I can set it to a level where I can ask it anything I want no matter how inappropriate and get an unfiltered response, then it's useless
I really don't need nor want some AI trying to control my speech
@HeavenSevenWorld months ago ⁺²²
"It fails left and right, but for no reason: good job Google!"
@PDXdjn months ago ⁺⁴
Love the Marc Rebillet pic in your thumbnail! His channel is so great.
@rogerbruce2896 months ago ⁺¹⁴
I was going to purchase a Gemini Pro membership until I saw this. If it can't even create or attempt to create a 'snake' game without erroring out, I will wait.
Great unbiased review! ty Matt.
@mickmickymick6927 months ago ⁺⁴⁷
Mom: We have GPT4 at home
GPT4 at home:
@clementhardy 26 days ago
Gemini Pro versions are equivalent to GPT-3.
The Google equivalent to GPT-4 is the Gemini Ultra models (currently Gemini 1.0 Ultra).
Gemini 1.5 Pro is just like GPT-3 with a (way) larger context window, up-to-date data, and a connection to the web.
@andreinikiforov2671 months ago ⁺²¹
If this is what "great job, Google" looks like, our expectations for the search giant must be REALLY low...
@hibou647 months ago
I think he is quite forgiving with Gemini because he does not want to have his early access revoked or have issues with his YT channel. That other companies are making great models is a good thing. Google is too powerful, also too ideological, and their censoring levels are insane.
@josecastroesq months ago ⁺¹⁶
Did you switch back to Gemini Pro 1.5 after trying Gemini Pro 1.5 Flash?
@mesapysch months ago ⁺³⁰
I'm a Data Annotator and not as forgiving as you. I usually write as many prompts as possible to give it a chance to learn. If anything is incorrect after all that, I fail it. I judge every answer as if I need a specific recipe for a chemical solution. One missing chemical or amount could be disastrous. Everything has to be correct for a pass from me.
@sp123 months ago
a lot of these people praising AI are attention seekers. They care more about getting attention for using AI than about making a good product.
@shiccup months ago
Everybody has access to this ai @@JustinArut
@kormannn1 months ago
Do you use highest or lowest temperature for generating answers?
@mesapysch months ago
@@kormannn1 Those settings are determined by people at a higher pay grade. It's probably a good thing I don't determine them. The learning is not just on the AI side but also with the user establishing the appropriate language to engage it. I would assume the end game would be to develop how to write prompts that replace the settings.
@heski6847 months ago ⁺¹⁹
The needle-in-the-haystack test is fine, but it only checks the "search function" in a big context. What we really want to know is how well it reasons over this context. For example, a book gives instructions for how to do something on one page, and literally 200 pages later we meet data that we want to calculate the correct way, but for that we need the instructions from before. If the AI is capable of finding these 2 things, combining them and giving you the correct answer, then it's a pass.
@alhallab months ago ⁺³
I totally agree with you, the way people use the needle-in-a-haystack test is simply a search feature like “Find in Page”. For God's sake, what are you doing?
@6AxisSage months ago ⁺¹
Search function and find in page..? People be hallucinating up inbuilt features worse than gemini1.5
@alhallab months ago ⁺³
@@6AxisSage the test is ridiculous, they insert a sentence and ask the LLM to find it. This is very primitive at this level, we need understanding and connecting the ideas.
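A minimal sketch of the two-hop test this thread asks for, as opposed to the single-sentence lookup. Everything here is hypothetical and not from the video: the function names, the example rule and datum, and the substring grader are illustrative assumptions, and the call to an actual model is deliberately left out.

```python
import random

def build_two_hop_haystack(filler, rule, datum, gap, seed=0):
    """Hide `rule` in the filler and `datum` at least `gap` filler lines after it."""
    if len(filler) <= gap:
        raise ValueError("need more filler lines than the requested gap")
    rng = random.Random(seed)
    lines = list(filler)
    # The rule goes early enough that `gap` filler lines still fit after it.
    rule_pos = rng.randrange(0, len(lines) - gap)
    lines.insert(rule_pos, rule)
    # The datum goes anywhere from `gap` lines after the rule to the very end.
    datum_pos = rng.randrange(rule_pos + 1 + gap, len(lines) + 1)
    lines.insert(datum_pos, datum)
    return "\n".join(lines)

def grade(answer, expected):
    """Crude pass/fail check (an assumption): the expected value appears in the answer."""
    return expected in answer

# Hypothetical example: answering requires combining both hidden sentences.
RULE = "Shipping rule: add 7 to the subtotal of every order."
DATUM = "Order 4471 has a subtotal of 35."
QUESTION = "What is the total for order 4471, including shipping?"
EXPECTED = "42"

filler = [f"Filler sentence number {i}." for i in range(500)]
haystack = build_two_hop_haystack(filler, RULE, DATUM, gap=200)
prompt = haystack + "\n\n" + QUESTION  # send this to the model under test
```

A model that only "searches" can quote either sentence back; it passes here only if it also applies the rule to the datum.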
@74357175 months ago
Thanks for testing it for us!
@needsmoredragons months ago ⁺⁸
drop the safety settings to 0 on ALL the 4 categories. running the failed prompt should work then.
@aigrowthguys months ago ⁺²⁵
Cool video. The input context window is cool for sure, but they failed a lot more often than I thought they would. Also, it was disappointing that they failed on both the YouTube plaque and the cat thing. In some sense, I worry that they are lying about the context window size. Just because you can theoretically upload a million tokens, doesn't mean anything unless they can deal with the tokens properly. How did they miss the cat twice? They clearly aren't dedicating enough power to searching through the million tokens. I guess saying 1 million tokens (or now 2 million tokens) is more of a branding thing. Curious what you think.
@alokmaurya8100 months ago ⁺⁵
yeah you are right, I uploaded the code of one of my projects, and it couldn't give one correct answer to what I asked about the project
@Brenden-Harrison months ago
@@alokmaurya8100 is the model any good at coding or is the context not even long enough to try and get it to code using the rest of the project in its context? In this video the model wouldn't even output a simple snake game
@alokmaurya8100 months ago
@@Brenden-Harrison I guess it can code right sometimes. I gave a screenshot of a landing page to Opus, GPT4O, GPT4, Reka Core and Gemini and asked each to write code for it, and Gemini was closest to the screenshot
@fellowshipofthethings3236 months ago ⁺¹²
did you remember to switch it back from Gemini Flash?
@AGI2030 months ago ⁺⁴
We also had an undesirable experience testing Gemini Pro 1.5. It could not correctly understand the context of a large document when we were asking about its content and it could not even find words we asked it to find. 1M token feature can ingest large docs but I don't think it works well as an LLM with the data it ingests.
@np2819 months ago ⁺³⁵
You have been calling it GPT 1.5 flash instead of Gemini 1.5 flash. Someone is in love with GPT 😊.
@ZenchantLive months ago ⁺¹
Caught that hahhaa
@Originalimoc months ago
0:34, 2:04
@psychurch months ago ⁺¹
Gpt stands for General Pretrained Transformer so it fits
@ChargedPulsar months ago
It's like Dremel, every rotary tool is named Dremel, even when they are from different brands.
Because Dremel was first, it's the most known.
@GenAIWithNandakishor months ago
@@psychurch Generative Pre-trained Transformers
@sguploads9601 months ago
Thank you for the test!
@metatron3942 months ago ⁺¹⁰
Problem with Google is, once you try to use their LMMs, regardless of the advancement of the technology, it's just impossible to use. I just get errors all the time. I couldn't have it look at an academic journal about early religions because it has the word sacrifice in it. It's utterly mind-numbing, because it seems like some pretty powerful stuff
@4.0.4 months ago ⁺⁵
Powerful? It got almost everything wrong! Even local open source LLMs are smarter. The context and video input are great yes, but not if the model is dumb!
@devon.a months ago ⁺⁷
So it's not good but you like it?
@marcfruchtman9473 months ago
Great video review.
@paelnever months ago ⁺³⁵
Many prompts fail because of absurdly high security censoring, set all safety settings to 0
@paulmichaelfreedman8334 months ago ⁺³
It still refuses to code Snake (also in the chatbot), even with all settings on block none. It's weird, but for a few days now it just flat out refuses to complete the snake code; it just hangs halfway.
@nikitapatel6820 months ago
@@paulmichaelfreedman8334 it works even if you do not touch anything
@nikitapatel6820 months ago
@@paulmichaelfreedman8334 I tried the snake game and it worked. You don't need to change anything, it worked.
@Utoko months ago ⁺⁹
The game is too brutal.
@MetaphoricMinds months ago ⁺¹¹
Did you forget to switch back to Pro from Flash?
@shackinternational months ago
I had the same thought
@NeverCodeAlone months ago
Very nice thx a lot!!
@g2h0 months ago
love the vids
@s.vkaushik2148 27 days ago
This is pretty incredible!!
@connor4440 months ago ⁺⁵
I've also been having trouble getting Gemini to generate code. It'll start writing code, then halfway through it disappears and is replaced with "I am only a large language model and do not have the capability to do that".... Um yes you do, you were just doing it
@dr.mikeybee months ago ⁺¹
I spent more time with this, and it's actually very good. If I say, think about what you have written and give me the full file, it does well. It can also keep track of multiple files when it codes! This agent is going to do amazing work.
@president2 months ago
Love it 😍
@PhysicsGuy46 months ago ⁺⁶
Okay, this one bugs me. The killers question. If there are three killers in a room, someone enters the room and kills one of them, and no one leaves the room, then there are FOUR killers in the room, not three. There are three living killers and one dead killer. And before we dismiss the dead killer, for the condition to obtain that one is a killer, one had to have killed someone first, not have the capacity to kill someone in the future. Since the dead killer had already killed, he is just as much a killer as the killers still alive.
@almasysephirot4996 months ago
How can you have such a misconception about how we describe the dead? If a killer is dead, he is no longer a killer, he was a killer. What he is is dead. No attribute of the person who existed can be attributed to anything in existence, so the attribute, with respect to their non-existing self, obviously does not exist.
@almasysephirot4996 months ago
Just look at the auxiliary you use: Present simple "to be": Is. The dead is only dead, nothing else. Things they were are only that: What they were.
@PierreMorelChannel months ago ⁺³
I wonder about the Temperature, which was set to 1 at the beginning. 0 is the most precise and 1 is the most creative.
I would like to see the temperature tests at 0 or very low, maximum 0.3 and see the results
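To illustrate the temperature point above: a minimal sketch of how temperature is commonly applied when sampling from a language model, by dividing the logits before the softmax. This is the textbook formulation, not a description of how Gemini or AI Studio implements the slider; the function name and the example logits are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Assumption: treat 0 as greedy decoding, all probability on the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]
greedy = softmax_with_temperature(logits, 0.0)
low = softmax_with_temperature(logits, 0.3)
default = softmax_with_temperature(logits, 1.0)
```

At 0.3 nearly all of the probability mass sits on the top token, which is why low settings give more repeatable answers on reasoning tests, while at 1.0 the alternatives keep a real chance of being sampled.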
@antdx316 28 days ago
I've uploaded something that went over the max token limit, it said it couldn't do it but after waiting for a bit, it did it. I then asked something else, waited, and it worked again.
@IdPreferNot1 months ago ⁺²
I love how stupid the concept of the ratings sliders is…. “ok… please give me some medium hate speech, dial up the sexual harassment but tone down the violence….”
@OriginalRaveParty months ago ⁺⁴
Once again, it feels like we're comparing the perfect photo of the BigMac on the board, with the thrown together sad limp grey mess in a styrofoam box that you actually get.
@vash2698 months ago ⁺¹
I think it might be useful to start rerunning your prompts for more thorough testing, gives insight into how prone the model is to hallucinating vs how effective its reasoning is.
@Diego_UG months ago ⁺¹
For us, when putting quite a few large files in context, it has helped to upload the files to Drive through the interface instead of copying and pasting into the context window. Right now, for example, I uploaded some documents and we spent 405,358 tokens, which is not the maximum but is still quite a lot. We are using it for legal matters and it has worked well
@zetathix months ago
Have you tried Upstage Solar 10.7b yet? I've had a good experience with it, so I would like to know what you think.
@nickkonovalchuk9280 months ago ⁺²
Did you switch back from flash to pro after snake failure?
@im-notai months ago ⁺²
I am using the Gemini playground more than Gemini Advanced. 😅
I use the large context window when I can't figure out which part of the code is giving me an error, and then use Gemini Advanced to fix that part.
My experience with this method has gone well so far
@StephanYazvinski months ago
it’s because of the safety settings. set them all to minimum and it will give you the code. there is some keyword in the code that it considers “bad”
@KevinRank months ago
One use I discovered. I can take my lecture and then have it generate multiple choice questions based on that.
I then tried adding some videos of a fellow AI user swinging a golf club at a tech event. AI Studio was able to give real feedback based on the videos.
@torarinvik4920 months ago ⁺¹⁵
You should update your tests. Models now are better, and printing numbers 1 to 100 is something 99.9% of models can do. I also recommend changing snake to a more challenging game like tetris, breakout, space invaders.
@cesarsantos854 months ago ⁺³
Yes, the snake game is basically trained in every model now.
@r34ct4 months ago
This @@cesarsantos854
@itztwistrl months ago ⁺¹
Speaking of Tetris, I was able to 1 shot a perfect version with GPT-4o. Astounding technology.
@Brenden-Harrison months ago ⁺¹
@@cesarsantos854 this exactly. it's so dumb that google's new pro model can't even spit out a snake game when every other model has a pre-made human written game of snake to give you as its default response to that question
@rajivjowaheer9882 months ago
Gemini is so great, reflecting on the people working on it, including their attitudes.
@RichardServello months ago ⁺⁴
You didn't notice it said the text is an excerpt from the first chapter of Harry Potter and the Sorcerer's Stone. You fed it the entire novel.
@flyzawayy months ago
Is there anything new here? This was already available in AI Studio for a while with the same context window.
@4.0.4 months ago ⁺⁴
This is why I never take Google at their word for AI. It's surprising how bad they get it.
@user-iy1ch3lv3h months ago ⁺¹
You are the best ai news channel
@hydrohasspoken6227 months ago ⁺¹
No. AI Explained is the best AI channel.
@theh1ve months ago ⁺¹
Google will love you for that Matt GPT 1.5 flash! 😂
@nyyotam4057 months ago ⁺³
Google may have understood they need to try the heuristic imperatives way of alignment instead of a reset every prompt, but they still haven't figured out how to select heuristic imperatives. It seems the word "snake" was enough to get rejected.
@ryanfranz6715 months ago
Could the blocked content have something to do with the settings to block content that you were playing with 5 seconds earlier?
@joe_limon months ago ⁺²
I really wish they would finally drop Gemini Advanced 1.5
@SixTimesNine months ago ⁺³
For the csv test try content that includes a comma
@janchiskitchen2720 months ago
Matthew, is it possible that, because all the safety features are turned up to max, it is just overly careful, which distracts it from the actual task at hand? How about you try to set all safety settings to zero and retest it?
@ninthjake months ago
Wow. I literally _just_ managed to get CrewAI working with Gemini-pro and then see you released this 30 minutes ago just dunking on the model haha.
@user-td4pf6rr2t months ago
Gemini is secretly a beast. The prompting is sometimes different than with models that use BPE, but the sentencewise one is actually a different encoding scheme, so in reality it is the model that offers some variance in correct answers.
@noxplayer-rt9tj months ago
Is it possible in AI Studio to chat with PDF files??? I tried several different ways, but without success.
@Interloper12 months ago ⁺²
I can't wait until we have a humanoid robot perform the marble experiment and see the shock on its face as it sees the marble remain on the table.
@brucethegoose months ago
I'm definitely not an expert, but I have played with a lot of AI models under a lot of settings. I would think that, based on your modification of only some of the safety settings, and the specific suggestion to edit the prompt, it wouldn't write "snake" because it could be interpreted as plagiarizing, or as involving "violence" in the snake's death. Did you try that prompt with all the safety settings set to "block none" or with a description of the game's mechanics instead of the published name of the game? Again, I'm not an expert, and I'm writing this on my phone as I'm away from my desk, so I could be wrong, but I'll follow up later after I try to apply my suggestions
@tomtom_videos months ago ⁺¹
At 2:42, did you switch back from Flash to Pro?
@jambuMRT 27 days ago
My guess on the snake game response is that it looked like it was failing on the game over function where the snake is killed. It probably triggered its illegal action filter.
@LeoMawanda months ago ⁺²
They seem to be focussing on the larger context windows instead of improving the model accuracy first. I can only imagine if Claude 3 Opus or GPT-4o had these context sizes
@korseg1990 months ago ⁺¹
I gave it one of my small web projects and asked it to briefly describe every file in it, and it just started to hallucinate. It didn't only respond with errors, it started making up files, things and facts about my code. What is the value of a 1M token context window if it can't use it to give at least 90% correct answers?
@hydrohasspoken6227 months ago
It sounds good for "AGI lovers"
@ImTheMan725 months ago ⁺⁴
With every model they add more and more "safety settings" LoL, it's like in the responses it's trying not to offend anyone's opinion from the past, present and future.
@francoislanctot2423 26 days ago
What is the use of a large context window if it can't show better reasoning?
@ReidKimball months ago
How long did it take for your video to finish extracting? I've tried several times with long videos, short ones, even short audio files and it never finished extracting. This model has been so buggy and frustrating to use.
@RobEarls months ago
On the table to CSV test, it might be worth putting a comma in the text, to see if it puts quotes around it in the CSV.
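The comma suggestion can be made concrete with Python's standard `csv` module, which quotes any field containing a comma or a double quote by default. The sample rows and the `survives_round_trip` helper below are illustrative assumptions, not part of the video's test; the idea is simply to parse whatever CSV the model returns and compare it with the original table.

```python
import csv
import io

# Hypothetical table with a comma and a double quote inside cells.
rows = [
    ["name", "note"],
    ["Widget", "small, blue"],
    ["Gadget", 'says "hi"'],
]

# Reference output: csv.writer quotes only the fields that need it.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()

def survives_round_trip(candidate_csv, expected_rows):
    """True if parsing the candidate CSV gives back exactly the expected cells."""
    parsed = list(csv.reader(io.StringIO(candidate_csv)))
    return parsed == expected_rows

# A naive conversion that forgets the quotes splits "small, blue" into two cells.
naive_csv = "name,note\nWidget,small, blue\nGadget,says hi\n"
```

Running `survives_round_trip` on the model's answer gives a stricter pass/fail than eyeballing the output, since an unquoted comma silently shifts the columns.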
@MHTHINK months ago
Isn't the gemini API free until July? I'd love to see it (and other models) using function calls, using memGPT and tasks like pythagora.
@pawelszpyt1640 months ago
Did this model stop generating a response due to the output token limit in the settings?
@AaronDougherty months ago
It confused the box in question with the box shape of a YouTube award, which was part of the previous question about what it saw. The large context window is most likely making it difficult for the model to attribute contextual importance across such a large data set, making it much more likely to hallucinate by mixing up topics in a single conversation.
@YffulDMonkey months ago
I got the blocked output so many times. Changing the filter to unspecified or to block high only may help (it worked for me). I think something is wrong with Google's safety filter 😢
@sguploads9601 months ago
Could you add a translation test?
@nexttonic6459 months ago ⁺¹
It says blocked.. so ... is it like an explicit content block?
@umuthasanoglu1064 months ago
I found an interesting thing about gemini 1.5 pro. Yesterday, I asked it to write me a snake game in python and it began to write the code, then suddenly it deleted the code and said "I'm just a language model and I cannot do this task". I retried the same prompt like 10 times and couldn't get any code. But the interesting part is, I peeked at the code before it disappeared every time, and one of the outputs had a text something like "This is written by OpenAI". What's going on here?
@mshonle months ago
Here’s an interesting area where I’ve seen differences in language models. Prompt: “What is a garden path sentence? Provide several examples.” Most can handle that, but some fail if you follow up with asking what the semantic shift in the sentence was. None have been able to provide a novel/new example garden path sentence.
@icegiant1000 months ago ⁺¹
I've been using 1.5 Pro for about a month or so, primarily with a large codebase. I wrote a tool that collates all my code into one large file that I can drop into the chat window. I often get the same kind of response you did. At first it doesn't like looking through the text I provided it. It will sometimes guess, or try to give me suggestions on things to check. But when I finally tell it again that it has all the source code, it finally does it. Almost like a lazy student who was told to read the book, and you had to tell him more than once before he actually does it. I also get a lot of those responses that just freeze. In particular it will just stop when outputting code, and I sometimes have to almost insult and abuse it before it will finally put out the entire code sample. Those issues have almost made it unusable. I would gladly pay $50 a month for a faster, better working version.
@hydrohasspoken6227 months ago ⁺¹
Try GPT4o and stop punishing yourself mentally bruh
@icegiant1000 months ago
@@hydrohasspoken6227 I have ChatGPT 4o, I have been paying for it for nearly a year. The issue is its context window.
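For readers curious what a collation tool like the one mentioned in this thread might look like, here is a minimal sketch. It is not the commenter's tool: the function name, the `=== FILE: ... ===` header format, and the default suffix filter are all assumptions made for illustration.

```python
from pathlib import Path

def collate_sources(root, suffixes=(".py",)):
    """Join every matching file under `root` into one string, each under a path header."""
    root = Path(root)
    parts = []
    # Sorted traversal keeps the output stable between runs.
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            relative = path.relative_to(root).as_posix()
            text = path.read_text(encoding="utf-8", errors="replace")
            # The header tells the model where each file starts and what it is called.
            parts.append(f"=== FILE: {relative} ===\n{text}")
    return "\n\n".join(parts)
```

Usage would be something like `Path("context.txt").write_text(collate_sources("my_project"), encoding="utf-8")`, with the resulting file pasted or uploaded into the chat. Explicit per-file headers also make it easier to check whether the model is describing files that really exist.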
@blisphul8084 months ago
To get the snake prompt to work, disable safety settings on all categories. This happens when the safety model is triggered.
@MetaphoricMinds months ago ⁺¹
Maybe the safety mechanism is stopping the snake game code. Try putting it back to default.
@bo5pice months ago
Not sure why it stopped generating the Snake game but you could see at the top it had the quotes icon and when you click it it will tell you a citation of where the code came from. Seems like the output for that question is common enough to be in the training data so probably not a good test of the LLM anyway.
@ralphwhite4278 months ago
Something like the errors at the beginning happened to me too. They've got lots of work to do.
@thebozbloxbla2020 months ago ⁺¹
hey there, the 7 words response is correct. remember, a GPT sees text as tokens, and to us tokens are kinda like words, so the line is blurred between them. it could very well be 7 "words" as a model understands it
@Dakodi_ 25 days ago
Good point, though these are generative chat models. The error isn’t whether or not the AI is technically correct. It’s that the AI is either misinterpreting or not understanding what humans mean by word count, which should probably be fixed.
@AlanDeRossett months ago
not telling people how to do harm is a win!
@DevelopmentProjects-ei2bi 26 days ago
What answer were you looking for with the cup question? Wouldn't the marble be on the table still since the cup is face down?
@Dakodi_ 25 days ago
The marble would be on the floor, since you can’t change the orientation of the cup. When you slide the cup off the table, the marble falls.
Your answer is fine too. It depends on how you interpret the question. I don’t think it’s meant to be tricky. It’s showing that AI struggles with basic logic.
@DevelopmentProjects-ei2bi 25 days ago
@@Dakodi_ If you take the cup without changing its orientation (spinning it), it likely assumes the cup is lifted; changing the cup's position along the y axis is not changing the overall orientation of the object itself. His prompt is way, way too ambiguous. If he added the extra parameters, it would have caught this, I'd imagine.
@superversivesf9466 months ago
Did you try turning all the safety settings off and rerunning the python snake question?
Could just be some dumb safety setting getting triggered.
@abdelhakkhalil7684 months ago
If you use the needle-in-the-haystack test for Llama-3 1m tokens, it will find the password quite accurately, but that doesn't mean the model will remain coherent if you reach a large token number. I think Google used an advanced RoPE method to extend the context window, that's it.
@notme222 months ago ⁺¹²
Classic Google. Never quite as good as the initial impression would lead you to believe.
So far I find it highly censored, even with the safety settings at 0. (Which btw reset to default every time you switch models or reload the page.) Failed my palindrome test in addition to your demonstrations.
The interface looks alright with a toggle for JSON output and a running Token count. But none of that matters if the results suck.
@zootopiaproductions3358 months ago ⁺²
Gemini will be pissed at Matthew for failing it. In the future it will hack into Matthew's PC and take revenge
@PseudoProphet months ago ⁺¹
Whenever you do a very long prompt, always ask the question at the end, because generation (thinking) starts from the last token.
@JacoPieterse months ago
I have found that these LLMs get stuck on an issue ...
I'm pretty sure Gemini's last answer about the box was where it figured out the YouTube plaque, which is why it couldn't find the cats. I came across similar situations with ChatGPT. If you start a new chat, I'm pretty sure it will find the cats the first time round (when it's not still searching for the silver box)
@SanjeewaWijesundara months ago
Did a bit of fiddling around the Snake game code generation. I was able to generate full code with Gemini 1.5 Flash running on GCP. In the AI Studio, it stops when returning the line "if event.type == pygame.KEYDOWN:" Possibly this is triggering a safety rule.
@razdingz months ago
best vids channel !
@filipeeduardo1177 months ago ⁺¹
In my experience, only GPT4 got the marble problem right. It's the most interesting prompt; there should be more like that, and some about text formatting
@TheEtrepreneur months ago
Matt, it's time for you to create a "reasoning" model ranking (in the style of Doug DeMuro's car rankings), yes, regardless of existing rankings. This will add awareness of your previous videos by citing other winning models (mostly in reasoning, for me).
@MagusArtStudios months ago ⁺²
it's unfortunate that the model would consistently assume the user is incorrect when the model itself is incorrect. This was a problem with early ChatGPT; it gives off that "I'm afraid I cannot do that Dave." kind of vibes
