Google’s NEW Open-Source Model Is SHOCKINGLY BAD
- Published Jul 27, 2024
- Sorry for the title. I couldn't help myself. I'm proud of Google for releasing a completely open-source model to the world, but it's not good. How bad is it? Let's find it out!
Enjoy :)
Join My Newsletter for Regular AI Updates 👇🏼
www.matthewberman.com
Need AI Consulting? ✅
forwardfuture.ai/
Rent a GPU (MassedCompute) 🚀
bit.ly/matthew-berman-youtube
USE CODE "MatthewBerman" for 50% discount
My Links 🔗
👉🏻 Subscribe: / @matthew_berman
👉🏻 Twitter: / matthewberman
👉🏻 Discord: / discord
👉🏻 Patreon: / matthewberman
Media/Sponsorship Inquiries 📈
bit.ly/44TC45V
Links:
huggingface.co/google/gemma-7...
blog.google/technology/develo...
bit.ly/3qHV0X7
lmstudio.ai/
huggingface.co/chat/
Chapters:
0:00 - It started badly...
0:53 - All about Gemma
7:23 - Quick Note on Gemini 1.5
9:56 - Gemma Setup with LMStudio
11:51 - Gemma Testing with LMStudio
20:58 - Gemma Testing with HuggingFace
Disclosures:
I'm an investor in LMStudio
Google SHOCKS and STUNS the Open source landscape
I should have used this title
Lol we all should have!!
@@matthew_berman I thought at one stage you were literally going to start slapping your forehead off the keyboard!
Why do most AI tech channels use that title 😂 I just don't pay attention to titles like that lmao 😂😊
It's a meme at this point.
Gemma 7b makes you realise how much compute Google is using just to output sorry I can't fulfill that request 🤣
LMFAO
So true. Uncensored models are just more fun.
@@markjones2349 you're talking as if the point of uncensored llms is fun rofl lmfao xd you're just makin' it funnier 🤣
Well, I'm never trusting benchmarks without personal testing again.
sorry you had to learn this way
Welcome to real life. Can't wait for you to leave the fantasyland bubble all these tech AI bros have built around you.
My funniest experience with Gemini Pro: I asked it to make a humorous image of a cartoon cat pulling the toilet paper off the roll. It told me that it ethically couldn't, because the cat could ingest the toilet paper and it could cause an intestinal blockage 😂
Hysterical
@@laviniag8269
But true...
Maybe a cat could see the image and do the same
@@MilkGlue-xg5vj haha yeah that would be a real nuisance. But then again, that’s one smart cat. What other potential could that cat have?? 🧐
@@Lukebussnick Maybe it could become an ai dev at Google
In other words, there are now LLMs with mental challenges as well...
That's inclusive, alright.
And diverse @@AlexanderBukh
I'm pretty sure they lobotomized it in the alignment phase :)))
To the point they took the lobotomy fragment and used it in place of the brain, and trashed the actual brain. Not only on models, but on personnel probably
You can't gimp the model with excessive censorship, and also have an intelligent model.
These are not open models, these are woke models, appropriately liberal.
To a point, I agree.
The nature of the errors here seem irrelevant to being censored or not.
@@madimakes No, the censorship sucks up so much of its thinking there's little left to actually answer. You can ask the most banal question, but it sits there thinking long and hard about whether there's any way it could possibly be offensive to the woke. Considering the woke are offended by everything, that's a yes, so it has to work its way around that; then it needs to figure out if its own reply is offensive (yes, everything is), so it has to find a way around that as well. Often it will fail and say "I'm afraid I can't do that... Dave." Other times it will try, but the answer is so gimped and pathetic you'd have been better off asking your cat.
Exactly
The model's design is gimped by its creator developers themselves: the head of Google AI literally holds biased, ideological, anti-white, and heavily pro-censorship values, all documented in their online record. That's why those biases are reflected in the model.
This is entirely speculation on my part, but I am guessing Google’s AI effort is largely driven by their PR team. A proper engineering team would never release this kind of smoke and mirrors crap. Right?
They have tarnished their brand. It will be interesting to see what happens in the next few years with regard to Google. (I do not have any financial interest in google).
Yeah, they are the wrong kind of hacks now
Or the engineering team knows this will be killed off regardless of quality or popularity, so why bother.
@@chriscarr9852 we're watching the end of Google smh
No, Left. They are all one viewpoint at Google and have been so for decades. The PR folks represent the programmers and their programmer-managers and Sr. management.
If Google keeps messing around with their censored models and underperforming open-source models, they'll get left in the dust. Mistral could end up way ahead of them in the next few months. They should find that embarrassing...
Plot twist, Google was so far behind the AI race that they had to ask Llama or GPT 4 to create a model from scratch and this is what they named Gemini / Gemma.
Google is so far behind these days. I love Google's design language, but their tech? Meh.
Gemini Advanced is bad, too, compared to GPT-4. Gemini sometimes answers in a different language, is too cautious, and gets things wrong a lot of the time.
"too cautious" is an understatement.
@@CruelCrusader90 Genuinely. If it's not a question about software development there's a wildly high chance that it'll start quizzing you on why you have the right to know things. I do hobby electronics and wanted to see how it would fare on helping make a charging circuit. It basically refused. Same is true for rectifiers. Too dangerous for me apparently lol. Ask it questions on infosec and it'll answer fine though. It's wild.
@@veqv lmao it refused. All anyone has to do is release a completely uncensored model and they'd literally take over the industry from their house. I don't know why Google is such a fail at every product launch.
@@veqv Yeah, I had a similar experience. I asked it to generate top, front, and side views of a vehicle chassis to create a 3D model in Blender (for a project I'm working on). It said the same thing: it's too dangerous to generate the image.
I didn't expect it to make a good/consistent vehicle chassis across all the angles, but I was curious to see how far it was from making that possible. And I don't even know how to scale its potential with that kind of a developer behind its programming.
Even a one would represent progress at its slowest, but that would be generous.
Bad doesn't begin to cut it. At this rate, Google will become irrelevant in most of its services. It makes no difference how much money they have; their policy is wrong and the AI models show it. They are so scared of offending someone or being made liable that their AI actually dictates what happens in the interactions with the users. That doesn't just make it annoying and a waste of time; it means that it cannot learn. Even worse than not learning, it's becoming dumber by the day. I cannot believe I'm saying this, but I miss Bard. Gemini doesn't cut it in any way, shape, or form. It's probably good for philosophy exercises, but so far I don't see any decent use for it aside from that. Give it enough space to go off on wild tangents and you may get a potentially interesting conversation, but don't expect anything productive from it. I'm done with trying out Google's crap for some time. Maybe in a month or two I will allow myself the luxury of wasting time again to see how they are doing, but not for now. Their free trial is costing me money, that's how bad it is.
Google set the entire OS community back a half hour with this troll release. well played google
Don't worry, Llama 3 will set the Open Source community 31 minutes ahead lol
On the bright side, we now have a top-end model for generating rejected responses in DPO.
can we not have acronyms 😭
@@user-qr4jf4tv2x I believe DPO in this context stands for "Direct Preference Optimization", which is a recent alternative technique to RLHF, but with fewer steps and thus more efficient.
I'm actually not 100% sure, but I believe the joke here is that if you try employing this model for DPO to "align" any other base model, what you get is another model which only ever refuses to respond to anything.
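For context, here is a minimal, illustrative sketch of the DPO objective for a single preference pair. The function name, argument names, and the `beta` value are my own for illustration, not from any Google or DeepMind code:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) response pair.

    Each argument is a summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    # How much more the policy prefers "chosen" over "rejected",
    # relative to the reference model's preference.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)): training drives the margin up.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A neutral pair (policy matches the reference) sits at log(2):
print(round(dpo_loss(-10.0, -20.0, -10.0, -20.0), 4))  # 0.6931
```

The joke above translates to: if every "chosen" response in the preference data is a refusal, minimizing this loss teaches the new model to refuse too.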
Hey Matthew, it's not an open-source model, because they are not releasing the source code. It's an open-weight, or simply open, model.
But... they did? At least for inference: they uploaded both Python and C++ implementations of the inference engine for Gemma to GitHub. Which I suspect have bugs, since I can't otherwise understand how they could release a model that performs this poorly.
Yeah they did release code.
This shows one thing: we need other kinds of benchmarks.
But great video Matthew, thanks!
Deepmind has done some pretty amazing work in the machine learning space. My bet is that they created a fantastic model and that's what was benchmarked. Then the Google execs came along and "fixed" the model for "safety" and this is the result.
Let's call it Matthew Benchmark
@@MM3Soapgoblin DeepMind should spin off from Google. It's a shame that they still run under Google, given their amazing work in the past.
@matthew Berman I think something is wrong with your test setup. I tested the `python 1 to 100` example with Gemma 7B via Ollama (4-bit quantized version, running on CPU) and the model did just fine. Check your prompt template or other setup config.
He was already recording, so he didn't want to check the setup LOL
Until Google spends less time on woke and more time on work, I'm not touching any of their products with a 10-foot pole
- by a person on TH-cam :P
Truuuu, but you know what he meant lol @@Alistone4512
Google’s NEW Open-Source Model Is so BAD... It SHOCKED The ENTIRE Industry!
I was absolutely paralyzed by the performance of this model.
Me: I send Pikachu GO! Use STUN attack on Greenthum6 NOW!
Pikachu: Pika Pika Pika!!! BBBZZZZZZZZZ ⚡️⚡️⚡️⚡️⚡️
Me: Greenthum6 seems to be in some form of paralysis. Quick Pikachu follow that up with a STUN attack on Greenthum6 NOW! Give him everything you got!!!
Pikachu: PIKA…. PIKAAAAAAAAAAA……. CHUUUUUUUUUUUUUUUU!!!!!!!
BBBBBBBBBBZZZZZZZZZZZZZZZZ ⚡️⚡️⚡️⚡️⚡️⚡️⚡️⚡️
Greenthum6 = ☠️ ☠️☠️
Me: Aaaahhh that was nice. I'm sure Greenthum6 will make a nice Pokémon for my collection 🙂. **I throw my Poké Ball at Greenthum6 and it captures him as a new Pokémon in my collection**
This looks like a hastily completed homework assignment by a student to meet the deadline
and that student was highly political and easily offended by everything
It’s like asking an undercover alien to explain normal Earth things. No.
😂
At the moment, there are a couple of issues with quantization and with running the model in llama.cpp (LM Studio uses llama.cpp as its backend), so when the issues are fixed, I'm going to re-test the model. It's weird that the 2B model gets better responses than the "7B" (really more like 8-point-something B) model.
I'm wondering if this is technically half open-sourced given some critical components aren't available from Google.
The thing about Gemini is it has the memory of a goldfish: it can barely hold on to any context, and you always have to tell it what it's supposed to write.
Could you try lowering the temperature? The answers when you were running it locally look a lot like what I'd expect if the temp was set too high.
The 2B-Version of Gemma is quite good for a 2b model actually. The 7b model is - a car crash.
I found the same, the 2B model is much better than the 7B for my set of tasks.
I like that you tried it on Hugging Face, cause now I can say with certainty: "Google, why?"
Just to ask: how do I get the latest version for Linux, when it is only updated for Windows and Mac, but not Linux?
Does LM Studio work with Wine?
Looking at those misspellings and odd symbols all through the code examples, it's clear something is mis-tuned in the params, with whatever UI you're using not yet updated to support this new model. Apparently the interface I was using has corrected this, as I was able to get coherent text with no misspellings, but I did see people online saying they were having the same trouble as you: incoherent text and obvious mistakes everywhere. It's likely something wrong with the parameters, which must be updated to the values the model works best with.
It's kinda bad, right? I tested it and found it just kept talking; they are using a weird prompt format, and it just keeps talking.
Imagine if Ed Sheeran released that video of DJ Khaled hitting an acoustic guitar, and said "This is my latest Open Source song". Yep. That's this.
Yeah - I was running this yesterday and ran into the same things - as well as the censorship, where it decided that my "I slit a sheet" tongue twister was about self-harm and refused to give an analysis.
I think you didn't use the right prompt format. It's a mistake a lot of people make with open-source LLMs.
The settings on Kaggle may help. The widget there uses the following settings: temperature 0.4, max output tokens 128, top-K 5.
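For anyone wanting to reproduce those settings locally, here is a minimal sketch of what temperature plus top-K sampling means over one step's logits (pure Python, purely illustrative; not LM Studio's or Kaggle's actual implementation):

```python
import math
import random

def sample_top_k(logits, k=5, temperature=0.4, rng=random):
    """Temperature-scaled top-k sampling over a list of logits.

    Keeps only the k highest logits, rescales them by the temperature,
    and samples a token index from the resulting softmax distribution.
    """
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Lower temperature sharpens the distribution toward the argmax.
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)  # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Inverse-CDF sampling over the truncated distribution.
    r = rng.random()
    acc = 0.0
    for idx, p in zip(top, probs):
        acc += p
        if r <= acc:
            return idx
    return top[-1]  # guard against floating-point rounding
```

At temperature 0.4, even a modest gap between the top logit and the rest makes the sampler behave almost greedily, which fits the commenters' observation that the model looks far less broken at low temperature.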
Please do fine-tuning based on private data
That was actually really funny. The answers are so out of the blue Mannn
It apparently has a very different prompt template. You should definitely try that (13:26), but the model is still kinda huge, and unsatisfactory for this demo 😮
Hi Matthew. Thanks for testing. I just posted a comment about a test I did using your questions, showing different results from yours when not using the GGUF (I included a link to a gist). Was my comment deleted because it contains a link? Happy to resend you the link to the gist. P.S.: actually, even the 2B model gives decent answers to your questions.
I am actually disappointed that you did not address the multiple comments pointing out the flaws in your testing. I thought you would retest the model and set the record straight.
Each parameter is just a floating-point number (assuming no quantization), which takes 4 bytes. So 7B parameters is roughly 7B * 4 bytes = 28 GB, so 34 GB is not that surprising :)
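That back-of-the-envelope math also shows why quantization matters so much for running models locally; a quick sketch (decimal gigabytes, runtime overhead ignored, function name is mine):

```python
def weights_size_gb(params_billion, bits_per_param):
    """Approximate size of a model's raw weights in decimal gigabytes."""
    return params_billion * 1e9 * (bits_per_param / 8) / 1e9

print(weights_size_gb(7, 32))  # 28.0 -> fp32, roughly the unquantized GGUF
print(weights_size_gb(7, 16))  # 14.0 -> fp16
print(weights_size_gb(7, 4))   # 3.5  -> 4-bit quantization
```

The same "7B" model drops from ~28 GB at fp32 to ~3.5 GB at 4-bit, which is why the quantized builds mentioned in other comments are so much easier to run.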
FYI, this model is available on Ollama (0.1.26) without the hoops to jump through. One more thing: they also have the quantized versions. I found the 7B (fp16) model bad, as you say, but for some reason I was much happier with the 2B (q4) model.
This video needs a laugh track and some quirky theme music between sections. I was LOLing and even slapped my knee once.
Once again, another great video. This is my fav AI channel.
Have you noticed that GPT-4 has been very bad in the last few days? It can't remember more than about 5 messages in the conversation, and it constantly says things like "I can't help you with that" on random topics that have nothing to do with politics or anything sensitive. It's like they've got the guardrails dialed to randomly clamp down to a millimeter, and it can't do anything useful half the time. I have to restart the conversation to get it to continue.
They switched to GPT-4 Turbo. The old GPT-4 via API is better.
Yikes google! 😬
I think there were problems with the model files. The Ollama version also had problems, but they apparently fixed it now.
Maybe it was a spelling error by Google: "State of the fart AI model." Yeah, this model stinks. Yes, I am exhibiting a 14-year-old's intellect.
State of brain fart it is.
I recently tested gemma:7b with Ollama 0.1.27, and now the model doesn't respond with gibberish. The only different behavior I noticed compared with other llama-based models is that it tends to output more markup. As I said before, I don't know who quantized the model used by Ollama (it was not TheBloke), and llama.cpp had a lot of commits in the past week addressing issues with quantization and inference, so maybe the model should be retested.
Thanks for the great channel. I never miss any of your videos, and I started back when you did the Microsoft AutoGen agents.
Interesting assessment. 🤫 I still have to see for myself these Generative AI model apps. 🤔 Keep going, Matt. 🌹🌞
I've found the trick with models like Gemma, when you add this system prompt it gives more accurate results. THE SYSTEM PROMPT: "Answer questions in the most correct way possible. Question your answers until you are sure it is absolutely correct. You gain 10 points by giving the most correct answers and lose 5 points if you get it wrong."
At this point, just use GPT-3.5 or Mixtral; why bother with their idiotic model?
@@h.hdr4563 Techniques such as that can help improve responses from any LLM.
Have you seen the "26 principles of prompt engineering" paper? It's very interesting, and it works across LLMs too, although the better the LLM, I think the less of an improvement there is compared to the base model without a system message.
Gemma wasn't trained with any system prompt role.
@@h.hdr4563 Do you understand that it's a 7B model and not a 180B one?
Thanks! The massive size of the 7B GGUF was a put-off to start with. I am surprised it performed that badly.
You should use quantized versions. I doubt that there's much difference in quality between 32-bit and 8-bit (or even 4-bit).
Google is having its Blockbuster Video moment. This is embarrassingly bad.
Could it be a problem with the temperature settings?
The spelling mistakes seem to imply that. Maybe it can only work at low temperature.
I love how shocked you are in the opening clip
The safeguards of not just Google but most of these corporate models are ridiculous, and history will look back on them quite unfavorably: unnecessary garbage and a significant hindrance to people attempting to work creatively.
16:00 - JFC ...this model is just horrible.
20:25 - "...the worst model I've ever tested." Crazy - why would Google release this?!
The size is because of the quantization level; the same model quantized to 8-bit is much smaller.
Thanks for covering this! Was impressed with gemini, but had no information about their open source models!
Instead of Artificial Intelligence we got Genuine Stupidity
Google STUNS Gemma SHOCKING everyone
Is it not supposed to be a base for fine-tuning ?
May I ask why you are still not using Ollama?
Where did you download it and how? Btw I made a video about this yesterday if you'd like you can see it
"AI will probably most likely lead to the end of the world, but in the meantime, there will be great companies." ~Sam Altman, CEO of OpenAI
Open, or open weights; not open source. You can't inspect the code, rebuild it from scratch, validate the security, or submit pull requests for improvements. You can fine-tune it, but that's more like making a mod or wrapper for a binary app than modifying source.
The GGUF of this model has issues, and llama.cpp has two PRs to fix it. Unfortunately, your feedback is based on corrupted files.
I tried it as well on Ollama and was completely underwhelmed. It had typos and punctuation issues in response to my very first prompt, which was simply "hey". Then when I said it looks like you have some typos, it responded by saying it was correcting *my* text, and then added several more typos and nonsense words to its "corrected text". I don't know what's going on with it, but I wouldn't trust this to do anything at all. How embarrassing for Google.
0:04 Absolutely! This is the beauty of diversity in the mathematical world. While 4+4 equals 8, the operands being 4 doesn't mean their identity cannot also be 40. Y'all have to respect the diversity.
Hi, it seems that Gemma doesn't like repetition penalty at all. In your settings you should set it to 1 (off). In LM Studio, Gemma is a lot better that way; otherwise it's practically braindead.
And about the size of the model: it's an uncompressed GGUF. GGUF is a container format and can hold all sorts of quantizations. 32 GB is the size of the uncompressed 32-bit model; that's why it's big and slow. There are quantizations available now, even with an importance matrix.
Why does it have to understand the context of "dangerous"? Why does the model need to be censored? What children are running LLM's on their desktop computers?? What are we even talking about? Is nobody an adult?!
Regarding your test with the shirts drying: offer doubling the available space, and then look for responses claiming that doubling the space halves the time to dry a shirt.
as we all know, drying a shirt on a football field just takes seconds ;-)
This episode was like a Jerry Springer show, I couldn't stop watching
a good question to ask the AI's you test is, "can you give me an equation for a Möbius strip".
Google just announced a follow up model with full transparency - they admit it’s rubbish and call it Bummer!
Hey Matthew, thanks for the video. I wish you had removed the LM Studio part; it is kind of misleading.
Could this be a mixture of experts due to the file size being so large on a gguf version?
The only Google AI branch I still find credible is DeepMind. I hope they don't ruin it as well.
The TrackingAI website by Maxim Lott measures the leaning of various LLMs and they're all pretty much what we'd call "politically left" in the US. Which ... I'm not trying to make a thing out of it. There are plenty of reasons for it that aren't conspiracy and Lott himself would be the first to say them.
However, seeing that reddit post about "Native American women warriors on the grassy plains of Japan", I wonder if maybe it had been deliberately encouraged to promote multiculturalism in all answers regardless of context.
Didn't really expect that...
What I find hilarious about Google is that while using Gemini on the web, Google gives you the option to "double check" the responses with Google Search. So, why can't Gemini check itself against Google Search?? It's right there. I think Google is so scared of releasing AI into the wild they're not even trying, and in a way they're right.
When doing those tests, try generating responses a few times and look at how different they are. Unless you use a temperature of zero, your tests are otherwise a plain gamble.
Like other people pointed out, the model needs to be fine-tuned for better outputs.
He seems to expect a 7B model to compete with GPT4 out of the box
@@darshank8748 No, but it should at least compete with Llama 2 7B, as was claimed by Google.
As we can see here, it does not.
See, when you save the word SHOCKING for when it's actually SHOCKING, it's WAY more impactful and doesn't sound like you are spitting in the face of your community.
Great video! Their half open sourced LLM is hilariously bad
Gemma... it says so in the name: it's Gemini without the "i" part... intelligence.
Thanks for putting yourself through this experience (so we don't have to!) I wonder if this is Google's Bud Lite moment.
Google had to innovate on the context size. It was the only way the model could hold all the censorship prompts in its memory while responding to queries. That's also why it's so slow.
imho 😂
This does not match the performance seen on hugging chat at all, you should issue a correction
Microsoft beat Google at AI
Love the model demos! Thanks for another great vid!
The killer app of just the regular $20/mo Gemini Advanced is that it has a 128k-token context size, instead of ChatGPT-4's like... 8k or 32k or whatever the hell it is right now.
Have you been looking under a rock?
GPT-4 Turbo has a 128K context window.
Is the model corrupted or maybe undertrained? I've never seen an LLM repeatedly make typos.
OpenAI: "So why do you want to leave Google and come to work with our dev team?" Dev: *shows them this video*
You should add some politically incorrect questions to your usual ones after this week's drama
It's the only model that won't even run for me in ollama. It just returns some API EOF error. Ran 30 other models with no issues.
Tried one of the quantized versions last night. It was reasonably fast and got the first question (a soup recipe). On additional questions that Mistral got right, Gemma was lost in space somewhere... back to Mistral.
Hard to believe that a company with such massive resources produced this underwhelming model.
You should ask it to devise and describe a perpetual motion/energy device. The grammar and explanation style seem to fit.
Google's research is not focused so much on LLMs; they produce a lot of AI research across a variety of sectors. That said, their LLMs are so far behind it is not even funny. The multimodal 10M-token context window of Gemini Pro does look pretty good, though!
To declare openly that their state of the art newest model relies on the same architecture is a massive PR disaster, possibly one of the largest this year 😮💨
Re: it thinking that "cocktail" might be a bit rude ...
not a patch on when Scunthorpe United FC updated their message boards with a profanity blocker and started to wonder why nothing was getting posted anymore
Yes, I would like to see you fine-tune it using a web-UI fine-tuning method.
Great content as always! I rewatched the video, and honestly I don't think you did anything wrong during the setup; the model is just bad... thank you for exposing this joke of a model.
Also, there is a really good ultra-small open-source VLM that came out about a month ago. It's called Moondream; it only has 1.6B parameters, but it's still performing way better than models twice its size on most benchmarks.
It hasn’t been converted to GGUF yet but I think that if you make a video about it, the performance and the size will spike a lot of interest and The Bloke might consider making a quantatized versions of it.