I gave three AI models a CSS quiz

Kevin Powell

7,648 views

Published on 18 Sep 2024

Comments • 64

@mkLee970 1 day ago ⁺⁸
On question 6 (around 17:40), Claude made up an answer. It selected "B) 1.5rem + 1.5vmin", but in the question, answer B is "B) 1.5rem + 1vmin". They are not the same. The AI made up a new answer. How fun is that?
@FenrirRobu 1 day ago
It's even funnier because the AI then goes on to explain that answer B is incorrect because it is 1.5rem + 1vmin.
@tgd-y3x 1 day ago ⁺⁵
Great video. I NEVER trust AI for coding questions. I do use them to point me in a direction where I can search elsewhere to find an answer to a tough question, but I don't ever trust exactly what they give me.
@tgd-y3x 7 hours ago
... and cheers from Ottawa, from a NS Acadian!
@only_._gaming 1 day ago ⁺¹²
This is a great video, and it could be made into a series of sorts.
@CodingwithNephi-c6r 1 day ago ⁺⁹
You accidentally gave a point to Copilot instead of Gemini on question 9 :)
@myartikool 1 day ago ⁺²
Judging by the overall performance of both, I don't think that matters that much :}
@HITO-nv4cg 1 day ago ⁺¹
The best free ad for Claude 😃
@chriswalker4636 1 day ago ⁺²
Kevin, thank you for producing the very educational content you publish. I have been a subscriber for some time, but this is my first comment. I am very interested in learning more about large design models (LDMs). Your expertise in reviewing LDMs would be very interesting and worthwhile. 🎉
@kingoffongpei 1 day ago
I haven't tried any of these other AIs, but I used llama on a whim to help me learn D3.js and it was immensely helpful. The docs were really hard to find information with and not very good at breaking down how it worked. Asking llama (idr if it was 70B or 450B) questions about things I just couldn't figure out really helped take some of the pain out of the process. I could ask questions about the answers it provided and it felt a lot like a conversation with someone who knew and understood D3. There were a few instances where the solutions it provided didn't seem to work but when I pointed it out, it did fix it. I thought that was pretty understandable since I asked it questions about a project that it never saw, only basing it on the context I provided in my prompt, and D3 has changed a lot over the years so it may have based some answers on stuff on the internet written about previous versions.
@clevermissfox 1 day ago ⁺¹
Yay, KPow mentioned this video on his Discord a while ago and I’ve been [im]patiently waiting 😂 Loving the long format again, and this is a very interesting comparison! I’ve experimented with Copilot, ChatGPT and Claude for JavaScript refactoring, not as much CSS, and in my tests Claude came out on top; Copilot and ChatGPT were dismal and laughable. To be fair, a lot of this was in February or March, and these things change and learn so quickly that the same experiments would most likely yield very different results now in September 😂
@321sas 20 hours ago ⁺¹
At 7:45 you got the question wrong. You said, "as long as my selector has equal specificity, it will not work" but the text said "As long as my selector has equal specificity, it will work." That's why Gemini did so terribly and flipped the specificity: you told it something that was not true, and it thought that equal specificity was needed.
@DampeS8N 1 day ago ⁺¹²
At about 3 minutes you use the phrase "Maybe it knows how it works but got the explanation wrong." This is a mental mistake. LLMs don't know how anything _works._ That isn't how they function. Your earlier assessment that they regurgitate the general consensus of the internet is more correct. But even then, that's not how they work. They are big autocompletes. They merely predict the most likely next words, right or wrong. Nonsense or sense. And they are not consistent.
Every time I see people testing LLMs like you are, I see them asking _once_ and that's not how these tests should work. That's not how you should use LLMs, either. You should be asking multiple times because it WILL give different answers each time.
@Killyspudful 1 day ago ⁺²
Absolutely agree.
@AntiAtheismIsUnstoppable 9 hours ago ⁺¹
I always hate it when people use words like _intelligence_ and _think_ about AIs. They're advanced calculators, nothing else. The problem here seems to be that because the algorithms are too advanced for most humans to understand, people assume they must be conscious. No. A calculator is a calculator, no matter if it solves 2+2 or advanced algebra. I could also say, well, it is magic to me how the calculator calculates 2+2 and gets 4, therefore the calculator must be conscious.
@daveturnbull7221 6 hours ago
Vertical Media is defined by Wikipedia as trade magazines/journals.
@joshuamitchell6204 5 hours ago
Custom defined units would be kinda cool... didn't know I needed that until now 😂
@Deadgray 3 hours ago
Great stuff. I use it most often as a syntax reminder or template generator for programming. Although sometimes they work okay, sometimes when I use GPT or Copilot they either don't understand the question at all or have to be guided in the right direction. IMO, in the case of CSS, these differences come down to when the model's training was completed. For example, you can see a big difference between GPT-4 and GPT-4o. By the way, it's time for me to check out Claude.
@samhenrigold 18 hours ago
The funny thing with these models is that they’re always, like, two years behind on web technologies. Like, I cannot for the life of me get any LLM to touch subgrid with a 10-foot pole.
@anthonybarnes 1 day ago
See, this is a great topic to make a video about - hitting multiple relevant modern technologies at once 👏
@Nova_BG 1 day ago ⁺¹
Always amazing videos!
@viccc.n 13 hours ago
You should try with OpenAI o1; it takes more time for "thinking" before answering.
@weisj 1 day ago
Gemini got the first question wrong. It says that the rem in the media query would be relative to the html font size. In particular, it says that 1rem would be 32px in this case.
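For context, a minimal sketch of the behaviour this comment points at. The 32px root size comes from the comment; the `.card` class and the `40rem` breakpoint are hypothetical and not taken from the quiz. Per the Media Queries spec, relative units in a query condition resolve against the initial font size (16px in most browsers unless the user has changed it), not against whatever `html` is set to.

```css
/* Assumed setup: root font size set to 32px */
html {
  font-size: 32px;
}

/* Inside a rule body, rem follows html: 1rem = 32px here */
.card {
  padding: 1rem; /* 32px */
}

/* In the query condition, rem uses the initial font size instead.
   With a 16px browser default, 40rem = 640px, not 1280px. */
@media (min-width: 40rem) {
  .card {
    padding: 2rem; /* 64px: rem inside the block still follows html */
  }
}
```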
@SkamanSamTyler 23 hours ago
I really like this! I keep telling my friends not to trust the AI stuff, but some of them insist on "learning" by asking the LLMs. I have always preferred the source materials - W3C and MDN are still the greatest resources on the web. That said, I wonder how Codeium stacks up? (I do use AI assistants in my editors, but I always double- and triple-check their code. Remember: never use someone else's code without understanding it yourself!)
@KevinPowell 21 hours ago
Codeium uses GPT-4o, which is the same as Copilot. Not sure if the paid version uses something else though.
@nustaniel 20 hours ago
Oh, the struggles of trying to get LLMs to do anything correctly in CSS. Not only are they terrible at CSS shapes and such, which is basically just math, but they keep using rgba(0, 0, 0, 0.5) instead of just rgb(0 0 0 / 0.5), and other outdated approaches to things. I have completely given up on asking an LLM for CSS things outside of "Is there a way to do.." and using the response to figure it out myself.
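A short sketch of the two notations mentioned above, assuming a hypothetical `.overlay` class. Both declarations describe the same colour; the comma-separated `rgba()` form is the legacy syntax, and the space-separated form with `/` for alpha comes from CSS Color Level 4.

```css
.overlay {
  /* Legacy syntax: commas and a separate rgba() function */
  background-color: rgba(0, 0, 0, 0.5);

  /* Modern syntax: spaces, alpha after a slash. Declared last, so it wins
     where supported; browsers that cannot parse it keep the line above. */
  background-color: rgb(0 0 0 / 0.5);
}
```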
@compton8301 1 day ago
Wow, your knowledge is remarkable!
@PeterWarholm 3 hours ago
Great video, Kevin! I recently read that Claude 3.5 Sonnet shipped with a lot of coding (and math) in mind. Would love to see a rematch when/if the others make an update. Regarding Q, afaik it is not really part of the metric system but seems to be a unique unit for the web?
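On the `Q` question: the CSS Values and Units spec defines `Q` as a quarter-millimetre, so 1Q is 1/40 of 1cm. A small sketch with hypothetical class names, where both boxes should end up the same width:

```css
/* 40Q = 40 quarter-millimetres = 10mm = 1cm */
.box-q  { width: 40Q; }
.box-cm { width: 1cm; }
```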
@mendoso 5 hours ago
A lot of these fails come from not being strict enough, which in the normal case would be the correct approach. But in the world of software development, one should distinguish a typo from some weird token which could be a CSS unit or part of a new spec. Funny how they can pretend that 1vmin is the same as 1.5vmin but at the same time do not even bother to ignore case when it comes to the unit q/Q. You could probably require them to perform strict syntax checking, but then you should also avoid errors like omitting NOT in the WordPress question.
@xilliman 1 day ago ⁺³
I love AI but when it comes to coding, it has lots of flaws.
I just asked ChatGPT how to display a unit inside an input, but it got it absolutely wrong. Even after I told ChatGPT that it was wrong, the answer was still wrong :)
It’s great for asking short questions like: what’s the + selector doing?
The question would be: can an AI (in the future) understand new CSS features and apply them correctly?
@KevinPowell 1 day ago ⁺⁵
The problem with CSS specifically is that nothing really lives in isolation. Context is key, but even giving it access to your entire project, I don't see a time where it'll properly infer all the different things that are going on.
Like you said, simple things it might be able to explain, but I mean, two of the three had no idea how specificity of simple selectors worked, so I have my doubts there as well, lol.
@andreilucasgoncalves1416 1 day ago ⁺¹
@KevinPowell Yeah, and because of that LLMs are really bad with CSS in general, but a little bit better with Tailwind because it is more isolated.
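To make the specificity point in this thread concrete, a minimal sketch with hypothetical selectors (not the ones from the quiz). Specificity is compared as (IDs, classes, elements), and when two selectors tie, the one declared later wins.

```css
/* Specificity (0,0,1): one element selector */
p { color: black; }

/* Specificity (0,1,0): one class beats any number of element selectors */
.note { color: blue; }

/* Specificity (1,0,0): one ID beats any number of classes */
#intro { color: green; }

/* Ties with .note at (0,1,0): source order decides, so this rule wins
   over .note for an element that has both classes */
.warning { color: red; }
```

With these rules, a paragraph carrying `id="intro"` and both classes would render green, because the ID selector outranks everything else here.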
@denisds130 1 day ago
I'm curious how Perplexity would do in this challenge.
@uiuxaidesign 1 day ago
Love this, more AI content please!
@TheThirdWorldCitizen 1 day ago
Would the media query behavior change when using nesting?
@nomadshiba 1 day ago
Also, html isn't the root of everything, it's the document.
Instead of html I could have just used svg or mathml.
@darwinmanalo5436 1 day ago
Claude it is. ❤ Can you publish those questions so we can try it?
@scragar 20 hours ago
17:40
Not sure if you should count that as correct.
It said B, which was wrong, but it also rewrote the answer text to be correct (adding .5 to the vmin).
IMO that should be half a point. Inventing an answer not in the options is basically cheating, and if it wasn't for you knowing the answers going into the test, it could very easily have convinced you D was wrong and B was right, using the explanation as to why D was right and B was wrong.
@albedesigns 1 day ago
I have never clicked on a notification so fast 😂 Great topic for a video!
@KevinPowell 1 day ago
Glad to hear that! Was very curious if people would be into this type of thing or not!
@kaslmineer7999 1 day ago
Oh, I clicked on the video 1 min after it was published.
@kaslmineer7999 1 day ago
Cool, Kevin Powell gave me a heart on my comment :)
@icepuddin168 23 hours ago
What font is he using at 00:03??
@YacineBougera 43 minutes ago
Kevin, I found a good idea on a website that you would be happy to create, and you could show everyone how it is done... please, I need your help.
@DxBang3D 1 day ago ⁺⁵
!important is !correct... in most programming languages, putting an exclamation mark in front makes it a NOT operator...
@daedaluxe 22 hours ago ⁺¹
I can't tell if you're trying to tell us that important isn't correct or isn't incorrect.
@DxBang3D 22 hours ago
@daedaluxe It is really !important what I am trying to say. ;)
@daedaluxe 21 hours ago
@DxBang3D It's not important, got it.
@KevinPowell 21 hours ago
It's one of the several mistakes the working group has listed, but can't change it now 😊
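For readers coming from other languages: in CSS the `!` is simply part of the `!important` flag and does not negate anything. A minimal sketch with hypothetical class names:

```css
/* At equal specificity, the later rule would normally win... */
.button { color: white !important; }

/* ...but !important lifts the first declaration above normal ones,
   so the text stays white even when .button-dark also matches. */
.button-dark { color: black; }
```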
@a1white 1 day ago ⁺¹
I'll stick to W3Schools and Stack Overflow.
@webschool4780 1 day ago
4 one
@shyamfx 1 day ago
Wow
@5alidshammout 1 day ago
Discord communities ftw
@ibrahimharchiche3590 1 day ago
I don't have time to watch the video, but I just want to say that AI is terrible at CSS because it's so visual and implicit, unlike programming languages, which are based on logic.
@st8113 1 day ago ⁺¹
OpenAI leaping in to invalidate this video with the new model mere days before release.
@andreilucasgoncalves1416 1 day ago
o1 probably would get most of them correct.
@MakoSDV 1 day ago
This is why I don't use these AI chatbots...
@samtastic24 1 day ago ⁺¹
AI is amazing at backend, but not so much at frontend, mostly due to the fact that it has a brain but lacks eyes 🙈
@Dekutard 1 day ago
Copilot and Gemini? Why not ChatGPT and Llama?
@KevinPowell 21 hours ago
Copilot uses the same model as ChatGPT. As for Llama, I could have... Maybe next time?
@Dekutard 15 hours ago
@KevinPowell I feel like the responses from Copilot are different though, idk what Microsoftness they add to it. I would assume raw ChatGPT would be more optimal, but I could be wrong. And Claude! Claude has to be a contender too. I never hear about Gemini or Copilot for coding 🌚 js
