Thanks for the warning. Makes sense to avoid being slowly squeezed out.
From MacRumors:
"Gurman reports that Apple is planning to unveil a redesigned Mac mini with both M4 and M4 Pro chip options at its October event, with a launch to follow in early November.. ... To accommodate Apple Intelligence, the new Mac mini models may include 16GB of Unified Memory as standard, rather than 8GB."
Thank you. I don't have any inside info at all when it comes to Apple and I must stress, I'm not a hardware expert. Much gratitude to you. Cheers!
@davidconnelly It's more expensive buying Apple, but their M4 chip with 128GB of RAM or better should do the trick. I'm looking forward to having something like that chugging away all day on AI tasks, but the hard part is the framework and software: which models to run, and so on.
David, my friend, you're still alive! Man, you've been at it for more than a decade now!
Well, it's not like I've been doing nothing. I've built the best framework in the world. I see that you've been here for fifteen years. What have you done? Remind me.
@davidconnelly I'm just happy to see you pop up in my recommendations again. I used to join your forums way, way back, with the radio podcasts and all the good stuff. Good fortune. :D
If I appeared on anyone's recommendation list I'm shocked. I had assumed that YouTube had cancelled me. Anyway, forgive my initial response. I hope you have a good year.
I ran Codestral 22B on an old NVIDIA workstation GPU with 16 GB of VRAM; it handled 4.5 to 5.5 bpw quants at 10-15 tokens/sec. One rule to remember: memory bandwidth (VRAM, preferably) is key, and plenty of VRAM obviously helps as well. Forget about CPU and system RAM. Second choice would be Apple devices with fast unified memory, but mind that nothing beats an NVIDIA GPU; look up the memory bandwidth figures. I think Apple's very expensive M3 Ultra comes close to NVIDIA GPUs in terms of performance.
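If anyone wants to try something similar, a minimal sketch with llama-cpp-python looks roughly like this (the model file name and settings below are placeholders, not the exact setup I used):

```python
# Minimal sketch: run a GGUF-quantized model locally with llama-cpp-python.
# The model path, quant level and parameters are illustrative placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="models/codestral-22b-q5_k_m.gguf",  # hypothetical local file
    n_gpu_layers=-1,   # offload all layers to the GPU (needs enough VRAM)
    n_ctx=8192,        # context window; larger values use more memory
)

out = llm(
    "Write a Python function that reverses a string.",
    max_tokens=256,
    temperature=0.2,
)
print(out["choices"][0]["text"])
```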
This is the classic puppy-dog sale. Get 'em hooked, then charge them high.
I'd like to know how people are getting hooked on the AI though. What are they using? I can always run my own model locally.
Aren't graphics cards better for running these models?
Mad….I’ve been telling everyone since last year that I’m waiting for a Mac Mini M4……and I’m gonna buy the top spec model too! Your video came to me at the right time just now! 😂
I have a computer with a GeForce 4090, which cost about $2,500. It runs local models great. I was even running an LLM and an image diffusion model at the same time yesterday.
What operating system do you use?
@davidconnelly I'm running an AMD 5950X with 128GB of RAM, and it is slow with the larger LLM models (70B), but it runs and doesn't crash. Smaller models run fast, though, and really I don't even use a GPU. Linux. It's ridiculous paying NVIDIA, IMO, because 1) they do not have enough onboard RAM to run large models, 2) you cannot run on Linux without tainting the kernel with their drivers (I think they are finally changing that policy, though), and 3) it's too expensive compared with other TPU-style solutions that are meant to run LLMs better. Even an ARM chip like the M4 is better at AI than the x86 chips. And 4) people forget that Nvidia has always just been a gamer GPU graphics card maker.
We will own you and you will be happy!
Do you not think you'd do better to wait for the M4 Mac Studio? Presumably, the bottom-end Studio will have a better SoC and more RAM than the top-end Mini for only a slightly higher price.
alive
This is very interesting. Do you have a paid subscription?
Going to Mac for price/performance (FLOPS per watt) is not very wise, tbh (let alone considering the laptop form factor). Plus, Apple is the paradigm of enshittification and planned obsolescence. Also, the dichotomy between a 30k beefy computer and an M4 is false; you can play anywhere in a 1k-30k continuum. E.g., a 5k computer today (2x 4090 GPUs) will likely be far more capable than an M4 in many senses for running LLMs (peak performance/power, price, extensibility, modularity...). Also, try small LLMs that fit into that M4 first, to see if they are actually useful for your tasks; otherwise you risk ending up with a very expensive autocomplete.
I dunno where you're getting your info, man.
I bought a maxed-out MacBook Pro in 2015, and it served me until LAST YEAR. Reliably. Faithfully. Snappy.
I now have a maxed-out MacBook Pro from 2023 with the M3 Max CPU, and it is literally insane how powerful it is. I can see this thing lasting me, again, a minimum of 6 years easily, while being great at what it does.
Sure, I COULD buy other hardware, but 1. I get a shitty OS from Microsoft, or I deal with Linux and a billion different distros of it, with headaches abound. 2. A mishmash of different hardware and the stability issues that result from that, with poor efficiency and the like to boot. And lastly, 3. resale value is trash.
Right now I can still sell my 2015 MacBook Pro and get a solid $300 for it, due to its specs and the hardware supplied with it.
@shmuck66 Well, I have an HP laptop from 2013 working just fine. I've installed Ubuntu LTS a few times, replaced its battery, and doubled its RAM (to 16GB) for around 50€. For a Mac, people pay 200€ for just 8GB. There's plenty of evidence out there of how ridiculous Mac pricing is; I'm not going to repeat it. I've had an iPhone and an iPad, and I know people with Macs, and again, we learnt the hard way about their nefarious update practices and how devices enshittify over time. There's plenty of evidence out there about Apple's planned obsolescence practices; I think there's even a Veritasium video about it.
What practical use is "AI" anyway? I use it for translating, but it's not something I'd pay for.
They can write a game of Snake using Python that runs OK after a few attempts. Apart from that, they're not good for much.
Interfaces, being able to talk in natural English to control things. Or Scottish if you've got the extra hardware.
Hi DC, thanks for the great video and for covering what is happening with the AI services. Running it locally is the way to go. A couple of months ago I decided to get a PC for this, so I bought a Dell desktop, a couple of years old, that can run dual Xeon processors. It came with 128 GB of RAM. When I first ran it with LM Studio and some 7 GB models, it was running just as fast as ChatGPT. I then purchased a second-hand NVIDIA Tesla K80 graphics card and, to be honest, I haven't noticed any difference. That setup cost me under £500.
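If anyone wants to script against a setup like that, LM Studio can expose an OpenAI-compatible local server. A rough sketch (the port is LM Studio's usual default; the model name is just a placeholder for whatever your own install lists):

```python
# Rough sketch: talk to a local LM Studio server through its
# OpenAI-compatible endpoint. Port and model name are examples only.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's default local server
    api_key="not-needed",                 # the local server ignores the key
)

reply = client.chat.completions.create(
    model="local-model",  # placeholder; use the name your LM Studio shows
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise why local LLMs need lots of VRAM."},
    ],
)
print(reply.choices[0].message.content)
```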
Ads are coming to AI
My rig only cost me $25,000, not $30,000. =P
I'm alive
God bless the child who has his own 😂
Very old mic, I also own one. Great sound. Great accent. Scottish? 🙂
Yes I am!
You're chatting with AI for hours? You don't seem to understand context size either.
Well... looks like I'm not getting out of this one without an insult. So, let me help you. I agree. I'm terrible. Oh, I'm so ghastly! What a bad, bad, bad, bad person. Even every time I exhale, it's just more CO2 for the Earth. Don't know about you, but I think we've had quite enough of that already. Oh, what a rotten person. There you go. I hope that helped you to feel better.
@davidconnelly Aside from the hours spent... did you know that chat LLMs are stateless? Every time you send another message, the entire message history is sent to the server; the LLM processes it from the beginning, adds a single token to the end, and repeats until it's "done". All LLMs have a max context size, so eventually it loses the beginning of your convo or gives you a message about it.
If you're using the API directly, the cost of those long conversations grows quickly, because every new message resends the whole history. If you're paying a subscription, they are eating the cost of long conversations filling the context window.
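To make that concrete, here's a rough sketch of what a chat client is effectively doing under the hood with an OpenAI-style API (the model name is only an example):

```python
# Sketch of the stateless chat pattern described above: the client keeps
# the history and resends ALL of it with every request. Any OpenAI-compatible
# endpoint behaves this way; the model name here is just an example.
from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    # The server sees the whole conversation again on every call;
    # it has no memory of the previous request.
    response = client.chat.completions.create(model="gpt-4o-mini", messages=history)
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

print(ask("What is unified memory?"))
print(ask("And why does it matter for local LLMs?"))  # resends both earlier turns
```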
Holy cow! I didn't! That explains a great deal. If you're right that means that if you're building a large app then it's worth 'resetting' the conversation frequently. Are you absolutely sure about that?
You're right. It's confirmed. Thanks for that. Here's what ChatGPT says: Yes, the user is correct in their explanation of how Large Language Models (LLMs), like OpenAI's and Claude AI, typically function. Let me break it down further:
Statelessness of LLMs: LLMs are generally stateless. This means they don’t "remember" previous conversations or interactions inherently. Every time you send a message, the model doesn't have memory of past conversations unless that history is provided again in the input. That's why the message history is included with each new message for context. This gives the impression that the model "remembers" previous messages, but in reality, it processes everything from scratch with each interaction.
Max context size: LLMs have a maximum context window (the amount of text they can handle at once). For example, GPT-4 models can handle several thousand tokens (words and punctuation) in one go, but there's an upper limit. When the conversation exceeds this limit, the model will start "forgetting" the oldest parts of the conversation, truncating them.
Cost: When using the API directly, longer conversations can become more expensive because the model needs to process more tokens. Every token (piece of text) sent as part of the context, as well as each token generated in response, contributes to the cost. As the history grows, the number of tokens processed increases, hence the cost rises.
Subscription plans: If you are using a subscription service, the cost of long conversations might be absorbed by the provider (like OpenAI), but it’s still a factor that affects operational costs on their side. They manage this cost by either increasing subscription prices or implementing limitations on the usage.
So, overall, the comment is accurate in explaining the stateless nature of LLMs, the constraints of the context window, and how longer conversations can impact cost.
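Following on from the "resetting" idea above, one common workaround is to trim the oldest turns yourself so the prompt stays inside the context window. A rough sketch of the trimming idea, using a crude characters-per-token estimate rather than a real tokenizer:

```python
# Rough sketch of keeping a chat history under a token budget by dropping
# the oldest non-system turns. The chars-per-token ratio is a crude
# approximation; a real implementation would use the model's tokenizer.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # ~4 characters per token, very rough

def trim_history(history: list[dict], max_tokens: int = 8000) -> list[dict]:
    system = [m for m in history if m["role"] == "system"]
    rest = [m for m in history if m["role"] != "system"]
    # Drop the oldest turns until the estimated total fits the budget.
    while rest and sum(estimate_tokens(m["content"]) for m in system + rest) > max_tokens:
        rest.pop(0)
    return system + rest
```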
@davidconnelly Haha, yep. I'm working in the industry now, and the more you know, the more smoke and mirrors it all appears to be. I think it all sucks for different reasons :)
LOL. A web developer who thinks a Mac mini is anything but weak is not a good web developer.
Well, let me tell you in no uncertain terms: I am very clueless when it comes to hardware. Everything I know about hardware is based on speculation and things I've read in a few search results. So, no argument from me. Not today. Can I just ask an unrelated question? I'm looking for somebody who has a very special skill set involving hardware and software. I help people, and I believe in the power of gut instinct. Something tells me you either have that kind of expertise, or you know somebody who does. Do you understand what I'm asking you, and am I right? Please do feel free to answer in binary.
?? What? Web dev and hardware have nothing in common.
Web developers don’ even know what a page of memory is.
@@spoonikle Then you are a bad web developer. Even worse, you don't even know how little you know ...
@davidconnelly Yes
I will elaborate on my earlier point. You develop user interfaces, but you say you have no clue about the hardware the interfaces run on? That really troubles me. In my world, that's not very competent and would lead to heaps and stacks of performance problems. But yes. I've been a web developer and web designer since 2002. I've done 1,001 different things in IT, from network configuration and database deployment to programming from C++ to JavaScript, AI, marketing, SEO, video, and graphics.