PROOF Tesla Is Creating a WORLD MODEL!! What are End-To-End Training, and Foundation World Models?
- Published on Jul 14, 2023
- Evidence is mounting quickly that Tesla has made a MAJOR move in their Full Self Driving (FSD Beta) technology: they're creating a foundation world model, likely with end-to-end training to solve car and Teslabot autonomy! But there is also a LOT of confusion about what all of these terms mean. Come with me to discover exactly what world models, foundation models, and end-to-end training mean, why they are NOT 3D reconstructions, and why they're so important!
I recorded this prior to xAI's Twitter Spaces on July 14th, and this provides EVEN MORE evidence that what I'm saying is correct. Stay tuned for a "part two" to this video!
Join this channel to get access to perks:
/ @drknowitallknows
**To become part of our Patreon team, help support the channel, and get awesome perks, check out our Patreon site here: / drknowitallknows . Thanks for your support!
Go to drinkag1.com/drknowitall to get started on your first purchase and receive a FREE 1-year supply of Vitamin D3+K2 and 5 travel packs.
Get 25% off Blinkist premium and enjoy 2 memberships for the price of 1! Start your 7-day free trial by clicking here: blinkist.com/drknowitallknows
Get The Elon Musk Mission (I've got two chapters in it) here:
Paperback: amzn.to/3TQXV9g
Kindle: amzn.to/3U7f7Hr!
**Want some awesome Dr. Know-it-all merch, including the AI STUDENT DRIVER Bumper Sticker? Check out our awesome Merch store: drknowitall.itemorder.com/sale
For a limited time, use the code "Knows2021" to get 20% off your entire order!
**Check out Artimatic: www.artimatic.io
**Want to get in on WeBull's awesome Crypto and stock fun? Check out this link, and get started trading stock and Crypto!
a.webull.com/i/DrKnow-it-allK...
**If you are looking to purchase a new Tesla CAR, Solar roof, Solar tiles or PowerWall, just click this link to get up to $500 off! www.tesla.com/referral/john11286. Thank you!
**You can help support this channel with one click! We have an Amazon Affiliate link in several countries. If you click the link for your country, anything you buy from Amazon in the next several hours gives us a small commission, and costs you nothing. Thank you!
* USA: amzn.to/39n5mPH
* Germany: amzn.to/2XbdxJi
* United Kingdom: amzn.to/3hGlzTR
* France: amzn.to/2KRAwXh
* Spain: amzn.to/3hJYYFV
**What do we use to shoot our videos?
-Sony alpha a7 III: amzn.to/3czV2XJ
--and lens: amzn.to/3aujOqE
-Feelworld portable field monitor: amzn.to/38yf2ah
-Neewer compact desk tripod: amzn.to/3l8yrUk
-Glidegear teleprompter: amzn.to/3rJeFkP
-Neewer dimmable LED lights: amzn.to/3qAg3oF
-Rode Wireless Go II Lavalier microphones: amzn.to/3eC9jUZ
-Rode NT USB+ Studio Microphone: amzn.to/3U65Q3w
-Focusrite Scarlett 2i2 audio interface: amzn.to/3l8vqDu
-Studio soundproofing tiles: amzn.to/3rFUtQU
-Sony MDR-7506 Professional Headphones: amzn.to/2OoDdBd
-Apple M1 Max Studio: amzn.to/3GfxPYY
-Apple M1 MacBook Pro: amzn.to/3wPYV1D
-Docking Station for MacBook: amzn.to/3yIhc1S
-Philips Brilliance 4K Docking Monitor: amzn.to/3xwSKAb
-Sabrent 8TB SSD drive: amzn.to/3rhSxQM
-DJI Mavic Mini Drone: amzn.to/2OnHCEw
-GoPro Hero 9 Black action camera: amzn.to/3vgVMrH
-GoPro Max 360 camera: amzn.to/3nORGYk
-Tesla phone mount: amzn.to/3U92fl9
-Suction car mount for camera: amzn.to/3tcUfRK
-Extender Rod for car mount camera: amzn.to/3wHQXsw
**Here are a few products we've found really fun and/or useful:
-NeoCharge Dryer/EV charger splitter: amzn.to/39UcKWx
-Lift pucks for your Tesla: amzn.to/3vJF3iB
-Emergency tire fill and repair kit: amzn.to/3vMkL8d
-CO2 Monitor: amzn.to/3PsQRh2
-Camping mattress for your Tesla model S/3/X/Y: amzn.to/3m7ffef
**Music by Zenlee. Check out his amazing music on instagram -@zenlee_music
or TH-cam - / @zenlee_music
Tesla Stock: TSLA
**EVANNEX
Check out the Evannex web site: evannex.com/
If you use my discount code, KnowsEVs, you get $10 off any order over $100!
**For business inquiries, please email me here: DrKnowItAllKnows@gmail.com
Twitter: / drknowitall16
Also on Twitter: @Tesla_UnPR: / tesla_un
Instagram: @drknowitallknows
**Want some outdoorsy videos? Check out Whole Nuts and Donuts: / @wholenutsanddonuts5741
Sources:
• ONE MODEL to Rule Them...
• Tesla Engineer: Tesla'...
www.sciencedirect.com/science...
ai.stackexchange.com/question... - Science & Technology
Just a word of encouragement to say that I love the geekiness of your videos on AI! You do a great job of taking an esoteric subject and breaking it down into understandable analogies for us "laygeeks." I'm particularly interested in MICS, (massively complex interconnected systems) and am attempting to write a layman's guide on the subject, and find your geek sessions to be highly inspiring and informative. Bravo.
Totally agree
Without formulae and numbers this is nowhere near geeky. I still like it, but do not mix up his explanations with actual geek stuff.
29:47 This is exactly what versions FSD 10 & 11+ have been missing, SELF-LEARNING. It presently encounters a driving scenario, as though it has never seen it before, even if it has been driven through that same scenario for weeks. (The Tesla team has been hand-tweaking the algorithms to compensate.) What will distinguish versions 12 and beyond is this fundamental ability to learn from previous errors, and then predict and avoid those errors in the future. It took some time to get here, but this is huge progress.
Elon Musk way back when talked up Edge Inference. As he should have. Including discussions on chip design that mimics the brain. Perfect for his “visual” approach. The only chip design that is visual, neural at its basic core is FPGA architecture. He has shifted away from this idea (mistake), as part of his mass FSD firing several years ago. Field programmable gate arrays actually process data into visuals like our eye does. And move those bits in a natural neural method at the silicon level. Very fast. Very brain like. This should be exploited. Makes it so easy to do edge inference (self learning, with communication back to AI father trainer, DoJo). Also, any self learning HAS to be at human eyeball 3D and quality, 360 degrees. That could make it faster than us humans who have to move our head for the 360 view (and/or use mirrors to augment).
Self learning happens only during training in batch mode on the server. Only inference (run-time) can occur in the car - takes huge compute and time to train. Models are iteratively trained and deployed to the cars with updates.
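The split described above (learning on the server, inference only in the car) can be sketched with a toy one-parameter model. This is a minimal illustration, not Tesla's stack: the functions `train_on_server` and `infer_in_car`, the data, and the learning rate are all hypothetical.

```python
# Toy illustration only: a one-parameter "model" y = w * x.
# All names and numbers are hypothetical, chosen to show the split
# between batch training (server) and frozen-weight inference (car).

def train_on_server(samples, epochs=200, lr=0.01):
    """Batch training: repeatedly adjust w to reduce mean squared error."""
    w = 0.0
    for _ in range(epochs):
        # Gradient of mean((w*x - y)^2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= lr * grad
    return w

def infer_in_car(w, x):
    """Inference: one forward pass with frozen weights, no learning."""
    return w * x

# "Fleet data" collected from many drives, used only on the server.
fleet_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# The trained weight is what would be shipped in a software update.
deployed_w = train_on_server(fleet_data)

# The car only evaluates the model; deployed_w never changes here.
prediction = infer_in_car(deployed_w, 4.0)
```

The car-side function never touches the weight, which is why repeated drives through the same scenario do not change behavior until a newly trained model is deployed.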
@@russm8193 That's clear but when FSD is eventually deployed in the future, what does each individual vehicle receive from DOJO and what is it computing? A tokenized subset of the geographical area model in which the vehicle is currently operating, which was gathered by the local vehicle "hive" in the area and compiled by DOJO? This is memory and bandwidth limited, implying that when there is no connectivity, in a new geographic location, the vehicle computation will revert to a simplified general model, or will HW4 or HW5 be able to do this stand-alone? How many adjustable parameters will the vehicle computer be using in order to accomplish the processing? (A modern LLM uses billions of parameters, for example) Encoding of the area model will have to be very efficient, in order to reduce both memory and computation.
As a software engineer who is not in AI but took the Andrew Ng online ML class years ago, I found this extremely helpful and interesting. Things have progressed a long way.
Thank you for this explanation that actually seems clear to me, although my understanding of most of the details is fuzzy. It seems this World Model is going to learn more closely to how we humans learn to navigate the world. Your ability to clarify dense information is amazing. Thank you!
Yes, very clear explanations. So clear that I actually think I understand it for at least 5 minutes after I finish it. But that's 5 minutes more than before. 😁
I can see this system actually reading the text on signs and interpreting their intent and authority. In other countries it could translate this text to align with the existing AI training. ARRET = STOP etc. LIDAR cannot compete with this.
I think, for inference efficiency, separately trained context blocks of models could be switched and potentially blended at a high level, rather than having French road signs in the model used to drive in US, or snowy conditions to be considered when driving in a hot desert. This would be contrary to the end-to-end approach, so it doesn't look like it's the way they're going.
@@MattOGormanSmith So if I was driving from say Ontario to Quebec, the car would download the French context module at the border or have to have all languages loaded all the time just in case I decided to make a run for the border. This would need to be considered for left hand drive jurisdictions that border right hand drive ones. The world is such a complex problem.
Lidar is just a sensor; it is not comparable to anything with regards to AI or coding.
Great video! One of your best so far IMO. I keep coming back to your channel for these great explanations.
Hope all is well as can be with your family. You being able to put things out while supporting your loved ones (in your special way) is inspiring.
Thank you for breaking it down! You have to give it to George Hotz. He said from the beginning, that end to end neural networks will be the solution and Tesla will come around eventually. Also when he saw the occupancy network image, his immediate reaction was: "They created a LIDAR image in software".
IMO the best and most insightful video you have created in a long time 👍🙂
Thank you, thank you. Excellent video. Foundation Model! That's the part I was trying to put a word to. I'm a 30-year software engineer who came through the maze of 8080 (Z80) assembly, pascal, C, C++ object design. My major was Software Architecture. I knew I was missing the door-way into AI because it is NOT based on deterministic algorithms. I have made a few comments and didn't mean to be snarky, but I knew there was something deeper that I wasn't understanding. Love your videos along with Warren's.
Hey Dr know it all... just wanted to tell you you're awesome... I gave a bit of healthy criticism on your last video and just wanted to add that that's all it was... you're the bomb and we all can still improve right? Keep up the great work bud.
This was a great summary. Thank you.
I'm pretty sure an ontology will still be required to generate the FSD onboard display and as an audit trail for accident and situational analysis. If my guess is correct, hydra heads will be part of the reinforcement learning to keep key ontological elements in the foundation model so they can be extracted as needed.
As you've said in your videos on this, a foundation world model is a significant breakthrough. I'm interested in your thoughts about Tesla going with a diffusion-based approach instead of generative transformers.
Your best video I’ve seen, by far. Thanks for this. More of these tech deepdives please 🙏 ! You also explain it really well.
One of your best ever ... now I get it ... AMAZING stuff... and can't wait for it to show up in my car!
In the xAI Q&A Twitter discussion Elon said: "If I look at the experience with Tesla, what we're discovering over time is that we actually overcomplicated the problem. I can't speak in too much detail about what Tesla's figured out, except to say that in broad terms the answer was much simpler than we thought. We were too dumb to realize how simple the answer was." Do you think what he talks about is related to this?
Classic Musk nonsense. He makes it up as he goes along.
RLHF: Reinforcement Learning from Human Feedback
NeRF: Neural radiance fields (NeRFs) are a technique that generates 3D representations of an object or scene from 2D images by using advanced machine learning
GPT-4 serves as a foundational model for understanding the world linguistically and is used to create systems like ChatGPT. In a similar vein, Tesla's world model, which learns to understand the world through raw images or photons, will form the foundation for their Full Self-Driving system. Correct me if I am wrong!
How long before the car would recognize it is driving towards itself, when confronted with a large mirror?
Doesn't really matter, it would have to brake anyway, as a human would. You don't know what's behind the mirror.
Your most brilliant video yet, superb explanation . Bravo 👏👏👏
Thank you for getting the right understanding and separating out the different aspects of what the CVPR videos taught us. It seems others don't get it unless it's spelled out like here.
Final step is just to be able to pronounce Ashok's name correctly. At one point you mentioned Ashok and smoke in the same sentence and yet pronounced Ashok the way you always have!
Great video! Great explanation! Keep doing your work … at some point people will never drive better or safer than a Tesla car! The question is not if it will happen … the question is only when it will happen!
Great explanation. I see that getting corner and extreme cases are still a problem, but they are for people too.
LOL. Stuff must be sold to sold buyers.
Hardly understood a word, but REALLY appreciate what you are doing here Sir.
Really well covered, given the complexity of the subject. Thanks!
It also represents the ability for virtual time travel... into the past (to the point of the start of the 4d model), and a short distance into the most plausible futures...
it's all cool, but as I understand it, it doesn't run on the car's computer. This is what will be run on Dojo to train neural networks. I think it will take a lot of additional data collection and enormous computing power.
It seems like a major difference between HW 3 and HW 4 is the radar module. Do we have any idea when and if radar is coming back into the picture? Could radar better calculate distances in inches and/or see in low light/visibility situations? Putting radar back without a plan to use it doesn’t make sense, but having two major standards is a more difficult problem. It is possible to design a new computer (HW 4 lite) that has the computing power of HW 4, can be upgraded, and uses the older cameras, but the cars with no radar or LiDAR are forever limited no matter the computer or software.
Soo... I'm in the industry - but in different forms of AI... predictive maintenance... Security models... are some.
Super great article for the layperson ... Thanks DKIA!
I think I understand what you’re saying about the world model and predictive training. But when I drive I also have a model of the actual environment I drive in - a 3D map in my head. I can draw it out - lane directions, # of lanes, curbs, buildings, changes in elevation, pot holes etc. on roads I actually drive on which helps me predict (in advance of even being at a particular location) what I need to do on the road I drive on every day - is this also happening or is this all part of their world model?
John, do you still think they will have heuristic code (software 1.0) for things like safety and hard boundaries (don’t over accelerate or turn wheel at a certain rate) and also do you think they will still use the voxel approach to define drive-able space. Or all this just gets replaced with the new end to end world model?
Thanks, this is great to understand! Was definitely worth clarifying.
It would be great to understand the kick starting of these models, and where it is similar or different to biological brains. Random training from scratch you clearly want to shortcut.
I'd be interested to learn if the solution has been to merge the previous NN's developed together as the starting point for a single world model evolution, and train from there? If that is so, I suspect some future features could be helped along by developing new specific classifiers and adding them to the model for a kick along. For example new regions with some totally different structures, vehicles and signage and road rules. I imagine that is closer to how we learn, recognizing the new things as a special case, until they are integrated into the lower level 'don't think about it'.
Amazing explanations, thank you so much!! This whole debate reminds me of the days when I was studying philosophy, especially Wittgenstein. He first thought that explaining the world can only be done by describing "what is" (more like what is coming in via the photons), and who later completely changed his mind and said something like language creates the world (more like what ChatGPT is doing). And it is fascinating to understand that now when trying to emulate the human brain we are essentially using this same process of deep learning for both dimensions (space and linguistics). Do you think with driving, that part of the human feedback will be to edit out things where other humans made mistakes? otherwise the system may not be safe and may even repeat accidents...
So basically the zero system from Gundam Wing where it predicts the future and feeds it to the pilot.
Ya, we saw your video of same thing last year where you say the same thing.... You sure have the spirit! Too bad it was reported Tesla's resale values plummeted along with virtually ALL ev's recently. Keep pumping! (Same time, next year?)
Very good explanation, and it gave me goosebumps when it clicked in my head.
An End-to-End Model, in the context of machine learning and AGI, refers to a model that takes raw input data and feeds it through a series of transformations, ultimately outputting a direct prediction or control action. This approach contrasts with traditional methods that involve multiple stages of processing and human intervention. For instance, in the case of Tesla's self-driving technology, an end-to-end model would take raw sensor data (like images from cameras) and output control actions for the vehicle (like steering, acceleration, and braking commands) without any intermediate steps like object detection or path planning.
The advantage of an end-to-end model is its ability to learn complex tasks without requiring explicit programming for each step. However, training such models can be challenging due to the complexity and the amount of data required.
A World Model, on the other hand, is a representation of the environment that an AGI system uses to make predictions about future states based on its current state and actions. It's a kind of internal simulation of the world that the AGI uses to plan and make decisions. The world model should be able to understand and simulate the dynamics of the world in a way that's similar to how humans understand the world.
In the case of Tesla's self-driving technology, a world model would need to understand the dynamics of driving, including the behavior of other vehicles, pedestrians, and the rules of the road. It would use this understanding to predict the outcomes of different actions and make decisions about how to control the vehicle.
Tesla is working on creating a foundation model, which is a kind of world model that trains itself on a large amount of data without needing explicit labels. This foundation model would then be fine-tuned using reinforcement learning from human feedback to optimize its performance.
An end-to-end model and a world model are key components of AGI systems. They allow the system to learn complex tasks from raw data and make decisions based on an understanding of the world, respectively.
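The idea of a world model predicting the outcomes of candidate actions can be sketched in a few lines. This is a minimal sketch under loud assumptions: the "world model" below is a hand-written stand-in for a learned predictor, the lead car is assumed stationary, and `world_model`, `score`, and `plan` are hypothetical names, not anything from Tesla.

```python
# Toy planner: use a predictive model to imagine the result of each
# candidate action, then pick the action whose imagined future is best.
# State is (gap_to_lead_car_m, own_speed_m_per_s); all values hypothetical.

def world_model(state, action, dt=1.0):
    """Predict the next state given an acceleration (m/s^2).
    A real world model would be learned from data; this one is hand-coded."""
    gap, speed = state
    new_speed = max(0.0, speed + action * dt)
    new_gap = gap - new_speed * dt  # lead car assumed stationary
    return (new_gap, new_speed)

def score(state):
    """Rate a predicted state: collisions are unacceptable, progress is good."""
    gap, speed = state
    if gap <= 0:
        return float("-inf")
    return speed

def plan(state, actions=(-2.0, 0.0, 2.0), horizon=3):
    """Roll each constant action forward and return the best-scoring one."""
    best_action, best_total = None, float("-inf")
    for action in actions:
        imagined, total = state, 0.0
        for _ in range(horizon):
            imagined = world_model(imagined, action)
            total += score(imagined)
        if best_action is None or total > best_total:
            best_action, best_total = action, total
    return best_action

close_call = plan((5.0, 4.0))    # obstacle 5 m ahead at 4 m/s
open_road = plan((100.0, 4.0))   # obstacle far away
```

With the obstacle close, only braking avoids a predicted collision, so the planner brakes; with a long gap, accelerating scores highest. The decision comes entirely from imagined futures, which is the role the comment above assigns to the world model.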
OK, now I get it, thanks so much for persevering. In this purely visual world model, I assume they will have to devise some way to more explicitly teach about physical properties such as slipperiness of road surfaces that are eg wet, icy etc, perhaps by directly setting up training episodes so it can learn the safety envelopes?
Also, I imagine, judgments about human interactions such as being flashed when it might mean 'after you, please go first' versus, 'warning, hazard ahead' etc. And of course 'Road Closed', 'No stopping' etc.
With best wishes.
Edit: I liked your ChatGPT analogy about building a purely statistical foundation model, onto which you train human preferences and higher level subtleties.
You went to Everest Base Camp? Wow! I’m impressed (and a bit jealous)!
Great Deep dive! I know a lot of this is possible because of the transparency of Tesla. Could you also do a deep dive and maybe a comparison with other autonomous programs, (Mobileye, Waymo, etc) , even though they may not have the same level of transparency. That would be super interesting.
Thank you, thank you, THANK YOU!!! You've helped many people to now truly understand how Tesla will finally solve L5 for everywhere, and why nobody else can.
Comma can
Thank you for making this fantastic informative video.
This is awesome didactics. Thank you for making this incredible step change easy to understand.
Thank you, I learned heaps and filled in some knowledge gaps from the way you explained this.
John Vervaeke is a neuropsychologist who's developed the 4P model of cognition and it explains and summarizes very well the difference between "propositional" knowledge, i.e. seeing a baseball game on TV vs "participatory" knowledge, i.e. playing baseball.
Awesome videos. Thank you.
Where do You work?? Viewer ,instantly watching released videos. Then commenting a Paragraph response.❤😮
You made it very easy to understand, thank you
Great video, appreciate your work!
Super video. Great explanation. Love it
Since the car's sensors/cameras cannot 'see' any of the car itself (though it must 'understand' its own physical boundaries), wouldn't Optimus need a significantly upgraded version of the World Model in which it can see/perceive its own hands, limbs, physical 'body' relative to the World Model attributes? An end-to-end training regime would be vastly more efficient if ego had self-awareness... seeing where its hand is relative to an object it is picking up, for instance. This interaction would border on consciousness.
FSD knows its body in the world, and its wheels. Park assist shows this. Optimus just has a more anthropomorphic footprint in the 3d space.
I just want to know how through all of this they address the biggest flaw that is affecting my drives today, which is old/inaccurate map data. Right now map data from an outdated system acts first and will constantly put the car in say the right lane when you are about to turn left because the road was different 8 years ago and the maps are from that time. Is their solution to upgrade to a better mapping system or somehow have the vision system lead and the mapping is somehow second. Other than not recognizing signs this is the only thing that causes me to disengage frequently where I live.
So the big question, if you collect all data then release an update all cars drive the same.
If each car learns like a human will we have high mileage cars being much better at driving than one that hardly gets used?
Or will they download the best driving car to everyone as a starting point?
Loving the videos thank you.
very informative. thank you for this video
15:00 Thank you! Sharing your Brilliant Educational Course ❤
You ROCK DOC! Love your videos.
Excellent explanation now I have a better understanding thank you!
If you’re citing LeCun as an authority, just know he doesn’t think the field will have any agents with a world model for at least a decade.
And even then his thoughts are that they may only be as capable as a mouse.
Excellent presentation
Awesome explanation thank you!!
Thanks a lot man! Great vid
Does show why cars that are coming towards you from a backwards angle are not identified. That angle doesn't appear to be in the shot. I wonder if they do similar projections from other than the front looking cameras?
Great explanation.
Foundation models are basically unsupervised learning making a comeback with superpowers.
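A small sketch of what "no explicit labels" can mean in practice: the training targets are cut out of the raw data itself (often called self-supervised learning). The helper `make_training_pairs` and the frame names are hypothetical and only illustrate the idea of next-frame prediction.

```python
# Self-supervision sketch: turn an unlabeled sequence into
# (context, target) training pairs without any human annotation.
# Frame names are placeholders for real camera frames.

def make_training_pairs(frames, context=2):
    """For each window of `context` frames, the target is the next frame."""
    pairs = []
    for i in range(len(frames) - context):
        window = tuple(frames[i:i + context])
        target = frames[i + context]
        pairs.append((window, target))
    return pairs

# An unlabeled clip: nobody has marked cars, lanes, or signs in it.
clip = ["f0", "f1", "f2", "f3", "f4"]

pairs = make_training_pairs(clip)
for window, target in pairs:
    print(window, "->", target)
```

Every additional second of video yields more pairs for free, which is why this style of training scales with raw data in a way that hand-labeled datasets do not.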
world model essentially is that the system is defined .... uniform exits, lane markings, signage. woe be the case where the cones blew away or were removed.
28:09 the driver feedback could be hitting the brakes, re-taking control of the car, facial expressions recorded on the interior camera, body language, etc.... the options are endless
Unfortunately FSD is also learning some bad habits of average drivers - for instance camping out in the fast lane. In most states the left lane is for passing only.
/freaking great! Very informative. Thanks
Future Headline: 'Hackers change FSD USA to right hand drive & create chaos!' Hopefully this is just me being paranoid.
“There’s a difference between knowing the path and walking the path.”
18:00 just wish Text was LARGER , so i could read it... ill just keep Listening...❤
Doc, RE: ChatGPT: Are researchers creating foundation models in every language?
Making sense ... Once the world model is formed, you said that it organically becomes increasingly complete or perhaps higher resolution or in some way simply better and better based on its ongoing experience with the world. The news of a few days ago is that Tesla is proposing to hire people to drive their cars later this summer and early fall for, I think it was, $18 to $45 per hour. Does this hiring integrate with the world model refining you describe?
I suspect it might make some uncomfortable that it seems to be creating a sort of "black box" to process in this way, making the learning and interpretation process opaque to humans. What I realized is that this doesn't eliminate labeling for the purpose of the human interface, it simply separates it from the real-time prediction process.
very few of these comments sound like they actually understood the video and the potential of Tesla growing a Foundational World Model... for Tesla investors, it means we get all da monies
Now if only I could actually use the FSD I paid for.
I think you are correct most of the time. However, concerning the problem with FSD going faster than the speed limit. FSD will not do that. The slow car is not causing dangerous conditions, it is the vehicles going too fast that is the problem. I am not sure how FSD is going to handle road rage.
I was talking to some safety drivers who got the opportunity to work with a self driving system which had been end to end trained. It was capable of delivering spectacularly good behavior. Better behavior than they had ever seen in an SD system made of modules like perception and planning. Unfortunately, the end to end trained system was also capable of absolutely terrifying behavior. And there was no ability to determine what would cause either behavior.
Although the DMV does make you take a road test, you can only do that once you’ve demonstrated a high level of capacity to recognize signage and other abstract concepts of operating a vehicle on the public roads. I’m still wondering how Tesla can make strong assurances that their FSD system won’t keep smashing into first responders vehicles, or first responders themselves
Wonder if they could map the whole United States and make a 3d clone once there are more cars on the road and then use that to also help train FSD?
They will need a lot of training data for sure. FSD should be able to handle even unknown roads. What they need is a lot of driving situations rather than an exact map. Human drivers can drive through a city without a map, just reacting to the current traffic situation. That's the goal.
I always wondered in the case of fog, precipitation, and smoke, if the headlights and cameras could be working in the infrared range to be able to see stuff humans cannot.
I don't believe the infrared has been mentioned. The computer processing has been described as photon counting. The mapping in the dark is quite impressive, much better than we can see. I think the Tesla team will have considered infrared and decided it was not justified. Obviously we do not know all the details of every decision they make.
Even seeing a little better than humans , with 8 cameras, under these conditions will help - the chances are, these abilities will improve over time...using "photons" rather than "filtered output" ( which Tesla have implemented) will provide immediate improvements...
Well, Musk claims so many things that never come close to happening...so why not claim infra red headlights. Then the TH-cam fanboys can make twenty-minute videos explaining how great this new tech that Tesla has (be sure to use present tense) is and how all the other car companies are therefore toast.
@@chriswright9096 As my French teacher once said " That, in Anglo-Saxon terminology, is a load of bolleaux .."
23:00 fine motor skills 😉
AMAZING!!!! Blade Runner is here. An exciting time to be alive.
@29:29 I only have two words to say: _Mirror Neurons._
People who can _dance,_ not just _spaz out to a beat,_ usually learn the kinematics of a dance from just watching it a dozen times before they try it.
*Dojo* can be considered like *_Ego's_*_ mirror neurons._
Thanks for dumbing it down for the rest of us
What Ashok did not talk about in his presentation is the division of work between an individual Tesla vehicle with Hardware-4 and DOJO, so we presently do not know if Hardware-3 is already completely obsolete or still has some future utility beyond training the new system. Apparently HW3 is already maxed out regarding this new system. Elon has already responded to a Twitter question that HW3 will not be upgradable to HW4. We will see.
We have not been told that hw3 is maxed out, nor that it will not be capable of self driving. There will be a hw5, hw6 and so on because they will always be improving it. That doesn’t mean any previous hardware will not succeed, just that the next one will be that much better
If Tesla thought that HW3 wasn't capable of FSD, they would have made it upgradable.
If they sometime in the future come to the conclusion that HW3 is insufficient, they will make an upgrade kit.
Excellent!
Like your shirt. Have you been to EBC?
hmm, I slow down and use the high level executive function module of my neocortex to address edge cases in the real world appropriately and not rely on some heavy duty black box world model inference process. Scares me to think others don't and that FSD wouldn't aim for a more modular high level abstraction layer over real world physical scenarios....
2:56 Most LIDAR scans do look more consistent, to be honest.
Because they contain a lot of redundant information.
Thanks!
Voice and text intervention for FSD
I just feel like there's going to be a loooooong tail of edge cases as long as FSD is only trained on the road. It will always end up encountering situations it fails to handle that any human could negotiate without issue. Humans understand the world through a lifetime of experience with it and the things in it. A computer backprop trained on driving alone will never understand the world and the things in it in a way that will allow it to behave like a human reliably would. That being said, FSD will of course become (I believe it has been already) statistically safer than the 'average driver' (which doesn't exist) but when it makes a glaring mistake nobody can imagine making which results in someone's death, the consensus will be that it's not safer. People want FSD to be at least as good as, or better than, themselves, and witnessing it make mistakes they can't fathom making erodes confidence in it. As long as Tesla has to tell drivers to be alert and ready to take control, it's not really "Full Self-Driving". When they can say "Lie back, take a nap, watch a movie, enjoy the ride. We'll get you where you're going quickly and safely!" because it doesn't make any more obvious mistakes, is when it's FSD. I appreciate their pursuit, but I just don't see something that doesn't understand the world like a human reliably behaving like a human.
Exactly. For all this talk about perception, I have not seen so far a plan from Tesla in how to resolve edge cases. I mean if even now they are still struggling with perception, when are they going to tackle the almost as complex space of actions?
World models are key for living creatures with brains (insects to people). They appear both genetically hard-wired and plastically configured/learned (nature/nurture). As Piaget, Montessori, and other cognitive/developmental psychologists have learned, the "order" of things/principles learned is important. What and how an infant learns, as the brain grows and develops prior to the neural "trimming" (a "distillation"?) that occurs around 5-6 years old, establishes a sort of cognitive framework for a lifetime (very hard to change later). This is one reason suggested why "talents" and "IQ" seem to stabilize. It will be interesting to observe if, when, and how AI neural nets develop capabilities during similar predictive training.
For living things, while physics is rather universal, world models are shaped by experience over time in contextual environments (air, water, ground, maybe space) and by embodiment (senses, legs, wings, "cultures", etc.). Environments change (day/night, seasons, weather, etc.) but so do bodies (development, maturity, infirmity, aging, etc.) Let's hope our automobiles don't decide they are just members of a herd or school of fish and decide to follow the lemmings up ahead.
19:00 they need a Terminal Output OS...😅❤DARK MODE😅❤
I know how to play piano. (I press the key on the piano that corresponds to the note in the sheet music.) But I can't play the piano.
Same idea you were trying to express with the rock climbing analogy.
Thanks a lot
scratching my speculative itch..
Thank you
RETIRED 80, USAF, VFW, FREEDOM. WHY DOESN'T THE WORLD MODEL GET SO BIG THAT THE DATA WON'T FIT ON THE VERSION 3 COMPUTER?
Thx, I think my brain just exploded but understand it a little more. 😅
Someone telling you how to shift a car with a manual transmission and clutch.. is different than actually doing it when you're driving.