Tesla FSD V12 Has a BIG Problem! (Lex Fridman Pod!)

  • Published on Apr 26, 2024
  • While Tesla's FSD v12.3 is an amazing step change in the quality of autonomous driving--and feels SO close to being perfect--there is still a HUGE issue with it that has plagued every version of FSD: the "mid-term planning" of about 30 seconds. It still lacks the sense of local knowledge and hierarchical planning that an experienced local driver has. Well, it turns out Meta's chief scientist, Yann LeCun, has an answer in his new work: V-JEPA. The Video Joint Embedding Predictive Architecture might just be the solution Tesla is looking for, and Yann describes it in a recent Lex Fridman video. I break it all down in this deep dive video!
    **If you are looking to purchase a new Tesla Car, Solar roof, Solar tiles or PowerWall, just click this link to get up to $500 off! www.tesla.com/referral/john11286. Thank you!
    Join this channel to get access to perks:
    / @drknowitallknows
    **To become part of our Patreon team, help support the channel, and get awesome perks, check out our Patreon site here: / drknowitallknows . Thanks for your support!
    Get The Elon Musk Mission (I've got two chapters in it) here:
    Paperback: amzn.to/3TQXV9g
    Kindle: amzn.to/3U7f7Hr!
    **Want some awesome Dr. Know-it-all merch, including the AI STUDENT DRIVER Bumper Sticker? Check out our awesome Merch store: drknowitall.itemorder.com/sale
    For a limited time, use the code "Knows2021" to get 20% off your entire order!
    **Check out Artimatic: www.artimatic.io
    **You can help support this channel with one click! We have an Amazon Affiliate link in several countries. If you click the link for your country, anything you buy from Amazon in the next several hours gives us a small commission, and costs you nothing. Thank you!
    * USA: amzn.to/39n5mPH
    * Germany: amzn.to/2XbdxJi
    * United Kingdom: amzn.to/3hGlzTR
    * France: amzn.to/2KRAwXh
    * Spain: amzn.to/3hJYYFV
    **What do we use to shoot our videos?
    -Sony alpha a7 III: amzn.to/3czV2XJ
    --and lens: amzn.to/3aujOqE
    -Feelworld portable field monitor: amzn.to/38yf2ah
    -Neewer compact desk tripod: amzn.to/3l8yrUk
    -Glidegear teleprompter: amzn.to/3rJeFkP
    -Neewer dimmable LED lights: amzn.to/3qAg3oF
    -Rode Wireless Go II Lavalier microphones: amzn.to/3eC9jUZ
    -Rode NT USB+ Studio Microphone: amzn.to/3U65Q3w
    -Focusrite Scarlett 2i2 audio interface: amzn.to/3l8vqDu
    -Studio soundproofing tiles: amzn.to/3rFUtQU
    -Sony MDR-7506 Professional Headphones: amzn.to/2OoDdBd
    -Apple M1 Max Studio: amzn.to/3GfxPYY
    -Apple M1 MacBook Pro: amzn.to/3wPYV1D
    -Docking Station for MacBook: amzn.to/3yIhc1S
    -Philips Brilliance 4K Docking Monitor: amzn.to/3xwSKAb
    -Sabrent 8TB SSD drive: amzn.to/3rhSxQM
    -DJI Mavic Mini Drone: amzn.to/2OnHCEw
    -GoPro Hero 9 Black action camera: amzn.to/3vgVMrH
    -GoPro Max 360 camera: amzn.to/3nORGYk
    -Tesla phone mount: amzn.to/3U92fl9
    -Suction car mount for camera: amzn.to/3tcUfRK
    -Extender Rod for car mount camera: amzn.to/3wHQXsw
    **Here are a few products we've found really fun and/or useful:
    -NeoCharge Dryer/EV charger splitter: amzn.to/39UcKWx
    -Lift pucks for your Tesla: amzn.to/3vJF3iB
    -Emergency tire fill and repair kit: amzn.to/3vMkL8d
    -CO2 Monitor: amzn.to/3PsQRh2
    -Camping mattress for your Tesla model S/3/X/Y: amzn.to/3m7ffef
    **Music by Zenlee. Check out his amazing music on instagram -@zenlee_music
    or YouTube - / @zenlee_music
    Tesla Stock: TSLA
    **EVANNEX
    Check out the Evannex web site: evannex.com/
    If you use my discount code, KnowsEVs, you get $10 off any order over $100!
    **For business inquiries, please email me here: DrKnowItAllKnows@gmail.com
    Twitter: / drknowitall16
    Also on Twitter: @Tesla_UnPR: / tesla_un
    Instagram: @drknowitallknows
    **Want some outdoorsy videos? Check out Whole Nuts and Donuts: / @wholenutsanddonuts5741
    V-JEPA Post: ai.meta.com/blog/v-jepa-yann-...
    Lex Fridman/Yann LeCun Interview: • Yann LeCun: Meta AI, O...
  • Science & Technology

Comments • 294

  • @windsurfertx1 months ago +41

    I already have it. My wife says “you need to get into that other lane” 20 seconds before I think about it.

    • @ove1knobody495 months ago +5

      😂😂

    • @Matzes months ago +10

      That solution is too expensive

    • @robinheider414 months ago +2

      😂

    • @capslock9031 12 days ago

      Pls send over wife.

  • @JMeyer-qj1pv months ago +79

    I think another missing element in FSD is the ability to have a local memory. I see people complaining that FSD "always" makes the same mistake at a certain place, and they hope the next release will fix it. When we commute to our jobs every day, we learn new little things about optimizing our route, but for FSD every trip is like the first time it is driving it. We saw in the Figure bot demo that the OpenAI model has a short-term cache of local memories that it uses as an additional input when responding to commands, and it seems like FSD should have something similar.
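A toy sketch of the location-keyed, short-term memory this comment imagines. This is purely illustrative (the class, tile scheme, and note strings are invented here); it is not Tesla's or OpenAI's actual design, just one way a planner could be handed "what happened here last time" as an extra input.

```python
from collections import deque

class LocalMemory:
    """Illustrative short-term cache of per-location driving notes.

    Keys are coarse location tiles (rounded lat/lon); values are the most
    recent observations, which a planner could consume as extra input.
    """

    def __init__(self, max_notes_per_tile=5):
        self.max_notes = max_notes_per_tile
        self.tiles = {}

    @staticmethod
    def tile(lat, lon, precision=3):
        # Round coordinates to a coarse grid (~100 m at precision=3).
        return (round(lat, precision), round(lon, precision))

    def remember(self, lat, lon, note):
        key = self.tile(lat, lon)
        self.tiles.setdefault(key, deque(maxlen=self.max_notes)).append(note)

    def recall(self, lat, lon):
        # Past notes for this tile; empty list on a first visit.
        return list(self.tiles.get(self.tile(lat, lon), []))

memory = LocalMemory()
memory.remember(37.7749, -122.4194, "unmarked speed bump after crest")
memory.remember(37.7749, -122.4194, "right lane ends in 200 m")
```

The `deque(maxlen=...)` keeps the cache short-term by design: old notes for a tile fall out automatically as new ones arrive.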

    • @15Stratos months ago +1

      Well, that's how neural networks work as well, but it takes some time to feed the data back to the net to improve the training.

    • @lym3204 months ago +1

      @@15Stratos If you are not classified as a skilled driver, your inputs may not be considered for updating FSD.

    • @15Stratos months ago

      @@lym3204 May not be considered? Well, yeah, but I'm sure the Tesla AI team must be taking a look at most of the data, or at least a large amount of it.

    • @ryansmithc months ago

      Tesla was compute limited. Plus, IMO you could achieve the same thing by providing video of all the successful trips from one point to the other, and then making a dynamic live map of the area. So it'll train to know where to stop, which lane, and behaviour. I think that's why they are sending tester cars to some areas.

    • @lym3204 months ago

      @@15Stratos thx

  • @daveoatway6126 months ago +5

    An impressive guest with Lex. Thank you for connecting the dots (no pun!) for the complete action sequence of decisions - much more complex than even a complicated management problem!

  • @nikkitson3878 months ago +16

    Applying JEPA actually makes sense. I drive with a 4-level hierarchy, updating on different timeframes, & Tesla misses some.
    I think of it in terms of Objective, Strategy, Safety & Operate.
    Objective = Destination; refresh rate is minimal, mostly just passengers changing their mind, but level of charge comes into this one.
    Strategy is thinking 10-20 seconds out, where the driver has situational awareness of the whole visible road: mentally updating traffic ahead, objects, how crowded the off-ramp is, etc. Tesla messes this up all the time; it can't seem to create a strategic model & definitely doesn't look down the road. I'm always pre-programming the 3 potential safety actions (Brake/Accelerate/Turn) & looking for potential bad actors or unexpected items. Tesla doesn't see/care about this stuff & relies on its reaction speed, which is currently slower than a good human's. Maybe a compute increase could fix it, but maybe it's just lazy engineering too!
    Safety - Tesla does this pretty well in terms of reaction time, but it's dumb as two planks in its actual reaction, as it has no strategic pre-processing; it just reacts on the fly. My human brain is constantly updating which thing to do & can mostly predict outcomes by seeing the details (experience: 0 accidents in 400k miles on gravel, snow, highway, street, off-road, beach, both sides of the road). For example, when racing down a faster lane with adjacent slow traffic, the human brain is looking for bad actors pulling out; the Tesla brain doesn't, & actually takes longer to detect them. The human brain sees a wheel turn; Tesla code only sees an object move. Same logic with potholes, objects on the road, falling debris, etc. Tesla needs that strategic processing job to feed into the Safety & Objective functions.
    Operate - FSD 12.3 really masters this part. It drives almost as well as a human; give it some race car data to make it corner properly without FREAKING OUT every other driver when it sticks to the center of the lane & we have a winner!
    vJEPA seems to be an awesome approach. The strategic part doesn’t need a lot of different training, there are only so many generalized issues (even falling trees!) to figure out.
    Great video DrKIA 😀
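The four levels above, each refreshing on its own timescale, can be sketched as a simple scheduler. The level names are the commenter's framing; the refresh periods and example questions are invented for illustration, not Tesla's actual design.

```python
# Illustrative four-level driving hierarchy, each level re-run on its
# own refresh period (all numbers are made-up examples).
LEVELS = [
    # (name, refresh period in seconds, example question it answers)
    ("objective", 600.0, "Which destination / charge stop?"),
    ("strategy",   15.0, "Which lane and speed for the road ahead?"),
    ("safety",      0.5, "Which pre-planned action if a hazard appears?"),
    ("operate",    0.02, "What steering/brake command right now?"),
]

def due_levels(t, last_run):
    """Return the names of levels whose refresh period has elapsed at time t.

    last_run maps level name -> time of its last refresh (seconds).
    """
    return [name for name, period, _ in LEVELS
            if t - last_run.get(name, float("-inf")) >= period]
```

One second after a full refresh, only the fast levels fire again; the slower Objective and Strategy levels keep their cached plan until their window elapses.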

  • @alexandreblais8756 months ago +5

    I watched the whole interview yesterday, and it was indeed very interesting. Thank you for your video; it definitely helped clarify a few things. Yann LeCun is a brilliant man.

  • @2009RayMD months ago +2

    I was hoping you would do this! I wasn’t able to understand the missing piece in the abstract representation well from Yann LeCun’s interview but you made it crystal clear now what that hierarchical planning is needed for. I do wonder whether when the car goes around the block again after not having found parking closer to the destination and seems to remember some spots further away, it is tapping into such a model.

  • @glenbuckner8244 months ago +1

    Thanks for bringing this to our attention. Potential to solve the navigation issues is exciting.

  • @cleanitup_pls7893 months ago

    Very worthwhile post. Thanks for this. What I know about the architecture for planning is that it will form getting out of a chair independently of taking a trip. The task of getting out of a chair has its own plan, and that plan can be invoked by the trip plan and multiple other plans. Same with getting on an elevator, getting ground transport to the airport, etc.: all are independently formed but can be included as a component of another plan. Each independent plan forms the basis of what we need to know and do at execution time, and at execution time the real-time parameters for decision actions are brought to bear along with the plan.

  • @jimkain6958 months ago +2

    Well done, Bravo!

  • @RussAbbott1 months ago

    In your discussion at 27:00 - 30:00 can you explain what specifically is being compared between the predictor and the real thing? Presumably, the comparison takes place at an abstract level -- at least abstract to some extent. What does that really mean? I'm having trouble understanding how the comparison is operating. Can you provide more information about that? Thanks.
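On the question of what is being compared: in JEPA, the comparison is not pixel-to-pixel but between two points in an abstract embedding space, the predicted future embedding and the encoded actual future. Here is a numeric toy to make that concrete; the two-feature "encoder" and the trivial "predictor" are stand-ins invented for illustration, not Meta's actual networks.

```python
# Toy JEPA-style comparison: encode both context and future into a small
# abstract space, predict the future embedding from the context embedding,
# and score the distance between prediction and target IN THAT SPACE.

def encode(frame):
    # Stand-in encoder: collapse a "frame" (a flat list of pixel values)
    # into two abstract features: mean brightness and left-right contrast.
    n = len(frame)
    mean = sum(frame) / n
    contrast = sum(frame[n // 2:]) / (n - n // 2) - sum(frame[:n // 2]) / (n // 2)
    return (mean, contrast)

def predict(context_embedding):
    # Stand-in predictor: naive "nothing changes" guess of the future
    # embedding. A real predictor is a trained network.
    mean, contrast = context_embedding
    return (mean, contrast)

def jepa_loss(context_frame, future_frame):
    pred = predict(encode(context_frame))
    target = encode(future_frame)  # in practice, a separate target encoder
    # Squared distance in embedding space -- pixels never meet directly.
    return sum((p - t) ** 2 for p, t in zip(pred, target))
```

The point of the abstraction: if only irrelevant pixel detail changes between context and future (leaves fluttering, sensor noise), a good encoder maps both to nearly the same embedding and the loss stays small, which is exactly why the comparison happens at the abstract level rather than on raw video.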

  • @andrewpaulhart months ago +1

    Now that someone has come up with this new concept it seems so obvious that it is needed. Love this sort of thing.

  • @human-person months ago

    When did you get pushed/install 12.3 in relation to when you were in Whistler? Do you use TeslaFi? There’s been one install listed in Canada for a little while now, in BC, and it’s been driving me crazy

  • @jonbowes5999 months ago +3

    John, I have two points to make. 1. Video has 3D information, provided the contents are moving (each frame is akin to a different camera angle on the objects within the frames that are moving); this is as good as stereoscopic vision! 2. Tesla is already using hierarchical neural networks for perception and planning; currently the limitation of data volume constrains the temporal planning horizon somewhat, but I believe this horizon will expand as training capacity expands...

  • @CarloHerrmann months ago

    Thanks John. I have always been taught that driving is all about anticipating.. I guess this is what you call predicting. Anticipating is then all about planning to prepare for what you anticipate …. certain past experiences (memory) play into all these parts. As I get older I notice that anticipating is not as efficient as when I was 20 years old.

  • @pataulson 13 days ago

    John ... this is a great example of your significant ability to parse out what is going on in the AI/FSD/Optimus space. It's hard to say how much I appreciate your contributions in this regard. As a neophyte trying to get my arms around how neural net AI is and will change the world, your straightforward explanations are immensely helpful. Thank you - thank you for all of the thought and time you put into this. You are in the truest sense a most generous teacher! (comment obviously NOT AI generated 🙂)

  • @TimLauridsen months ago

    Great video, very informative to digest this complex information

  • @tech_tesla_and_trends 10 days ago

    Thanks for geeking out John that was worth listening to! I certainly hope you're right that the Tesla team is already on board looking at these things.

  • @ghauptli1 months ago +2

    I love the way 12.3 handles my neighborhood; it slows down when kids are around, and it naturally moves over a bit for oncoming cars. However, it is now avoiding slow, safe left turns in favor of rights for no reason. I'm curious to see if the next update solves this new problem.

  • @wlmsears months ago +1

    I'd love to see examples of instances where you would drive differently from FSD 12.3 in real driving situations. How common do you think this is? What is the actual planning horizon in seconds? In other words, how far ahead do you need to predict to do a better job in specific instances? If this is common, I would think you would have numerous examples in your FSD videos.
    By the way, great job of pulling the essence from Yann LeCun's interview with Lex Fridman, and the description of JEPA. One of the exciting things we are seeing is the use of semi-supervised learning for building foundational models, both here and in LLMs. I think good foundational models appear to be a key aspect of human learning.

  • @captcurthess months ago +1

    Bought FSD 2+ years ago. Got 12.3 yesterday. At the very first stop sign, the car refused to enter the intersection… the visual range was difficult, but it was easy to cross the 25 mph street. After 30 seconds stopped, I just proceeded. Don't try to convince me that this is "almost human." I'll keep waiting; no choice but to do that.

  • @terrycmartin months ago

    There are a couple of things I wonder. Do they train individual isolated behaviors from small clips only, or do they also train on whole drives, or at least larger segments of drives? If they only train (so far) on small isolated clips, that may explain why they've not addressed this issue yet (I'll explain more). If they already train on whole drives, or at least large enough segments of drives, and still have this problem, it may be because they're not feeding the training enough input "features".
    If I have 2 upcoming left turns, from road segment A to B to C, and B is really short because it's not even a real road, just a little connecting bridge or something (I have this situation near my home), and I'm only on B for 2-3 seconds, and from B traffic can continue straight or turn left, but I need to turn left from B to C - then ideally my car needs to already be in the left turn lane of B, which means ideally I'm in the left-most turn lane of A, which has two left turn lanes (1 & 2). Left turn lane 2 on street/segment A is for those wishing to turn onto B and then continue straight instead of turning left again.
    Now, if training on longer drives that could cover this whole situation (two left turns in quick succession), the training data needs to include the relative destination, if not the absolute (final) destination. Does the training currently include that? We don't know. Armed with at least the relative destination, then while on A and coming up on B, I'd want to pass a "feature" input into the training to let it know my near-term local destination is 2 segments away, and perhaps time-wise it's only 10 seconds away (or whatever). Let's say they train in frames of time, or even unitless, and the "frame" distance from the car's current location on A to B is 10 frames. They need a way to calculate how many upcoming road segments are within the next X frames.
    That should be easily calculable via map route data (IF they're including that data - again, we do not know if they currently are). Once they include the number of upcoming segments within that frame window of X (maybe the next 30 frames or whatever), and some relative positioning info along with those segments to imply left turns onto B and again onto C, that should be enough input "features"/data for the model to eventually infer from video training: "Ahhh, I see what humans are doing now. Whenever they've got an upcoming series of left turns in quick succession within the next X frames, or the next 15 or 30 seconds or whatever, IF/when there are multiple turn lanes, they get into the left-most one, etc." No one needs to explicitly code that logic. It'll infer it from the inputs. The problem is: are they including enough of that input data for it to make that connection? If their training clips never include enough "frames" to cover from A to B to C in my example, then obviously they'll not be able to teach the car to handle such situations adequately - presumably.
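The "how many turn decisions fall inside the next X frames" feature proposed above is easy to sketch. Everything here is hypothetical: the route tuples, the frame counts, and the segment names all come from the commenter's A → B → C example, not from any known Tesla input format.

```python
# Hypothetical route feature: given the routed segments ahead (with an
# estimated frames-until-entry for each) and the maneuver at each
# segment's exit, list the turn decisions inside the planning window.

def upcoming_turns_in_window(route, window_frames):
    """route: list of (segment, frames_until_entry, maneuver_at_exit)."""
    return [(seg, maneuver) for seg, frames, maneuver in route
            if frames <= window_frames and maneuver != "straight"]

# The commenter's scenario: two quick left turns, A -> B -> C, where
# short segment B is entered ~10 frames from now.
route = [
    ("A", 0,  "left"),      # turn left from A onto B
    ("B", 10, "left"),      # B is short: left again onto C soon after
    ("C", 13, "straight"),
]
```

With a 30-frame window, both left turns show up together, which is exactly the signal a model would need to learn "get into the left-most turn lane on A now"; with a 5-frame window, only the first turn is visible and the second lane choice can't be learned.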

  • @wademt months ago

    You know just enough to get in trouble, lol. Transformations are necessary to turn the data you have into usable information for the model you're building. The model will do whatever it's designed to do; it's all it can do; it's what it was made for. Yes, this guy is describing a model designed to do something slightly different, and he's probably spot on. But transformations are critical to model design. I design models to predict the future. My future is how much energy the NW US power grid will need in the next 5, 15, 60 minutes. All of the issues being worried about will work out with proper model design, good data, and collaborative effort. I appreciate your enthusiasm.

  • @ekaa.3189 months ago +18

    I was playing with FSD v12.3 last night. It doesn't like rural roads at night. Speed constantly slows, and it often ignores or waits a long time before responding to speed limit changes when entering and leaving towns. Note: In cities with regular streets it did much better, and followed speed limits and speed changes.

    • @sapienspace8814 months ago +5

      I can understand; FSD v12.3 might be like a teenage driver, still working on getting its learner's permit, especially lacking experience on rural roads at night. The fact that it slows down, as opposed to speeding up, indicates prioritization of safety, though it does increase the hazard of higher relative velocity from human drivers behind wanting to go fast.

    • @DC-yh2oy months ago +5

      This is a training issue. Tesla is focused on training for the hard and dangerous parts of driving, like fairly congested cities. As v12.3 rolls out and more diverse training data is available they will train for other situations like rural night driving.

    • @johnjackson9564 months ago +1

      Neither do I, if you cannot see the animals coming at full speed from the dark edges. Especially big animals.

  • @lkrnpk months ago +12

    I do not quite get what the problem is here. Is the issue that FSD does not understand if, for example, there is a traffic jam or an accident ahead and cannot re-route? But that can be solved by just hooking it to Waze or some similar app that can re-route if you report an accident or a big pile of cars stuck in a traffic jam - or is it about some lower-level maneuvers? I often use Waze in an unfamiliar city, and I also do not know when to take exits, etc., and it tells me; even if I miss one, it instantly re-routes me. Or is the issue, as I said, some lower-level decisions? To me it seems this V-JEPA is not so much about FSD as about solving general intelligence, AGI, in a less time-consuming and more cost-efficient way, but I do not believe you need AGI to solve driving.

    • @aggerleejones200 months ago +3

      I agree. Seems like an issue for the bot but a car is only interacting with roads.

    • @jmattoxriskpro months ago +4

      I agree. The frame of the Lex discussion is an obstacle to LLMs achieving AGI. This is really not relevant to FSD.

    • @firewoodlake months ago +1

      I think the problem is that it gets better each month and human drivers will not be able to compete.

    • @rickgoodman8911 months ago

      Agreed. No clue as to what the problem is.

    • @rortlieb months ago

      The good doctor said that FSD has a problem reacting on the fly to changed circumstances. He has 12.3, so I assume he knows what he's talking about. I would like a more concrete example, however. To date, his main criticism of 12.3 is "wimpiness."

  • @mitchellzastey months ago

    Make more efficient and relevant decisions by generalizing the fine details.
    Sounds great!

  • @falconxlc months ago +3

    Don't quite understand the headline of the video and how that's a big problem for FSD? Seems like JEPA is doable for FSD and should be added to improve routing and future predictions. With compute no longer the limitation, they can train with simulated video data to improve positive outcomes.

  • @DonBrowningRacing months ago

    This is exactly what we do when we leave our home and put in Orlando Tesla service. Each step is planned and actually Three alternatives are shown depending on my preferences like tolls or freeways etc. Even current congestion of traffic is considered and shown. Yes, it can get more complicated but in practice we are mostly already doing hierarchical planning.
    We're onto this, professor. I can explain further and more basically if you need; then you will be closer to knowing it all pedantically. 😊

  • @darylfortney8081 months ago +2

    Task planning vs path planning

  • @roberts932 months ago +1

    When the Internet appeared, I didn't ask how to order a book at the store around the corner. I found Amazon and ordered the book over there.

  • @jacksouthern7929 months ago +1

    10+ year old navigation systems have seemingly done this for years. That's my first impression of your hypothesis!

  • @MooseOnEarth months ago

    15:39 - the advantage of the phrase "advanced machine intelligence" is that it includes the word "machine". It is therefore not as broad and unspecific as "artificial general intelligence"; it has an explicit application: machines. This makes it clear that it is trained upon and then applied to machine applications.

  • @bonbooty6611 months ago +1

    I disengaged 11.4 on a rental car because I was about a mile away from the airport exit in traffic and FSD was not moving over to the right. It needed to get over two lanes and I took over because I don’t like to “force” this merge at the last moment.

  • @RonLWilson months ago

    Yes, that is the way to go!

    • @RonLWilson months ago

      BTW, one way to do this hierarchical planning is to use an LLM to define the levels, the steps, and the questions that need to be answered, and then invoke sub-planners for each of those.
      Then, based on what the sub-planners come up with, put the entire plan together and see if there needs to be some re-planning, since the higher-level plan had to make assumptions (like how hard it is to hail a taxi) and the detailed planners might come up with different answers than those assumptions.
      Thus it is an iterative process, not just in that one does not know the future, but also in that the higher-level planners don't know what details might burble up to the surface from those lower-level planners.
      Also, there may be a need to try different approaches and then winnow those down to a final plan.
      And BTW, I am not saying this just based on imagining it in my mind, but from having been involved in building such planning systems, so I have seen how this can work... and work really well at that!
      So there is no one magic algorithm or AI that can do this; it requires a collaboration of many, and much refinement to optimize its operations as well. So the human will not be replaced by AI in that regard, much as, in math, Gödel's theorem says one cannot have a formula that can determine all truths in mathematics... as such, there will always be the need for some inspiration or even guesswork.
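The iterative top-down/bottom-up loop described above can be sketched in a few lines: a high-level plan is expanded by sub-planners, and when a sub-planner's actual result breaks the assumption the level above made, that step gets re-planned. All step names, costs, and budgets here are invented for the airport-trip example, not from any real planner.

```python
# Minimal hierarchical-planning sketch: high-level steps are expanded by
# a sub-planner, and steps whose real cost violates the higher level's
# assumed budget are flagged and replaced with an alternative.

HIGH_LEVEL_PLAN = ["get to street", "catch taxi", "airport", "fly to Paris"]

def sub_plan(step):
    # Stand-in sub-planner: returns (detailed_steps, actual_cost_minutes).
    costs = {"get to street": 2, "catch taxi": 25, "airport": 45,
             "fly to Paris": 420}
    return ([f"do: {step}"], costs[step])

def plan(high_level, budget_per_step):
    """Expand each step; re-plan any step that busts its assumed budget."""
    detailed, replanned = [], []
    for step in high_level:
        detail, cost = sub_plan(step)
        if cost > budget_per_step.get(step, float("inf")):
            replanned.append(step)  # higher-level assumption was wrong
            detail = [f"alternative for: {step}"]
        detailed.extend(detail)
    return detailed, replanned

# Higher level assumed hailing a taxi takes 10 minutes; it takes 25.
result, redo = plan(HIGH_LEVEL_PLAN, {"catch taxi": 10})
```

Here only "catch taxi" violates its assumed budget, so only that step is re-planned while the rest of the hierarchy's plan survives intact, which is the efficiency argument for planning hierarchically in the first place.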

  • @curtisyoung7107 months ago

    I found a weird interaction between FSD version 11 (multiple betas) and the navigation system that I recommend the Carlsbad Tesla site try out on FSD 12.3, to characterize whether it is still an issue that needs to be addressed.
    No problem with FSD driving to the address
    2253 S Santa Fe Ave, Vista, CA 92083.
    What happens after I enter a new address (usually my home address) and give control to FSD: the series of actions and turns will endlessly drive a big loop that passes the same address,
    2253 S Santa Fe Ave, Vista, CA 92083,
    over and over again, not following the navigation route and not recognizing something was wrong after repeatedly passing the starting point multiple times.

  • @gibbonsgerg months ago +1

    That guy just cut me off, so I need to honk my horn at him, and flip him off. 🤣🤣🤣

  • @allangraham970 months ago

    My guess is something more accurate than Google Maps is needed to do medium-term planning.
    Maybe more information should be included in maps; FSD could look ahead of what it currently sees to make mid-term decisions.
    This more accurate, detailed map data could be updated by getting info from Teslas driving through the area, with relevant info sent to Tesla to add to its more detailed maps.
    Something like Waze can be used to plan for exceptions.
    I suspect Waymo's use of detailed maps helps it. If you see a corner, for instance, detailed map data would allow you to know exactly how sharp the corner is before you can see how sharp it is in real time.
    I suspect detailed maps will greatly assist mid-term planning.

  • @JohnBrown-pw3bz months ago

    Would V-JEPA be the answer for Chuck Cook's blind left turn?

  • @Delli88Burn1 months ago +3

    It's strange how the recent Westworld series had so many of these elements.

  • @MilosevicOgnjan months ago

    Another huge advantage of being ahead in FSD I see in the future is the analysis of more realtime data of FSD enabled cars near each other on the road. One car can see only so much, but having several cars on the same road all supplying the realtime data for the same trajectory gives you basically superpowers in terms of seeing ahead and planning.

  • @MooseOnEarth months ago +1

    You say this was a geeky episode? There was not a single formula in here and not a single line of code. So this episode was what I would consider high-level.

  • @oisiaa months ago

    How did I miss the LeCun interview? Guess I have my roadtrip entertainment.

  • @JohnBrown-pw3bz months ago

    Maybe a camera would not be needed except on the B pillar, because the possibilities of what could be coming down the road from the left could be predicted.

  • @MikeSieko17 months ago

    2:11 Wouldn't minimizing time be better? Since you could probably travel faster by train, for example, but it might not be the shortest path 🤔

  • @coguglielmi 25 days ago +1

    The hierarchical planning may already be there without anyone outside of Tesla knowing. Have you seen the recent video on X where FSD didn't follow the navigation route asking for a useless U-turn, but went straight into the parking lot on the left (requiring it to cross the opposite lane) instead? Not millisecond-by-millisecond behaviour, to me...

  • @lym3204 months ago +2

    This might be needed to make FSD a better defensive driver. To allow it to preempt rather than react to accidents.

    • @jmattoxriskpro months ago

      It already does this. Anticipating when cross traffic at an intersection is unlikely to stop

    • @lym3204 months ago +1

      @@jmattoxriskpro I think that is still reacting. It sees cars coming and it doesn't go. It sees no cars and it goes. Preempting would be: if there are three lanes, I don't have to be in the lane next to the car, so the possibility of an accident is reduced. Or, I should avoid being in the blind spots of cars, so I should increase or decrease speed to stay out of people's blind spots. Current FSD could probably get you to where all the accidents are the other guy's fault, so legally FSD is protected, and as far as Tesla is concerned that might be good enough for level 5.

  • @HARLEYROB0615 months ago

    If you had to guess, how long will it take to have FSD fully optimized?

    • @markmarco2880 months ago +1

      10 to 20 years.

  • @restonthewind months ago

    Thanks for explaining this concept at a high level. I feel I understand JEPA now, but you're discussing the bleeding edge of academic research, not an engineered solution to humanoid robotics or robotics generally. Is this sort of modeling just around the corner, or is it where LLMs were a decade ago? Do YouTube videos illustrate the many tasks, from many perspectives, that a humanoid bot must perform to do useful work in factories or even a coffee shop? Even if they do, are humanoid bots at a coffee shop a more cost-effective solution than the robot barista you can already see at an Artly coffee shop in Seattle, a robot that will also improve with more AI?

  • @MashDaddy months ago +1

    The way I think about V-JEPA is that it creates "rules of physics" about an object and only allows movement of that object in agreement with its physics rules. So it knows a dog can walk, but cannot fly or be ripped in half (like a piece of paper). Basically, it's adding bounding physics rules to every object; once the whole set of rules for that object has been modeled, it is frozen (no changes) and used when needed with other bounding rules.

    • @MashDaddy months ago

      This is a big leap forward from the recognition of an object towards understanding the set of physical actions an object can take.

  • @mv-db4463 months ago

    I'm not seeing the problem here:
    The micro/ milli second decisions are made 0-60 yards (or whatever your immediate "60" is: 70, 80, 90 you get the idea) with cameras and basic training.
    Anything longer and down stream can be done in the background and gets the micro second processing once it enters the 0-60 window.
    Key: No impacts (0-"60" yards), anything else gets processed post 0-60 as it enters the "sets" of windows that key decisions need to be made.
    Example: Until you get to the airport, Paris as your destination doesn't even enter into the decision making.
    Then at the airport, it NOW comes into the medium processing window.
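A minimal sketch of the windowed idea this comment describes; the "60"-yard threshold and the multiplier for the medium window are illustrative assumptions, not anything Tesla documents:

```python
def processing_tier(distance_yards: float, immediate: float = 60.0) -> str:
    """Route a perceived object to a processing tier by distance.

    Objects inside the 0-"60" window get full reactive processing;
    everything else waits in coarser windows until it drifts into range.
    All thresholds here are hypothetical.
    """
    if distance_yards <= immediate:
        return "microsecond"   # immediate, collision-critical window
    if distance_yards <= 4 * immediate:
        return "medium"        # staged planning window
    return "background"        # e.g. "Paris" before you reach the airport
```

So a car 40 yards ahead lands in the reactive tier, while the far-off destination stays in the background tier until it nears the active window.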

  • @davidx.1504
    @davidx.1504 months ago +1

    Loved this video. Hope Tesla/Elon is listening.

  • @JMeyer-qj1pv
    @JMeyer-qj1pv months ago +3

    The V-JEPA work is interesting. I think something like that could be used to update some old classic TV shows, like the original Star Trek series, to fill in the sidebars of the 4:3 aspect ratio it was filmed in to convert it to widescreen format.

  • @PalimpsestProd
    @PalimpsestProd months ago +2

    Using FSD has convinced me that it won't be fully autonomous until 2027, when neural nets will be as complex as a human brain. It takes a full human brain to drive, even if we feel we're in zombie mode. Even then there will need to be city/region-specific AIs.

  • @CarlosFlores-ke1lk
    @CarlosFlores-ke1lk months ago +2

    All of this discussion shows how far FSD is from becoming a "real product" launched in the market to a broad audience - NOT BETA. NO TAILWINDS for Tesla until the LOW COST vehicle is introduced 1-1/2 to 2 years from now.

  • @jookyuh
    @jookyuh months ago

    Driving does not require planning much further ahead than what you can see at the moment, except for the route you need to take. Which lane the car needs to stay in, etc., can be figured out in the context of the planned route (i.e. driving directions from the navigation system). Tesla can probably get away with including the planned route information (normalized to not be location specific) when training FSD, instead of having an unnecessarily complex solution for a driving task (like AlphaStar).

  • @allangraham970
    @allangraham970 months ago

    Originally FSD used 30-second-long videos for its training.
    Not surprising it currently only plans this far ahead.

  • @hhal9000
    @hhal9000 months ago

    This reminds me a bit of some of the great late Marvin Minsky's ideas about AI. He used to talk about symbols for objects that saved on bandwidth by representing a much more detailed concept of something. He talks about this in his book "The Emotion Machine". I believe he also, rather worryingly, made the prediction that the first truly conscious AI would be insane, or something like that.

  • @nimasahabi9421
    @nimasahabi9421 months ago +2

    Do you think that HW 3.0 is enough to solve true FSD? My fear is that more cameras are required.

    • @jmattoxriskpro
      @jmattoxriskpro months ago +1

      It is enough. We do it with 2 cameras on a swivel. It can do it with the current camera setup.

    • @MooseOnEarth
      @MooseOnEarth months ago

      HW 3.0 without ultrasonics will not be enough. There are just too many large blind spots around a stationary car. A human driver can step outside and judge distances or look for objects during a parking maneuver, or can look around from outside before getting into a car and pulling out of a parking space, but the fixed cameras in a car cannot.
      The next problem is light: whenever you rely on cameras, you rely on external light. This will make performance worse than with additional LIDAR or RADAR or ultrasonic sensors. At that point, the modest extra expense of a few LIDAR, RADAR and ultrasonic sensors will let other manufacturers get their vehicles and software certified, but not Tesla's older cars, as they cannot achieve the same performance in rain, darkness, snow or fog, or with close objects.

  • @brianjohnson2650
    @brianjohnson2650 months ago +3

    Well done! These topics are your specialty.

  • @rmkep
    @rmkep months ago +2

    Thanks for the mind bending discussion. I wonder what percentage of the population has interest in grappling with these issues? Based upon current events, I think that small percentage is in decline.

    • @jmattoxriskpro
      @jmattoxriskpro months ago

      Maybe so, but we are the few that will drive the future

    • @girowinters
      @girowinters months ago

      If the ones who are not paying attention vote against the climate... where does that leave you? A pretty dystopian future.

  • @DanFrederiksen
    @DanFrederiksen months ago

    v12 does develop planning at any timescale, and more than just hierarchical: it can have multiple parallel conjoined plans at different timescales continuously being reformulated with virtuosity. So it's not a matter of categorically missing a timescale (depending a little on how long its memory is) but of getting that function right. It can do plans at various timescales, but it's more a case of getting that logic correct. It can have a horse in the race, but that horse can look mighty deformed :) or look really close to a perfect horse but have a bum tendon that will make it fall in certain cases. The black box approach is architecture agnostic, in a sense capable of everything, but the training data then has to take you all the way.
    Yann is talking about unsupervised learning on video and then bridging to natural language later, which is a pretty obvious approach and what's being done for stills and video already. But getting that bridge to natural language very good hasn't worked for either of them yet. You can say what you want to Midjourney but it will do what it wants anyway :)
    I think I know the future of that: you have to be both iterative and add a spatial dimension to language. Normal language is serial and very poor at spatial complexity, so rather than a serial string you have both iterative improvement and spatially local language. Not just a long text and pray it turns out; you place descriptions in the image or space. You can open with a meadow, specify style and sun direction, and say path over here and bird over there, hills in the background over here. At first it might be 2D, but pretty soon it will have to be interactive 3D where you can move the perspective around, set up camera paths and animations. Spatial reality is unsuited to a one-shot text description.
    They might read this, implement it and forget to credit where it came from. If it's not a paper, plagiarism is allowed :) Hinton didn't credit me when he wrote his confused paper about autoencoders in 2006, a couple of months after I pointed out their significance. I gather he is known as a copycat, although they still gave him the Turing Award for deep learning :) none of the three came up with DL.

  • @restonthewind
    @restonthewind months ago

    Detour around the traffic jam in midtown Atlanta ...

    • @DC-yh2oy
      @DC-yh2oy months ago

      Traffic aware navigation should route around this when it can. When it can't, it will do what a human does and just crawl through it. The main difference is that it won't get frustrated and pissed off.

  • @kstaxman2
    @kstaxman2 months ago

    This is exactly what a baby is doing: interacting with the environment and building a mental model of the world. And this model even ties the distance and nature of things in the world to how we move through the world. They learn the length of their reach, how many steps it takes to reach a close object, how a dropped ball behaves versus a thrown one, and that their hands and feet are part of themselves. Soon they can know that an object is just outside their reach and they must take a few steps to be able to pick it up. It's almost like we have two parts of our world model: one general and related to things close to us, and one that sees the world in a larger context. The baby learns that what is outside, seen through a window, is different from what is on the coffee table beside it that it could reach out and touch. It's this part of the machine's model that is missing. And as you said, it requires that the robot, car, or machine have embodied self-awareness. Even with this new training I'm not sure that the embodied sense of self would develop.

  • @andyfeimsternfei8408
    @andyfeimsternfei8408 months ago +1

    I just want "Hey Tesla, turn right, left or go straight" for when I don't have a navigation destination set.

    • @user-ls3zr5gk9h
      @user-ls3zr5gk9h months ago

      I live in the Boston area. I WISH there were only right, left and straight.

    • @andyfeimsternfei8408
      @andyfeimsternfei8408 months ago

      @@user-ls3zr5gk9h It's better than nothing. Always having to enter a destination is a pain.

  • @aidendeem903
    @aidendeem903 months ago +1

    Words are an abstraction of an abstraction, twice removed from reality.

  • @audience2
    @audience2 months ago +1

    Hierarchical planning is trying to impose functional programming onto the NN. But if functional programming worked, there wouldn't have been the need to move to end-to-end trained NNs to solve problems with it.

    • @JoeVirella
      @JoeVirella months ago

      Exactly. For his example, you would have to feed it video of someone going from NY to Paris until it understands most of the possibilities. Obviously it would be super complex and would require much more data than FSD.

  • @christophberkholz5915
    @christophberkholz5915 months ago

    This is extremely exciting sh**😍

  • @fractalelf7760
    @fractalelf7760 months ago

    Good video. I feel like I get a better grasp of how AI really works from this channel than from many others.

  • @craighermle7727
    @craighermle7727 months ago

    Will multiple "autonomous" vehicles interwoven with human drivers produce the same results? Would that affect the "decisions" that any given autonomous vehicle would make? What happens if multiple versions of FSD are on the road, or, for that matter, if two competing full-self-driving cars are on the road in the same area? Does that mean that an autonomous vehicle running company A's architecture has to have prior "knowledge" of company B's architecture before A can react to B?

  • @at3941
    @at3941 months ago

    I can’t believe that they wouldn’t have already thought about this, but it needs to be more than 30 seconds. There are many instances where it needs to be longer than that.

    • @Arseve119
      @Arseve119 months ago

      Do LLMs have the vision capability to meet each hierarchical plan? The AI can have each step created from a trained framework, but randomness is a huge obstacle without vision to adapt to each particular situation in the real world. The Optimus robot is then an even bigger problem than FSD; that is just my humble opinion.

  • @davecorbin5088
    @davecorbin5088 months ago

    Even if it can plan 20 seconds into the future, the situation is constantly changing and could change drastically. So, what's the point of the 20 second plan? Isn't it just a matter of not hitting anything including things that might be moving rapidly toward you?

  • @lurin971
    @lurin971 months ago

    Brilliant explanation. A+. Seeing/predicting into the future! Merci

  • @medhurstt
    @medhurstt months ago

    Planning is all about memory and feedback. From an AI perspective it seems relatively straightforward to me. First come up with a high-level plan for getting to Paris; AI (e.g. ChatGPT) is capable of doing that today. Next, feed the high-level plan back in and expand step 1. Repeat this feedback, expanding step 1, until you have an actual actionable step. Take the step and remove it from the list, but feed what is left back in. Repeat until you get to Paris.
    IMO, where AI is lacking today is runtime feedback. Not everything can be pre-trained, and planning can't be pre-trained; it has to be experienced in the context of the actions.
    Having said that, in principle you can plan the whole trip to the n-th degree, but it's hypothetical and probably isn't what would really happen in practice.
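The expand-step-1 loop this comment describes can be sketched in a few lines; `expand` and `is_actionable` are hypothetical stand-ins for an LLM call and an executability check:

```python
def refine_plan(plan, expand, is_actionable):
    """Repeatedly expand the first step of a high-level plan until the
    head of the list is directly actionable, feeding the rest back in."""
    steps = list(plan)
    while steps and not is_actionable(steps[0]):
        steps = expand(steps[0]) + steps[1:]
    return steps

# Toy hierarchy for the going-to-Paris example (all names illustrative).
primitives = {"stand up", "walk to car", "drive to JFK", "board flight"}
expansions = {
    "travel to Paris": ["get to JFK", "board flight"],
    "get to JFK": ["stand up", "walk to car", "drive to JFK"],
}
plan = refine_plan(["travel to Paris"],
                   lambda step: expansions[step],
                   lambda step: step in primitives)
# plan now begins with an actionable step: "stand up"
```

Executing the head step, dropping it, and re-running the loop on what remains is the runtime feedback the comment argues is missing.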

    • @medhurstt
      @medhurstt months ago

      To clarify to myself: I think it's a mistake to think blanking portions of a video and having the AI successfully fill them in helps plan a route for a Tesla, because it's not possible to plan every route, and the fact that a number of routes are possible doesn't help the Tesla actually perform a sensible action.
      So I stand by my belief that what is missing in AI is runtime feedback, not better models per se.

  • @MooseOnEarth
    @MooseOnEarth months ago +1

    Just use numbers. 20-30 seconds at a millisecond resolution would mean planning 20,000 to 30,000 steps of 1 ms each in advance. With a couple of choices and branches every few steps, this space of possibilities explodes quickly.
    At a walking speed of 1 m/s (3.6 km/h, about 2 mph), the 20-30 second time frame would mean something at a distance of about 20-30 meters - easy to cover with local sensors, like human eyes or cameras. At a speed of 10 m/s (36 km/h, about 22 mph): a distance of about 200-300 m. At a speed of 30 m/s (108 km/h, about 67 mph): a distance of 600-900 meters. There is no way any reasonable sensors on the car itself (cameras, RADAR, LIDAR) can reach that far. Only in the simplest environments (like a straight highway with no traffic) may this provide meaningful input.
    So knowledge or assumptions for this region and time frame may only come from maps, from other road objects' sensors (car-to-car or car-to-infrastructure communication), or from online services (such as remote stationary cameras, or flying drones, as in Frank Rinderknecht's idea with the Rinspeed Etos in 2015, a robocar plus a drone). Or from machine-generated predictions, but then these are predictions with less probability of actually being valid. How do humans do it? With predictions, experience and their own hierarchical planning. Or they make mistakes and fail, as they do in unknown environments, cities, other countries and so on. So whenever the systems are trained, it must be on the input of *competent* drivers.
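The figures in this comment follow directly from distance = speed × time; a quick check of the 20-30 second horizon:

```python
def lookahead_m(speed_m_s: float, horizon_s: float) -> float:
    """Distance covered during the planning horizon, in meters."""
    return speed_m_s * horizon_s

# Walking, city, and highway speeds from the comment above.
for v in (1.0, 10.0, 30.0):
    print(f"{v:g} m/s -> {lookahead_m(v, 20):g}-{lookahead_m(v, 30):g} m")
# 1 m/s  -> 20-30 m
# 10 m/s -> 200-300 m
# 30 m/s -> 600-900 m
```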

  • @marcboucher913
    @marcboucher913 months ago

    This task-time-plan training method is the right way to get the AI to advance from a frozen state (not knowing what to do) to a more dynamic state of representational modeling. It will improve time-lapse planning for complex tasks in a more efficient way, by a large percentage. Thinking in category representations is a more efficient way for us humans to go about the world. The engineers should read Piaget's work on how kids construct representational concepts of time and space.

  • @52088
    @52088 months ago

    Find the problems, keep solving them and updating; the future will only get better and better. Keep it up. Please trust Musk.

  • @RussInGA
    @RussInGA months ago

    Overthinking... lots of those hierarchical layers are not needed from the car. For a whole trip, yes, but a lot of them can be handled easily by standard methods.

  • @erikbarsingerhorn4485
    @erikbarsingerhorn4485 months ago

    "Nobody knows how to do this with AI." The question is, what do they know at Tesla? Everything they do is groundbreaking, new and revolutionary.

  • @davidx.1504
    @davidx.1504 months ago +1

    I think if Elon saw this video as an X post, he might consider implementing it...

  • @spyral00
    @spyral00 21 days ago

    Who would have guessed that Meta was going to be the good guys.

  • @briandoe5746
    @briandoe5746 months ago +1

    So I'm not an expert, but I'm pretty sure you're wrong. Reactive millisecond-by-millisecond planning is how the system works right now.
    It has a pre-decided path and it reacts to things it encounters. This is how embodied AI functions currently. Actively deciding these things is one of the cool things about Tesla Full Self-Driving and what Nvidia is working on.

  • @sapienspace8814
    @sapienspace8814 months ago +2

    What you are describing might be solved by a separately trained, specialized RL agent, specifically trained to optimize in that "mid term planning (hierarchical) problem".

  • @jmattoxriskpro
    @jmattoxriskpro months ago

    This limitation is discussed as a problem for LLMs. This is not a problem that v12 shares with LLMs.

  • @heltok
    @heltok months ago

    Content relevant for Tesla starts at 34:55. But it's kind of silly: if this hierarchical planning is useful for predicting how good drivers will drive in the near future, then the neural network will learn to do it... It may need a bit more data and compute to get there, but it will eventually learn to do it.

  • @shanelahousse3344
    @shanelahousse3344 months ago

    I have always been puzzled how Tesla’s approach seemingly avoids any local knowledge that can be used for open loop planning.

  • @user-ce5ju1mi1x
    @user-ce5ju1mi1x months ago

    Poach LeCun into TESLA

  • @lym3204
    @lym3204 months ago +1

    A car is also an embodied AI. The body does not have to be humanoid.

  • @trevorschaben1815
    @trevorschaben1815 months ago +1

    If FSD gets to a point where it can start to learn from itself, this will happen quickly.

  • @LegendaryInfortainment
    @LegendaryInfortainment months ago

    Planning as a series of steps is like a haunting of C coding. Tesla will find the way to avoid something like this regression.

  • @NinkSink
    @NinkSink 22 days ago

    One of my big worries with so-called AI, especially in the realm of driving and centralized networks, is this: what happens when the neural network is wrong? Or has an error, or is down, or cannot be communicated with? In the case of a widespread glitch, all the cars are going to absorb the glitch, especially if it is a progressive glitch, meaning the problem slowly evolves into a gigantic error. How Tesla's data centers and cars react to the issue is going to be paramount.
    Human drivers generally move in herds, but each individual driver is capable of recognizing when the herd is wrong. Sometimes there's not enough time to recognize that the herd is wrong, and that's where you get multiple-car pileups. But that multiple-car pileup is localized. The error only occurs in that section; it doesn't ripple through every road and car in the United States. But with a centralized neural network, if the error is ubiquitous, evolves over time, and doesn't get detected until catastrophe, then every single car on that neural network is going to be in error.
    Another way to look at it is centralized versus decentralized governance in military doctrine. Centralized has its strengths until a unit is caught in a catastrophic situation it is not allowed to adjust to or get itself out of, and then, if the catastrophe is large enough in that section of the front, a very large part of your military is going to get destroyed by the enemy taking advantage of the centralized command structure's near-term failure.

  • @williampelzer1460
    @williampelzer1460 months ago

    This is where sub-models will need to come in. We have a conscious mind and a subconscious mind which deals with the autonomic nervous system. We'd give the planner the broad strokes, which it would delegate to the appropriate sub-model for execution. The sub-models would have the knowledge of their domain and how to accomplish their task. This way the planner doesn't need to know the minute detail involved in each step of the process; it deals with the overarching goal and how this will be met within the parameters given. It's software development 101.
    So in FSD terms we'd have the route planner, which coordinates how the path planner determines what tasks the driving model (the physical-equivalent or kinematics model) will execute. The path planner stores a set of tasks to be accomplished, which it updates based on whether the previous tasks can be or have been executed. This is just a conceptual understanding, but these would be LLMs anyway, so they'd do all of this according to a training set...
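A minimal sketch of the delegation chain this comment proposes, with each layer as a hypothetical callable that only sees its own level of detail (the sub-model names and stubs are purely illustrative):

```python
def drive(route_planner, path_planner, driving_model, origin, dest):
    """Top layer hands broad strokes down; detail lives in the sub-models."""
    executed = []
    for leg in route_planner(origin, dest):        # broad strokes
        for task in path_planner(leg):             # lane-level tasks
            executed.append(driving_model(task))   # kinematic execution
    return executed

# Stubbed sub-models standing in for the route/path/driving hierarchy.
log = drive(lambda o, d: [f"{o}->midtown", f"midtown->{d}"],
            lambda leg: [f"follow lane on {leg}"],
            lambda task: f"done: {task}",
            "home", "airport")
```

The route planner never sees lane-level tasks, and the driving model never sees the overall goal, which is the information hiding the comment calls software development 101.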

  • @roger_is_red
    @roger_is_red months ago +1

    Then how did Waymo solve this problem?

    • @Big_Ben_from_La_Mesa
      @Big_Ben_from_La_Mesa months ago

      Waymo has documented far, far more reportable accidents per car to NHTSA under the SGO than Tesla has, so I'm not sure Waymo is the best model.

    • @roger_is_red
      @roger_is_red months ago

      @@Big_Ben_from_La_Mesa Maybe not. But it's unclear to me why mid-term planning is required. It seems to me the Tesla just needs to follow the map and obey traffic; it's not like it's flying from NYC to Paris and has to figure that out.

  • @roxter299roxter7
    @roxter299roxter7 months ago +11

    Didn’t FSD solve these problems when they got rid of heuristics? In my view the real problem is that the car does not see far enough. Humans can see about 500 meters, on a good day, and plan their actions accordingly.

    • @mariusm62
      @mariusm62 months ago +3

      Also, humans have memory, allowing them to plan for maneuvers well ahead of time.
      Let's face it, Tesla's FSD needs some form of HD maps. They don't need to be centimeter-level maps, just a basic layout of the intersections, as well as other signage. They could even be compiled by the Tesla cars that pass through an intersection.

    • @malax4013
      @malax4013 months ago +4

      HW4 definitely has good enough cameras, but HW3 is questionable. From what I have observed in FSD drives, though, the mistakes it makes almost never seem to be about vision or interpreting the world. Planning and action are what cause the problems, and those can be solved via software.

    • @darwinboor1300
      @darwinboor1300 months ago

      HW5 will have front side-view cameras, probably in the headlamp assemblies.

    • @roxter299roxter7
      @roxter299roxter7 months ago +1

      @@darwinboor1300
      How do you know this?

    • @roxter299roxter7
      @roxter299roxter7 months ago +1

      @@malax4013
      When you go for a drive take note of how far down the road you look in order to plan your path, then look at how far the cameras can see/visualize down the road. If I’m driving I can see a slow dump truck down the road before the car can.

  • @anarkali_217
    @anarkali_217 months ago

    What you get wrong is that Tesla does have fleet data on speed, turning and other factors, down to that level. As long as it's in the distribution, transformers generalize well. And if Tesla can crunch the massive amount of data, they will be able to do so. I don't think you got LeCun's point here.

  • @angelafox7627
    @angelafox7627 months ago

    I like your videos. You provide good content and excellent detail. However, you should either listen to yourself or read your transcript: you repeat the same details over and over. You could have presented this excellent material in 20 minutes max. This is YouTube, not a lecture theatre; we come to learn, and we can replay what we miss. I think you might get far more views if you study your presentations and edit them. But thanks for the material.

  • @techyjames1945
    @techyjames1945 months ago

    I think your thinking about 30 seconds for staging is still way behind where it should be. You're driving 60 mph in the left lane, but traffic for the exit lane is backed up a mile and a half, i.e. 90 seconds. Not getting in line behind that mile and a half of traffic is bad behavior, making you either cut someone off or slam on the brakes in the next lane, creating a traffic obstruction. Your analogy of trying to get in the lane just half a mile before the exit is a prime example of bad staging. At that point, any attempt to get in creates an additional traffic obstruction, or you reroute to a later exit further down the road, this time getting over sooner.

  • @robinheider414
    @robinheider414 months ago

    FSD doesn't need to predict the whole scenario of going from A to B before it goes. It needs to be able to go waypoint to waypoint and then react to stimuli along the way that impede it from getting from A to B. Are you not overthinking this whole scenario? I don't plan for every eventuality when I need to run an errand; I react to stimuli that prevent me from completing my errand. This video doesn't seem very focused, or it should be retitled.

  • @ClayBellBrews
    @ClayBellBrews months ago

    So I agree. This goes back to the embodiment problem. If an AI doesn't have a physical body AND things like eyes, ears and skin, it can't really get to AGI.

  • @AIVision_Era
    @AIVision_Era months ago

    For Nvidia and other companies to realize physical-world AI, the bottleneck will be data with context; there is no way to skip that.

  • @jasonissertell
    @jasonissertell months ago

    These embodied AI topics are becoming more and more like the process and experience of parenting a human child and training their human neural net.
    Today it's a few million humans teaching highly specialized AI systems that will someday find synergies and coalesce into AGI.
    Fascinating.