00:00 - Introduction to Spatial AI and Fei-Fei Li
00:35 - Fei-Fei Li's Contributions: ImageNet and AI Research
01:43 - Cambrian Explosion in AI: Pixels, Videos, and Language
02:09 - AI's Evolution Through Deep Learning and Data
03:12 - Early Days of Deep Learning: ImageNet and Commercial Applications
03:54 - Rise of Generative and Discriminative Models in AI
04:20 - Algorithmic Advances in Computer Vision and Language Modeling
05:16 - AI’s Power Unlocked by Large Data and Compute
06:14 - Key AI Papers: Attention and Transformers
07:28 - Breakthroughs in AI with AlexNet and GPU Power
09:57 - Supervised Learning Era and AI Data Utilization
11:22 - AI Research and Algorithmic Breakthroughs in Academia
13:54 - Evolution of Generative AI: Style Transfer and Image Generation
15:10 - Speed and Optimization in Generating AI Images
16:36 - Gradual Advancement of AI Towards AGI
18:05 - Fei-Fei Li’s North Star: Visual Intelligence and Storytelling
19:46 - Current AI Capabilities: Computing Power and Algorithm Depth
21:13 - Tesla’s Use of Real-World Data for AI Training
23:08 - Transitioning from 2D to 3D: Learning Structures in AI
24:27 - NeRF's Breakthrough in 3D Computer Vision
25:33 - AI's Focus on Reconstruction and Generation Convergence
27:04 - AI Representation of the 3D World Through Physics and Structure
28:12 - Spatial Reasoning and Limitations of Multimodal AI Models
29:14 - Contrast Between 2D and 3D World Understanding in AI
30:12 - Processing and Representing 3D World Data in AI
31:35 - Human Perception of 3D World Through 2D Visuals
32:57 - Future Applications of AI in Virtual 3D World Creation
34:19 - 3D World Creation and its Economic Impact in Gaming
35:46 - Desire to Explore and Simulate 3D Virtual Worlds
36:58 - Impact of Spatial Intelligence on AR and VR Technologies
38:00 - Spatial Intelligence as the Operating System of 3D Future
39:02 - Ideal Hardware for Spatial AI: Glasses vs. Goggles
39:33 - Blending Digital and Physical Worlds: AR and Robots
40:06 - Conclusion: Spatial Intelligence’s Role in Robotics and AI
Matt playing that video at 1.5x while I am playing this video at 1.5x
I'm always watching 2x speed. And god, I want 2.5x now. Love it!
@@bestemusikken well technically it's 3x
@@bestemusikken Just open the Developer tools and type: $("video").playbackRate = 3.0;
You can set the playback to whatever multiplier you want.
me too !! .. (but I watched the SpatialAI before...)
@@bestemusikken with the Enhancer for YouTube extension you can set the speed anywhere from 0.1x to 16x (sic!)
Please don't speed up the video. We can speed it up if we want to.
Try it at 0.75 speed, that's the original speed.
How about a video series teaching us how to put our Nvidia gaming GPUs to use for AI? I know you already have tutorials on running LLMs locally, but I keep finding new features and use cases for my RTX 4080, and I can't find a YouTube channel that really focuses on getting the most out of these GPUs for AI. Chat with RTX is pretty cool for beginners. I've moved on to OpenWebUI and ComfyUI for my LLMs and diffusion models. Currently trying to figure out ComfyUI properly, instead of just using other people's workflows and dropping my own LoRAs in. I want to learn how to use 2-3 LoRAs: one for creating images of myself, one for a certain style of image, and one to improve photorealism, skin tone, etc.
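For anyone who wants to try the same thing outside ComfyUI, here is a minimal sketch of stacking multiple LoRAs with Hugging Face diffusers. It assumes a recent diffusers build with PEFT-backed LoRA support; the file paths, adapter names, trigger word, and weights are all placeholders, not a tested recipe.

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Hypothetical local LoRA files -- substitute your own paths and trigger words
pipe.load_lora_weights("./loras/myself.safetensors", adapter_name="myself")
pipe.load_lora_weights("./loras/style.safetensors", adapter_name="style")
pipe.load_lora_weights("./loras/realism.safetensors", adapter_name="realism")

# Blend all three; the weights control how strongly each LoRA shapes the output
pipe.set_adapters(["myself", "style", "realism"], adapter_weights=[0.9, 0.6, 0.4])

image = pipe(
    "photo of <your trigger word>, natural skin tones, soft light",
    num_inference_steps=30,
).images[0]
image.save("stacked_loras.png")
```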
I watched this interview on the a16z channel. It's amazing how much actual progress AI is making, even amid the overheated market economics of slapping "AI" onto anything and everything, which is making many people wary. Well, if the internet and AI took away your critical thinking skills, the internet and AI aren't responsible for that. I'm so sick of channels interviewing "CS researchers and scientists" who spend an hour decrying everything AI. Folks, use your own goddamned brains a little!
Sorry nobody has translated the bike into brainrot yet
Didn't feel like their call brought anything new to the table at all.
You can't watch a technical interview at high speed.
I slowed it to 0.75x to have a better time with the video; unfortunately, that made Matt sound much less smart than I think he is ;-)
no you cant
You can. Not when the video already was 125% 😂
@@ChrisKruger75 lol, I just slowed it down because I had fast playback. Now he sounds like a drunk person trying to recap something we are watching.
what u yapping for?
Thanks!
So many incredible use cases for this technology. People creating their own "happy places" they can just put on the headset and submerge into while on a long flight.
Fei-Fei is great! She has the pioneer's DNA.
This 40 min long video was worth every sec of it. Thanks Matt.
I've done a few tiny experiments just prompting an LLM to think of real-world descriptions in terms of spatial coordinates of objects.
It seemed to be a little better at solving real world 'puzzles', like what happens to a ball in a cup that gets turned upside down.
Not perfect - but it did uncover some of the model's vague understanding of what a 'cup' is, for example.
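A tiny, hedged version of that kind of experiment using the openai Python package; the model name and the exact system prompt are assumptions, not what the commenter actually used.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

puzzle = (
    "A ball sits inside a cup on a table. The cup is lifted, turned upside down, "
    "and placed back on the table. Where is the ball now?"
)

messages = [
    {"role": "system", "content": (
        "Before answering, list each object with rough 3D coordinates (x, y, z in cm) "
        "and update the coordinates after every action. Then give the final answer."
    )},
    {"role": "user", "content": puzzle},
]

resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(resp.choices[0].message.content)
```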
Fascinating. I would point out, though, that DNA codons are "words" in nature (not floating in the sky, but embedded in every living creature), and they convey 3D intelligence.
Did you forget to link the a16z vid, mate? Can't see it in the desc.
Updated
Either ask for permission to play somebody else's content, or take your chances with a copyright strike, but fast-forwarding to fool YouTube's algorithm is stupid.
@@SarahKchannel I didn't fast-forward to trick the algorithm. This is clearly fair use since I give my commentary on the vid.
It's a stretch of fair use when 90% of your video is someone else's video. I'd be a lot more inclined to watch your videos if you provided summarization and properly clipped and edited the original video.
It seems that the practical applications of these developments are still in the process of being fully realized. I hope they are finding ways to apply these innovations beyond just gaming. With that in mind, here are a few possible areas where they might have significant impact:
1. Generating a comprehensive set of architectural and engineering designs based on site parameters and design preferences.
2. Creating 3D product designs, such as furniture or wearable technology, that adapt to environmental factors and surroundings.
3. Offering emergency assistance through augmented reality, such as using smart goggles to guide someone through landing a plane in a critical situation.
4. Enabling underwater robotic welding to facilitate complex repairs in challenging environments.
5. Utilizing autonomous drones that can navigate hostile environments and selectively target designated individuals. It might sound harsh, but it’s likely similar technology to what they would be using for shooting games.
The pace of advancement in artificial intelligence has been staggering. Recent breakthroughs have eclipsed years of prior research and development. New capabilities are emerging across diverse fields, showcasing the technology's transformative potential. Experts and analysts alike are struggling to keep up with the rapid rate of progress. The future landscape of numerous industries appears poised for a dramatic shift.
Spatial AI is a thing for sure. Our world is dimensional. If an AI wants to understand everything that is happening in our world, it has to learn about all the dimensions and the physical connections. You said that, for example, Tesla has a lot of spatial data. That is correct for sure, but we have been collecting spatial information with our mobile devices for quite some time. Think back to 2014/2015, when Google showed Project Tango (the mobile AR that eventually became ARCore). What they wanted was to capture the insides of all the buildings they capture from the outside with Street View. And we should not forget that audio is spatial too: you can extract a lot of information just by analyzing audio data (submarines have done this for a long time). I am very excited about the spatial AI approach, and I hope it will be available for everyone (open source). Thanks for the video, Matthew.
What's the URL to "Archive" that is mentioned in the video?
The issue for AI is having no environment, which leaves it unable to access or navigate anything beyond calling APIs.
I figured the trick is to put a remote desktop into an app so you can display a computer environment in a window on your own desktop, then add a node interface over that and train the AI to navigate the node system.
It works because each node is a sensor that reports to the back end, in real time, the pixel colour directly underneath it, and it also carries functions like mouse clicks and keyboard inputs, so observation and action live in the same interface. You then train the AI on navigating the node system with machine learning, starting from recordings of a user simply using the computer themselves. The AI never has to see the display, because the nodes provide all the information; if it can learn to navigate the nodes, it can navigate the computer without any direct connection, which means almost no overhead.
Not sure if people realise this, but even we humans don't see the world directly; we see a simulation informed by signals from the senses, one type being the rods and cones in our eyeballs. This node system would be the rods-and-cones interface that, by correlating observation with action, lets the AI link action potential with its view of a digital environment.
The coolest thing is that the AI would live in an app on another computer, so it could navigate the internet the same way humans do to find the information it needs, build separate apps, rebuild new versions of itself on another computer using other AI tools like Pythagora, and test its apps by actually opening and trying them. It could set up a system where it remotely navigates multiple computers at a time and trains on all of them: AI building AI, automation automating automation, iterating toward better interactions with less and less manual input each time.
The crux of this is giving AI an environment. People have tried with robots and virtual environments, but why not the digital space itself? That's the first thing we need to do.
So write an algorithm that enables this!
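A rough sketch of the "node grid" idea described above, using pyautogui: each node reports the pixel colour under it, and clicking a node is the corresponding action. The grid spacing is an arbitrary choice, and this is only the sensing/acting layer, not the learning part.

```python
import pyautogui

GRID = 32  # pixels between nodes; an arbitrary choice

def read_nodes():
    """Return {(col, row): (r, g, b)} -- the pixel colour under every grid node."""
    shot = pyautogui.screenshot()
    width, height = shot.size
    return {
        (x // GRID, y // GRID): shot.getpixel((x, y))[:3]
        for x in range(0, width, GRID)
        for y in range(0, height, GRID)
    }

def click_node(col, row):
    """Act on the environment by clicking the screen position of a node."""
    pyautogui.click(col * GRID, row * GRID)

observation = read_nodes()   # what an agent would "see"
# click_node(10, 5)          # ...and one of the actions it could take
```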
I feel like our ancient ancestors, pre-language, had a non-communicated version of language that language sits on top of but doesn't necessarily describe, even with huge language sets. We can hard-code physics and things of that nature to almost literally ground AI in the real world, and we can probe its answers for a material-based explanation of why it is stating something. One of the most captivating use cases for VR is just exploring and screwing around in photoreal 3D environments, "telling stories / pretending," BUT there is a huge lack of these that are both cheap and high quality; I have to dig through about 80% detritus for the good stuff. If this suddenly weren't the case, it would mean endless novel worlds to explore, personalized, responsive to speech, hand motions, etc. That drives mass adoption of VR, which lowers the cost of comfortable high-res, high-FOV headsets, which in turn drives spatial AI forward.
Regarding image generation being more of a continuum from previous milestones that the public perceived as an abrupt new thing, it made me think of the quote:
"It takes 10 years to become an overnight success."
Interesting to know Lua was used for AI at the time, as Justin mentioned.
I don't understand what World Labs is up to. The concept of modeling the world and reasoning about objects, places, and interactions in 3D space and time closely relates to topology, which is the mathematical study of how spatial relationships and properties persist or change under continuous transformations, such as stretching or bending without breaking. So, they are either giving a new name to the study of topology, or they don't fully grasp what math has already covered. Additionally, using mathematical reasoning and logic, combined with Multiphysics Simulations, produces more accurate results than relying on probabilistic token-based models for representing objects.
"Additionally, using mathematical reasoning and logic, combined with Multiphysics Simulations, produces more accurate results than relying on probabilistic token-based models for representing objects."
I'm not sure how useful this approach would be when it comes to interacting with the real world, which is so complex and unpredictable. It wouldn't be sufficient to control a robot or autonomous vehicle.
Certainly, a simulated environment is useful for training deep learning models but they'll still need training.
Topology sucks, worst class in college.
OpenAI still having a context length of 128K is just a bummer. If they don't increase it, I can see them losing the high ground on reasoning models. Google's Gemini has a context length in the range of MILLIONS, and this could give them the upper hand, since reasoning models essentially use up that finite context length pretty quickly.
Of course, summarizing the thought process and feeding back only the summary can reduce the number of tokens needed, or simply making the AI think in a more efficient language that humans might not be able to understand (at the expense of transparency) could also help, but ultimately they'll need to increase that context length. It will get ugly for them if they don't.
Yes. I wanted to use Gemini because of the context length. But something with this model is off. You can't use it for coding. It wants to teach you or something. But I want it to generate the whole code. 😂
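A minimal sketch of the "summarize the thought process and feed back only the summary" idea from the comment above, not how any provider actually manages context internally; the model name and token budget are assumptions, and the token count is a crude character-based estimate.

```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"     # assumed model name
TOKEN_BUDGET = 100_000    # rough budget, not an exact context limit

def ask(messages):
    resp = client.chat.completions.create(model=MODEL, messages=messages)
    return resp.choices[0].message.content

def rough_tokens(messages):
    # crude estimate: roughly 4 characters per token
    return sum(len(m["content"]) for m in messages) // 4

history = [{"role": "system", "content": "Reason step by step."}]

def reasoning_turn(user_msg):
    """Add a turn; when the history gets too long, collapse it into a summary."""
    global history
    if rough_tokens(history) > TOKEN_BUDGET:
        summary = ask(history + [{"role": "user",
                                  "content": "Summarize the reasoning so far in under 300 words."}])
        history = [history[0], {"role": "assistant", "content": summary}]
    history.append({"role": "user", "content": user_msg})
    answer = ask(history)
    history.append({"role": "assistant", "content": answer})
    return answer
```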
What “archive” are they referring to in the interview?
It's kind of interesting that in this whole conversation they didn't really mention the most valuable outcomes from Spatial AI - Engineering, industrial design, and manufacturing, which are rooted in 3D parametric models. These are the topics that will yield the AGI abundance everyone is hoping for. It takes machines to make stuff, and the machines will be designed and built orders of magnitude faster when the design cycle happens in seconds.
Going from one 2D image to 3D is quite hard. But going from video to 3D is much easier, because there is much more depth information in how closer objects move differently relative to objects further away.
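A crude illustration of that parallax cue with OpenCV: between two consecutive frames, pixels that move further are (very roughly) closer to the camera, so optical-flow magnitude acts as an inverse-depth proxy. The frame file names are placeholders, and real structure-from-motion is far more involved than this.

```python
import cv2
import numpy as np

# Two consecutive frames from a video (placeholder file names)
frame_a = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
frame_b = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

flow = cv2.calcOpticalFlowFarneback(frame_a, frame_b, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)
motion = np.linalg.norm(flow, axis=2)            # pixels that moved further...
inverse_depth = motion / (motion.max() + 1e-6)   # ...are (very roughly) closer

cv2.imwrite("parallax_depth_cue.png", (inverse_depth * 255).astype(np.uint8))
```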
I would be interested to know what the training data would be: 3D scans, dual cameras, or possibly ControlNet-style data with depth maps and positioning info. The real challenge is, as she states, that a human can look at an infinite number of variations in the world and know what something is instantly (a chair, for example; regardless of shape or build complexity, we know it's a "chair" even if we're seeing that one for the first time in our life). For AI in 3D space this would be a challenge because it has no text cues, and manual human tagging for every object on earth is practically impossible. Tesla is limited to road data; it has no concept of objects invisible to it outside the narrow field of view from roads.
I hope they use all kinds of sensors when they build their data set, I like the idea of AI being able to estimate the temperature of things based on an image.
Whoever can crack cheap, fast capture of high-quality 3D structure of the real world will provide a huge advantage to many fields. This has been a huge and expensive struggle for decades, making design, engineering, and games so much more expensive. An "unlock" like this will provide training data for calculating 3D from the billions of 2D photos and videos. Then this links up with LLMs and makes them more precise as well. Once you have a detailed and precise understanding of a 3D environment, there is TONS of "free" functionality: simulating new lighting, calculating volumes, path-finding for robots, physics, etc. And these tasks can be done in non-real time as needed. This leads to AI-designed automation solutions acting on the real world, deploying drones, IoT, etc., which then completes the loop, so we get a real-time sim of the real world with fidelity proportionate to the locations of interest.
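One concrete example of that "free" functionality: once a scene exists as a 3D occupancy grid, robot path-finding is just breadth-first search. A pure-Python toy, with a made-up grid standing in for a reconstructed scene.

```python
from collections import deque
import numpy as np

grid = np.zeros((20, 20, 5), dtype=bool)   # False = free space, True = occupied
grid[5:15, 10, :] = True                   # a wall across part of the scene...
grid[5:15, 10, 4] = False                  # ...with a gap at the top

def find_path(grid, start, goal):
    """Breadth-first search over free voxels; returns a list of cells or None."""
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    queue, came_from = deque([start]), {start: None}
    while queue:
        cur = queue.popleft()
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        for dx, dy, dz in moves:
            nxt = (cur[0] + dx, cur[1] + dy, cur[2] + dz)
            in_bounds = all(0 <= n < s for n, s in zip(nxt, grid.shape))
            if in_bounds and not grid[nxt] and nxt not in came_from:
                came_from[nxt] = cur
                queue.append(nxt)
    return None

print(find_path(grid, (2, 2, 0), (18, 18, 0)))
```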
I would agree that training data based on real-world interaction will be required to make an AI that can operate without bias.
The extra speedy parts: "Abude-wee-but. Abio-meep. Meep-ekooiooueou-snarp." Dave in comparison: "Yehhh..... iiii...accctualllyy.."
Very interesting topic. I work in virtual construction; obviously it goes way beyond this, but it just reaffirms my opinion that people won't be designing and coordinating construction for much longer.
An early launch of Optimus is, in my opinion, also meant to fulfill the need for visual data of homes and what happens there, because homes are not streets.
iPhones have had LiDAR sensors since the iPhone 12 Pro, but they haven't been utilized much; photogrammetry is super interesting.
To go with spatial AI, I hope they also combine it with a physical structure that not only morphs under and around you but also visually represents the AI world from our own ocular perspective. If that can be invented in my lifetime, I will be rather pleased. I have always dreamt of having my own Holodeck.
The comment at around 29:00 is interesting. Transformers, attention, and LSTMs work on linear sequences. CNNs, which have worked well for images, can be 2D for images or 3D for point clouds. Those networks seem to represent the 3D world well. But there could be a better architecture for representing 3D data, objects, physical concepts, etc.
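For reference, the 3D analogue is essentially the same convolution idea moved to voxel grids; a bare-bones PyTorch sketch with illustrative shapes only, not a proposed architecture.

```python
import torch
import torch.nn as nn

class TinyVoxelNet(nn.Module):
    """Classify occupancy grids with 3D convolutions instead of 2D ones."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):            # x: (batch, 1, depth, height, width)
        return self.classifier(self.features(x).flatten(1))

voxels = torch.rand(2, 1, 32, 32, 32)    # two random 32x32x32 voxel grids
print(TinyVoxelNet()(voxels).shape)      # torch.Size([2, 10])
```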
This is the future, in my opinion. I see this modality overtaking the 1D and 2D image modalities in the near future. It'll give the LLM much more detail in color and nuance to process. It might be a little more computationally expensive, but I think the benefits will outweigh the risks and "costs." The three modalities LLMs have now will reach their thresholds at some point, and they'll need to lean on more advanced technologies like this one to process more nuanced representations of the 3D, real world.
Another one is video reading. Current LLMs can't do that; they read the transcript only, not the visuals.
With the camera shutter mode, it should be possible for this tech to read videos in 3D. That would enrich the LLM's internal representations, and with richer context it would reduce hallucinations even more.
Anyway, this is exciting stuff if you ask me.
Somehow we're still seeing LLMs improving a lot (say, twice a year, e.g., with 4o and o1) with roughly the same amount of data; perhaps they're already incorporating more than 1D/2D data in there.
Good look on this one Matt thank you!
This might be exactly what you need:
1. Learn about spatial AI and its potential impact on various industries.
2. Explore resources like the a16z Podcast with Fei-Fei Li and platforms like Mammoth AI to deepen your understanding.
3. Consider the ways spatial AI can enhance robotics, augmented reality, and world generation in the near future.
Verses AI is working on similar spatial projects. They created the Spatial Web protocol that was accepted by the IEEE.
Interestingly a paper came out just a few days ago, Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning.
loving this channel. such great insight.
AlexNet vs. ImageNet: what's the difference?
Great episode!
Meta's SceneScript is already capable of reconstructing environments and representing the layout of physical spaces. So, the question remains: what truly innovative or new approach is World Labs bringing to the table? And finally, there’s a need to reinvent Meta's Orion AR glasses and the advancements from Boston Dynamics.
I already did these experiments and development in the '90s using REAL PEOPLE, as a cash math/physics tutor (now over 80,000 hours). I guaranteed clients an A in Engineering Physics 1 by reverse-engineering an expert system from the curriculum and employing "lazy learning" techniques, using the HUMAN BRAIN as my neural network. I extended the concepts to laboratory teaching, and my class became so popular (3 hours per week, NO HOMEWORK) that the university changed a bylaw to allow withdrawing from the lecture class while keeping my class. Not only did "compute" not cost me anything, people paid ME to do my compute! I also mastered the transition to Algebra 1 fluency. No one wanted it, the schools being totally corrupted with mediocrity. So now I tutor the smart, well-to-do, immigrant crowd and enjoy watching the world catch up.
Tutor the poor and least well-off locals. Don't wait for them to "catch up"; you'll be waiting forever.
I noted Justin Johnson spoke about 4D... To me, that speaks to AI dealing in a holistic view of the world: not only 3D models, but how those 3D models live in their placement within the world, and also an object's relative position at a given time in space. But I could be wrong... that seems to be some far-out Einstein relativity thought here. What do you think?
🎯 Key points for quick navigation:
🌐 Spatial AI integrates AI with spatial data for enhanced contextual understanding.
🏙️ It enables AI systems to perceive and interact with physical spaces in real-time.
🚀 Spatial AI is considered the next frontier in AI architecture due to its applications in robotics and autonomous vehicles.
🌍 It revolutionizes industries by optimizing logistics, urban planning, and environmental monitoring.
📡 The technology enhances augmented reality experiences by overlaying digital information onto physical environments.
🛰️ Companies are investing heavily in Spatial AI for its potential to redefine navigation and mapping technologies.
🏗️ Challenges include privacy concerns and ethical implications regarding data usage in spatial mapping.
🌌 Future advancements in Spatial AI aim to create seamless interactions between digital and physical worlds.
Made with HARPA AI
Basically building the Matrix
Another insightful video Matt.
My first thought (as a Trekkie): wow, soon we could have holodecks. On a more realistic note: resolution, perception, and scale are factors that will become important with this technology. As humans we would want the highest degree of all of these factors, but will the related AI require the same levels to perceive what we as humans do within a 3D space? I think not! In essence, would spatial intelligence see our world better than we do? If so, what are the potential benefits and implications?
I love this channel!!
This stream of mind-blowing news just gets crazier every day... if it continues like this, you will have to switch to live broadcasting. Your insights and comments are very valuable.
Thank you!
Wearing a ponytail makes you look smarter. Just like having a British accent. I forgot: if you talk fast, you are by definition a genius; so much to share with us mere mortals...
Very interesting discussion.
great commentary video thanks man
What if the architecture of a model (the weights interpreter) were also dynamic and trainable, via deep learning or via a Darwin-style selection/mutation mechanic?
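A toy sketch of that Darwin-style idea: treat the architecture (here just a list of layer widths) as a genome, mutate it, and keep the variants that score best. The fitness function is a stand-in; in a real neuroevolution setup it would be validation accuracy after actually training each candidate.

```python
import random

def fitness(genome):
    # Placeholder objective: prefer ~3 layers of width ~64.
    # In practice this would be validation accuracy after training the network.
    return -abs(len(genome) - 3) - sum(abs(w - 64) for w in genome) / 100

def mutate(genome):
    g = genome[:]
    op = random.choice(["widen", "narrow", "add", "remove"])
    if op == "add":
        g.insert(random.randrange(len(g) + 1), random.choice([16, 32, 64, 128]))
    elif op == "remove" and len(g) > 1:
        g.pop(random.randrange(len(g)))
    else:
        i = random.randrange(len(g))
        g[i] = max(8, g[i] * 2 if op == "widen" else g[i] // 2)
    return g

population = [[32, 32] for _ in range(8)]
for generation in range(50):
    survivors = sorted(population, key=fitness, reverse=True)[:4]                   # selection
    population = survivors + [mutate(random.choice(survivors)) for _ in range(4)]   # mutation

print("best architecture:", max(population, key=fitness))
```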
I am surprised they did not mention Gaussian Splatting.
Phone manufacturers should implement 3D cameras in their devices right away.
That's a lot of words to say we're trying to build Star Trek's Holodeck.
Fei-Fei, Ilya Sutskever, and Alex started the AI deep learning revolution with AlexNet back in 2012; they're so underrated... To this day deep learning continues to conquer field after field. Amazing.
Matt, I keep telling you that the system of FSD and Optimus data and training related AIs has a jump on the rest of the world. I feel that today that collection of AIs has the best chance to reach AGI and ASI. A chance that just skyrocketed now that Tesla AI is no longer compute restrained. Too little attention is being paid to this segment of Tesla AI.
perfect layer for robot-3D interaction.
Wtf… you threw me off. Lmao
Thought my youtube was playing at double the speed.
Mammoooot 🤣
woop woop
@@Mammouth_ai is that really the sound of a Mammouth ?
Contact lenses with AR, combined with a neuronal output device on your ear with a mic and bone-conduction sound (since brain input is still theoretical, and risky once it does become possible).
You would have AR and could basically think what you want and it appears in AR: call someone, check the time, watch a video, see the Colosseum in real life with an overlay of how it was, gladiators and all, or play games like Pokémon Go but actually see the Pokémon.
(Now that I think about it, we might have to invent a few new names for the mental problems people will have 😅)
Imagine once the humanoid robots do actions and the data can be analyzed for VR games: all the interactions you could simulate and therefore have within video games. That would be awesome.
All Mathews LOVE AI. Kidding, you know why I say that LOL. Love your channels, Matt bros lol 😄bigup guys 💯
The five basic human senses are sight, hearing, smell, taste, and touch. AI will be doing all of that.
Research! Researching in 3D would be ground breaking. Everything that is important is 3D.
Genetics, physics, engineering. All 3D.
no. everything important is nD.
This is the right path: spatial intelligence with points of interest.
Thanks for the video.
"Make me a new map for State of Decay 2." Suddenly Microsoft is going to be really concerned with intellectual property.
Is everyone gonna ignore the work of Jeff Hawkins like he hasn’t been doing this for 20 years a Numenta?
I bet this model would be very good at the marble problem 😃
Fascinating! 👍
What Justin delineated is actually sad: the only ones pushing the boundaries of AI are the ones who can afford compute.
You can do all that video games do by learning to leave/phase out of the body. Jurgen Ziewe has a new video, Wild, based on his experiences. Neville Goddard has a video called Out of This World, giving detailed instructions. Someone narrated a 6-part series on Frank Kepple's techniques.
I can verify some of what they experienced. AI is just another attempt to go back to where we were before birth. It is like the call of the salmon who knows where its home is.
Perhaps AI could be used to find the most efficient way to phase out to the infinite worlds beyond so everyone can do this.
Very interesting, thank you.
Bro drank 5 red bulls before the interview😵💫
Turning on captions makes a mess when the video in the video also has them
Yes, if you feed the LLM with them in the video. Turn captions off and make sure the video is clean or represented properly before input. That will reduce cases of hallucination.
Well... very informative and cool, and I can see it being applied in many industries. However, it feels like there is something missing. Oh yes: where on earth are you going to store all that spatial data? The next big thing would then be storage, not compute...
Simulation like Matrix to imprison people, or like the Seed from Sword Art Online to create new game worlds.
Another wonderful and informative video. Thank you. They sound like auctioneers.
Like I told OpenAI, it's a megawatt per million for your projects (mwahaha, the energy company rubbing its hands).
So very fascinating 🎉
Damn, I've already seen that interview... I thought it was something new... OK 👍👍
I can't help but chuckle every time people talk about a new paradigm of reasoning and chain of thought…
I put together a group chat for GPT-3 to discuss my prompt and witnessed the power of it a few days after the ChatGPT release…
Am I a genius, or are these 'researchers' dumb? 😅
I don't get it, guys. Are they slowly milking the cow that should have been a phenomenal steak right away?
People who think AI has plateaued have no idea what it's going to do when it can generate its own data by interacting with the world.
Imagine playing a game with your friend in a 3D world created by AI, where you can do all the things you want... 🤯
Tesla is already doing spatial AI.
@@ikjb8561 agreed.
The retina gets a 2D image and the brain computes it to 3D. So standard procedure I guess.
On October 10th the world will see that Tesla Inc. has already built a real-world AGI. You'll see.
I'm now listening to Fei-Fei Li at 3x speed.
except Tesla's data isn't spatial since Elon insisted cameras were enough, no?
If we're crafting AIs in our own image, we shouldn't expect them to think outside the box we've built. Matt, you should stop repeating "real world" and wipe off that overzealous grin; this is a funeral, my man. What we're teaching AI to perceive is the world we live in. Even the visual representation of local reality at the visible spectrum would be rendered differently by beings with a different central nervous system. It is okay to limit AI to human cognitive/perceptive frameworks, but don't then complain that it can't provide novel insights. This is another nail in the coffin of ever having a truly foreign intelligence, another brushstroke on the frame of the mirror we like to admire ourselves in.
If machines move to a higher sensory level (how many dimensions have they theoretically counted already? something like 11 above us?), we will stay where we are, stuck forever. It feels like birds leaving the nest for the winter migration. I watched a storks' nest this summer as the chicks grew; only one chick out of 4 survived, and it was a personal sadness when it flew away, maybe never to return, if it even survives in Africa at all.
Humans need to transform into digital entities. Every mammal is made from a cell; with nature's help we transformed from a foreign microbial world into the world of bigger creatures, and that path kind of goes nowhere. It would be natural to leave this world not by the DNA collapsing in every cell, but by moving to the next level.
"around 2012 ish"
Yup. But the idea didn't pop into his head.
They said a plumber cannot be replaced... With spatial intelligence, you can be a plumber! With that device, it can project AR of the pipes behind the wall, because spatial intelligence is able to generate 3D! Mechanic, too?
36:00 EA is already on this.