The incredibly smooth voice almost feels like he's going to sell me this "one weird trick" for retargeting my motion vectors: call 800-NEURAL, first 100 to sign up get a free pack of neurons 😂
Jokes aside, amazing work guys. Hope at some point this becomes some "retarget motion" checkbox in game engines and whatever dedicated motion capture software that was.
This is truly wizard work. The idea of disentangling the models and reducing them to a common latent space is 🤯. Is there ANYTHING that a neural network won't optimize???
They can even optimize each other. Google "meta-learning"!
People.
@neo2652 not even; they can figure out better methods for us to perform tasks
At 1:45 "Our skeletal convolution is done by a collection of temporal kernels whose support varies across the skeleton structure, and our skeletal pooling is performed by merging adjacent armatures."
I can see the target audience for this video is super hyper knowledgeable experts in this field.
Yes, the paper was a tad confusing too... Temporal kernels are the same as any old convolutional kernel, but with one of the dimensions being time. Imagine a 2D pixel grid (a black-and-white picture). If you wanted a black-and-white video, you could stack these grids on top of each other. Then you have a cube with one of the axes being time. A 3D kernel that convolves over this larger cube is a temporal kernel, because it convolves over time as well as over the pixels of each image in the video sequence.
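If it helps to see the stacked-frames idea in code, here is a minimal sketch in plain Python. It is only an illustration of a 3D kernel sliding over a (time, height, width) cube, not the paper's skeletal convolution; the function name `temporal_conv`, the toy video, and the averaging kernel are all made up for this example. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it is a cross-correlation.

```python
def temporal_conv(video, kernel):
    """Slide a (kt, kh, kw) kernel over a (T, H, W) video, no padding.

    Both arguments are nested lists indexed as [time][row][column].
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):            # step through time
        frame = []
        for y in range(H - kh + 1):        # step down the image
            row = []
            for x in range(W - kw + 1):    # step across the image
                total = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            total += video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(total)
            frame.append(row)
        out.append(frame)
    return out


# Hypothetical toy video: 5 frames of 4x4 pixels, where frame t has brightness t.
video = [[[float(t) for _ in range(4)] for _ in range(4)] for t in range(5)]

# Hypothetical kernel: averages a window of 3 frames x 2 rows x 2 columns.
kernel = [[[1.0 / 12 for _ in range(2)] for _ in range(2)] for _ in range(3)]

result = temporal_conv(video, kernel)
print(len(result), len(result[0]), len(result[0][0]))  # 3 3 3
```

Because the kernel spans three frames, each output value mixes information from neighbouring moments in time, which is the whole point of a temporal kernel.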
"The support varies over skeleton structure" I think is because the skeletal graphs have to be homeomorphic. That is a general math term, but in the context of geometry it means two objects can be morphed into each other without tearing or gluing. For example, a donut can be morphed into a coffee mug, but a sphere cannot be morphed into a donut without tearing the mesh. So a donut and a coffee mug are homeomorphic; a sphere is NOT homeomorphic to a donut. In this context, if you train the network to work on people, it will only work on people-shaped skeletons: 5 limbs coming from a center (neck, left arm, right arm, right leg, left leg). If you train it on a worm, it will only work for worms.
Finally, "merging adjacent armatures" was something I found terribly confusing, mainly because "armature" in the 3D modeling world typically means an entire skeleton. They are using it for just a single bone segment. So they take two adjacent bones and merge them into one. This operation preserves the overall form of the skeleton but also brings it into a more general space that can be decoded into other types of skeletons.
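Here is a rough sketch of what "merge two adjacent bones" could look like, in plain Python. Everything in it is an assumption for illustration: the limb names, the two-number feature per bone, the helper `pool_chain`, and the choice to merge by averaging are mine, and the paper's actual pooling may differ in its details.

```python
def pool_chain(bones):
    """Merge consecutive bones of one limb pairwise by averaging their features.

    `bones` is a list of feature vectors ordered from the body outwards.
    If the limb has an odd number of bones, the last one is kept unchanged.
    """
    pooled = []
    for i in range(0, len(bones) - 1, 2):
        a, b = bones[i], bones[i + 1]
        pooled.append([(x + y) / 2 for x, y in zip(a, b)])
    if len(bones) % 2 == 1:
        pooled.append(list(bones[-1]))
    return pooled


# Hypothetical skeleton: each limb is a chain of bones, each bone a 2-number feature.
skeleton = {
    "left_arm": [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]],
    "right_leg": [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0], [6.0, 6.0]],
}

pooled = {limb: pool_chain(bones) for limb, bones in skeleton.items()}
print(pooled["left_arm"])   # [[2.0, 1.0], [5.0, 4.0]]
print(pooled["right_leg"])  # [[1.0, 1.0], [5.0, 5.0]]
```

Note that the limbs themselves survive (there is still a left arm and a right leg), only with fewer bones each. That is the sense in which the skeleton's form is preserved while the details get simplified.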
Superb work guys!
amazing!
Did they hire someone to do the narration?