XNect: Real-time Multi-person 3D Motion Capture With a Single RGB Camera (SIGGRAPH 2020)

  • Published on Jan 15, 2025

Comments • 72

  • @thomasgoodwin2648 4 years ago +9

    Excuse me a moment while I pick my jaw up from the floor. Truly awesome! It won't be long before automation of all objects in a scene (not just humans). Yesterday's state of the art has been blown away; can't wait to see what the state of the art is this afternoon. Thank you and keep up the amazing work.

  • @AZTECMAN 4 years ago +4

    Awesome work. Congratulations on making it into the SIGGRAPH conference!

  • @NickGeo25 4 years ago +40

    I'm curious how much the quality can improve just by adding one more camera placed on the side.

    • @dripsin50 4 years ago +4

      I had that exact thought. Why not treat the cameras like the base stations of the Oculus/Vive/Index? With two sources to sample from, would the tracking be a lot smoother and have a decent margin for error correction?

    • @i-conicvision1058 4 years ago +5

      @@dripsin50 If you're interested, we are developing software that allows for real-time epipolar resampling of multiple video streams from moving cameras (we use drone video). This means that you could use your technology on two videos and very accurately calculate positions of the poses.

    • @scottturcott1710 4 years ago

      Like a hypothetical Snapchat concert filter: while everyone is already filming a concert, it would build up point data for a program like this.

    • @i-conicvision1058 4 years ago

      @@scottturcott1710 Potentially, yes.

    • @ilmarselter 4 years ago

      I wonder how much the quality can improve just by using a higher-fps camera. A commercial product is still using low-resolution PS Eye cameras for motion capture.

  • @dissonantprotean5495 3 years ago +1

    Super exciting, this makes motion capture way more accessible

  • @novaria 4 years ago +17

    I found the original paper, but where can I find a demonstration repository? Is it open source under the MIT license? If not, what are your plans for this?
    I plan on building a free and open-source framework for hobby game developers and animators (strictly non-commercial).

    • @mesmes9424 11 months ago

      Same, is it available?

  • @blendlogic4151 4 years ago +12

    Please, when will this be released? Can't wait to get my hands on it.

    • @pacoreguenga 3 years ago

      It’s already available as a C++ library.

    • @azarkiel 3 years ago

      @@pacoreguenga Do you know where this library is? I would like to work with it. Thanks in advance.

  • @kickassmovies5952 4 years ago +7

    When will it be out in the public domain? It looks similar to an app made by Radical a long time back.

  • @smirnovslava 4 years ago +4

    Good work and congrats on SIGGRAPH! Are there any plans to publish the training code?

  • @VivaZapataProductionsLLC 4 years ago +3

    This is great stuff. How do we get access to it? I couldn't find any more information on how to actually obtain this program or software...

  • @Jewelsonn 4 years ago +21

    I want this released for MikuMikuDance software

  • @21graphics 2 years ago +1

    What is an RGB camera?

  • @virtual_intel 3 years ago

    How does this benefit us viewers? And when can we gain access to the tool?

  • @Tactic3d 4 years ago

    Great work! Very impressive result.

  • @goteer10 4 years ago +9

    Does it work well with face occlusion? This would be a great cheap alternative for VR full-body tracking if it did. Great for when institutions don't quite have the budget for everything.

    • @mehtadushy 4 years ago +4

      As discussed in the supplemental document, the system as it is does not work with face occlusions, because in the absence of facial cues it cannot tell whether it is looking at the front of the body or the back. However, as we demonstrated in VNect, one can get around it by sticking images of human eyes onto the VR headset.

  • @Augmented_AI 4 years ago

    Does it work in Unity?

  • @MadsterV 3 years ago

    This looks very stable! And no depth info? That's amazing.

  • @mxmilkiib 4 years ago

    Videos of Contact Improvisation dance jams would make for good tests

  • @joanamat5139 4 years ago +2

    I've seen a lot of experimental demonstration videos like this, but a real product never comes out.

  • @titter3648 4 years ago +1

    There are some glitches where part of the skeleton instantly jumps from one position to another for a second or less and then snaps back to the correct position. Maybe you could add a filter to reject "impossible" accelerations and speeds, as sketched below.
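
    A minimal sketch of such a filter, assuming per-frame 3D joint positions at a known frame rate (the function name and threshold are illustrative, not from the paper):

      import numpy as np

      def reject_impossible_jumps(prev, curr, dt, max_speed=10.0):
          """Hold a joint at its previous position when its implied
          speed exceeds a plausibility threshold.

          prev, curr: (num_joints, 3) arrays of joint positions in metres.
          dt: seconds between frames.
          max_speed: illustrative cap on joint speed, in m/s.
          """
          speed = np.linalg.norm(curr - prev, axis=1) / dt
          implausible = speed > max_speed
          out = curr.copy()
          out[implausible] = prev[implausible]  # keep the last plausible position
          return out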

  • @camswainson-whaanga2750 4 years ago

    Are you working on close-up hand and finger tracking, for when only our hands are in the camera view?

  • @amirierfan 3 years ago

    Insane!!!!!

  • @donk.johnson7346 4 years ago

    why all the foot sliding?

  • @luisfable 3 years ago

    where is this

  • @dietrichdietrich7763 1 year ago

    Awesome

  • @jackcottonbrown 3 years ago

    Can this run on an iPhone?

  • @angeloman87 4 years ago

    Can I use these animations in Max?

  • @DP-ee6qv 3 years ago

    Software name?

  • @andrewgonzalez620 4 years ago +1

    Can I download this?

  • @shadatorr9378 4 years ago

    The spine almost doesn't bend at all, but overall it's really great and useful.

  • @birdisland 4 years ago

    How can I get this software? Is there a website for purchasing it?

  • @devWeidz 4 years ago +3

    This is huge. The most affordable mocap suit today still costs around €3K, for only one actor, without 6DOF position, and from what I see, with about the same accuracy if not worse. Do the estimations give a per-point accuracy index for the poses? If so, it shouldn't be too hard to cross-analyse data from multiple angles and get a smoother, more accurate output, right?
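
    A minimal sketch of that cross-view idea, assuming each camera yields per-joint 3D positions in a shared world frame along with per-joint confidences (this interface is hypothetical, not XNect's actual output):

      import numpy as np

      def fuse_views(poses, confidences, eps=1e-6):
          """Confidence-weighted average of per-joint 3D estimates
          from multiple cameras registered to a common frame.

          poses: (num_views, num_joints, 3) joint positions.
          confidences: (num_views, num_joints) per-joint scores.
          """
          w = confidences[..., None]  # (views, joints, 1), broadcast over xyz
          return (w * poses).sum(axis=0) / (w.sum(axis=0) + eps)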

    • @i-conicvision1058 4 years ago +1

      If you're interested, we are developing software that allows for real-time epipolar resampling of multiple video streams from moving cameras (we use drone video). This means that you could use your technology on two videos and very accurately calculate positions of the poses. (www.i-conic.eu)

  • @Drago.23 4 years ago +1

    How was the motion transferred to the 3D models?

    • @Ethan-ny4vg 3 years ago

      Have you solved this? I also want to know how.

  • @Ethan-ny4vg 3 years ago

    Is the character controller in Unity? Does anybody know? Thanks.

  • @daehyeonkong1762 4 years ago

    Awesome!

  • @acidcube6967 4 years ago +2

    🌟✨🤛
    Well done! An inspirational start to something that I would love to incorporate into a game scenario I am working on. Could this potentially work in real time in combination with Apple's iOS LiDAR apps, such as Scanner?
    Is it possible to contact you apart from here?
    Cheers, Marlon

  • @bolzanoitaly8360 2 years ago

    What do you want to show us?
    If you can't share the model, then what is the point of this?
    Even I could take this video and put it on my vlog.
    This is just nothing...
    Can you share the model and code, please?

  • @ziadeldeeb6066 4 years ago

    What type of RGB camera do you use?

    • @thejetshowlive 4 years ago +1

      From the video it looks like a Logitech... if that IS what they are using.

  • @viniciusplaygames6042 4 years ago +2

    Can someone tell me how to download VNect or XNect? Thanks :)

    • @azarkiel 3 years ago

      You and I are at the same point. I would like to play with both :)

    • @Linaaryani-f2q 3 years ago

      What a software bro

  • @Cera_ve858 4 years ago

    wow already throwing out the ping pong balls

  • @phillipfury528 4 years ago +2

    Hi! I'm a professional mocap performer who recently worked with Marvel and Fox on different projects. I am curious about this software. I would love to connect with everyone!

  • @cmdkaboom 4 years ago

    It's funny that they focus the video on the people tracking and keep the actual rig motion small when it's shown. There's a lot of jitter when you actually see it on a rig. Maybe they will improve it... it doesn't seem to be there yet.

    • @hughjassstudios9688 4 years ago

      Nothing post-processing can't fix. Perhaps blend out the jitter in post, as sketched below.
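
      A minimal sketch of one such post-process, a simple exponential moving average over joint positions (the blend factor is illustrative; a One Euro filter would trade less lag for similar smoothing):

        import numpy as np

        def smooth_clip(frames, alpha=0.4):
            """Exponential moving average over a captured clip.

            frames: (num_frames, num_joints, 3) raw joint positions.
            alpha: blend factor; lower values smooth more but lag more.
            """
            out = np.empty_like(frames)
            out[0] = frames[0]
            for t in range(1, len(frames)):
                out[t] = alpha * frames[t] + (1 - alpha) * out[t - 1]
            return out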

  • @dasrio8307 4 years ago

    Do you compare with VNect in terms of MPJPE?

    • @mehtadushy 4 years ago

      Please refer to the paper for detailed comparisons.

  • @cybermad64 4 years ago +1

    I understand what you are trying to achieve here, but you are still far from an optimal result. We saw the 'Everybody Dance Now' tech demo two years ago; I know it wasn't real-time, but the pose detection was super clean. In your solution the poses are not accurate, the skeletons are shaking, leg and arm positions are off most of the time, knees are not facing the right direction... Your simple skeleton looks 'okay-ish', but once applied to a 3D model it's unusable.

    • @mehtadushy 4 years ago +21

      I think there are some misconceptions here that need to be clarified. 'Everybody Dance Now' makes use of a 2D pose backend (OpenPose), not 3D pose. They don't have any bone-length consistency constraints, nor 3D plausibility to worry about. Additionally, not being real-time is key to the better visual accuracy of the 2D backend used by their project: OpenPose is applied at multiple image scales and the results are combined, whereas our approach does multi-person 3D at roughly 2-3x the speed of single-scale OpenPose, and at least an order of magnitude faster than the multi-scale variant. As far as the 'unusability' of motion applied to 3D models goes, it comes down to the end application. This is where the multi-stage design proposed in the paper comes into play (sketched below). It allows you to insert domain expertise into different stages to improve the aspects that are important to your end application. You can swap out Stage I for a heavier/multi-scale pipeline if you care more about accuracy than real-time performance. You can swap out Stage II for alternate designs which incorporate more data, inter-penetration constraints, bio-mechanical constraints, temporal constraints, and so on. Similarly, you can swap out Stage III to better exhibit the characteristics needed by your end application, or have stronger/better-tuned temporal filtering applied in Stage III. There are other ways to achieve temporal stability too, such as by breaking causality, which targets a whole different set of applications.
      I am sure you understand that this video and paper are not an advertisement for a solution we are selling for money; rather, the system shown in the paper is a vehicle for us to demonstrate several key points regarding a new, efficient convolutional network architecture design, and a way of thinking about multi-person pose estimation in a multi-staged way that is different from contemporary approaches. We in fact perform comparably to non-real-time contemporary monocular 3D pose approaches on various challenging benchmarks, while running in real time. This is a research prototype; of course it has limitations, which are discussed in the supplemental document. Other multi-person 3D pose work has similar limitations, while not even running in real-time. What we show here was the state-of-the-art multi-person monocular 3D pose estimation system at the time of submission of the paper. Any lessons from recent/future work to mitigate some of these issues can equally be applied to our approach.
      We are thankful to you for engaging with us, and we welcome suggestions for improvement. I just wanted to set the context and expectations vis-à-vis prior work.
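
      A minimal sketch of that staged structure, with hypothetical stubs standing in for the components described in the paper:

        def infer_pose_features(image):
            """Stage I: real-time convolutional inference of 2D pose and
            intermediate 3D pose encodings for all visible people
            (swappable for a heavier multi-scale pipeline)."""
            raise NotImplementedError

        def reconstruct_3d(person_features):
            """Stage II: per-person full-body 3D pose reconstruction with
            bone-length consistency and plausibility constraints."""
            raise NotImplementedError

        def temporal_filter(pose_3d, state):
            """Stage III: skeletal fitting and temporal smoothing tuned to
            the end application."""
            raise NotImplementedError

        def process_frame(image, state):
            people = infer_pose_features(image)                 # Stage I
            poses = [reconstruct_3d(p) for p in people]         # Stage II
            return [temporal_filter(p, state) for p in poses]   # Stage III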

    • @NiloRiver 4 years ago +1

      You are a very demanding guy. How about sharing your solution with us?

    • @acidcube6967 4 years ago

      Dushyant Mehta
      Looks like an inspirational start to something greatly usable.
      〽️🕶

  • @Kiran.KillStreak 4 years ago

    I've been seeing videos like this since Kinect v1; nothing is useful for game developers without retouching.

    • @kendarr 4 years ago

      There's really no such thing as perfect mocap; a cleanup is always needed. This is awesome considering it doesn't use any depth data.

    • @checkanr138 3 months ago

      Laziness doesn't do the work for you. Every great game needs top-class animators who fine-tune the animations, even if you use top-class motion capture equipment.

  • @nholmes86 4 years ago

    Haha, only crap... turns out RGB is best for the calculation.

  • @williamweidner5425 3 years ago

    Is there a way to capture finger motions with this?