Mono-Camera-Only Target Chasing for a Drone in a Dense Environment by Cross-Modal Learning

  • Published May 26, 2024
    * Status: Accepted for publication in IEEE Robotics and Automation Letters (RA-L)
    * Category: Vision-Based Navigation, Visual Learning, Deep Learning for Visual Perception
    * Authors: Seungyeon Yoo¹, Seungwoo Jung¹, Yunwoo Lee, Dongseok Shim, and H. Jin Kim
    * Abstract: Chasing a dynamic target in a dense environment is one of the challenging applications of autonomous drones. The task typically requires multi-modal data, such as RGB and depth, to achieve safe and robust maneuvers. However, carrying multiple sensor modalities is difficult given the limited capacity of drones in terms of hardware complexity and sensor cost. Our framework removes this restriction in the target-chasing task by using only a monocular camera instead of multiple sensor inputs. From an RGB input, the perception module extracts a cross-modal representation containing information from multiple data modalities. To learn cross-modal representations at training time, we employ variational autoencoder (VAE) structures with a joint objective function across the heterogeneous data. Subsequently, using latent vectors obtained from the pre-trained perception module, the planning module generates an appropriate next-time-step waypoint by imitating an expert that performs numerical optimization on privileged RGB-D data. Furthermore, the planning module exploits temporal information about the target through consecutive cross-modal representations to improve tracking performance. Finally, we demonstrate the effectiveness of our framework through the reconstruction results of the perception module, the target-chasing performance of the planning module, and the zero-shot sim-to-real deployment of a drone. (Rough sketches of both modules are given below.)
    * Contact: syeon.yoo@snu.ac.kr; tmddn833@snu.ac.kr
  • Science & Technology
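
The perception module described in the abstract can be illustrated with a minimal PyTorch sketch: a single RGB encoder produces a Gaussian latent, and two decoders reconstruct RGB and depth, so the joint objective forces the RGB-derived latent to carry cross-modal (depth) information. All layer sizes, the 64x64 input resolution, and the `beta` KL weight here are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalVAE(nn.Module):
    """Sketch of a cross-modal VAE: encode RGB only, decode RGB and depth."""
    def __init__(self, latent_dim=128):
        super().__init__()
        # RGB encoder -> flattened feature for the Gaussian posterior.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 16 -> 8
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128 * 8 * 8, latent_dim)
        self.fc_logvar = nn.Linear(128 * 8 * 8, latent_dim)
        # One shared latent feeds two modality-specific decoders.
        self.fc_dec = nn.Linear(latent_dim, 128 * 8 * 8)
        self.rgb_decoder = self._make_decoder(out_channels=3)
        self.depth_decoder = self._make_decoder(out_channels=1)

    @staticmethod
    def _make_decoder(out_channels):
        return nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),  # 8 -> 16
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),   # 16 -> 32
            nn.ConvTranspose2d(32, out_channels, 4, stride=2, padding=1),    # 32 -> 64
        )

    def forward(self, rgb):
        h = self.encoder(rgb)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        h_dec = self.fc_dec(z).view(-1, 128, 8, 8)
        return self.rgb_decoder(h_dec), self.depth_decoder(h_dec), mu, logvar

def joint_loss(rgb_hat, depth_hat, rgb, depth, mu, logvar, beta=1e-3):
    # Joint objective across heterogeneous modalities: both reconstructions
    # must succeed from the single RGB-derived latent, plus a KL regularizer.
    rec_rgb = F.mse_loss(rgb_hat, rgb)
    rec_depth = F.mse_loss(depth_hat, depth)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec_rgb + rec_depth + beta * kl
```

At test time only the encoder is needed: depth never has to be sensed on the drone, because the latent was trained to predict it.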

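The planning module can be sketched in the same spirit: a short history of consecutive cross-modal latents is regressed onto the expert's next-time-step waypoint (behavior cloning). The GRU, the five-step history, and the 3-D waypoint output are assumptions for illustration; per the abstract, the expert labels come from a numerical optimization over privileged RGB-D data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentPlanner(nn.Module):
    """Sketch: map consecutive perception latents to the next waypoint."""
    def __init__(self, latent_dim=128, hidden_dim=64):
        super().__init__()
        # The recurrent layer aggregates temporal information about the target.
        self.gru = nn.GRU(latent_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, 64), nn.ReLU(),
            nn.Linear(64, 3),  # hypothetical next waypoint (x, y, z)
        )

    def forward(self, z_seq):  # z_seq: (batch, history_len, latent_dim)
        _, h = self.gru(z_seq)
        return self.head(h[-1])

# Behavior cloning against precomputed expert waypoints (placeholder data).
planner = LatentPlanner()
opt = torch.optim.Adam(planner.parameters(), lr=1e-3)
z_seq = torch.randn(32, 5, 128)   # consecutive latents from the frozen perception module
expert_wp = torch.randn(32, 3)    # placeholder labels from the privileged expert
loss = F.mse_loss(planner(z_seq), expert_wp)
opt.zero_grad(); loss.backward(); opt.step()
```

Feeding a sequence of latents rather than a single frame is what lets the planner infer target motion, which the abstract credits for the improved tracking performance.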