Kuan Fang
KALIE: Fine-Tuning Vision-Language Models for Open-World Manipulation without Robot Data
Building generalist robotic systems involves endowing robots with the capability to handle novel objects in an open-world setting. Inspired by the advances of large pre-trained models, we propose Keypoint Affordance Learning from Imagined Environments (KALIE), which adapts pre-trained Vision-Language Models (VLMs) for robotic control in a scalable manner. Instead of directly producing motor commands, KALIE controls the robot by predicting point-based affordance representations from natural language instructions and visual observations of the scene. The VLM is trained on 2D images with human-labeled affordances, bypassing the need for training data collected on robotic systems. Through an affordance-aware data synthesis pipeline, KALIE automatically creates a large volume of high-quality training data from a limited set of examples manually collected by humans. We demonstrate that KALIE can learn to robustly solve new manipulation tasks with unseen objects given only 50 example data points. Compared to baselines using pre-trained VLMs, our approach consistently achieves superior performance.
Website: kalie-vlm.github.io/
Views: 738

Videos

[RSS 2024] MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting
161 views · 2 months ago
Abstract: Open-world generalization requires robotic systems to have a profound understanding of the physical world and the user command to solve diverse and complex tasks. While recent advances in vision-language models (VLMs) have offered unprecedented opportunities to solve open-world problems, how to leverage their capabilities to control robots remains a grand challenge. In this pape...
Generalization with Lossy Affordances: Leveraging Broad Offline Data for Learning Visuomotor Tasks
189 views · 2 years ago
Abstract: The utilization of broad datasets has proven crucial for generalization across a wide range of fields. However, how to effectively make use of diverse multi-task data for novel downstream tasks remains a grand challenge in robotics. To tackle this challenge, we introduce a framework that acquires goal-conditioned policies for unseen temporally extended tasks via offline reinf...
Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space
101 views · 2 years ago
General-purpose robots in real-world settings require diverse repertoires of behaviors to complete challenging tasks in unstructured environments. To address this problem, goal-conditioned reinforcement learning aims to train policies that can reach configurable goals for a wide range of tasks on command. However, such goal-conditioned policies are notoriously difficult and time-consuming to tr...
Fei-Fei Li - Creating Diverse Tasks to Catalyze Robot Learning
149 views · 3 years ago
Title: Creating Diverse Tasks to Catalyze Robot Learning Abstract: Data has become an essential catalyst for the development of artificial intelligence. But it is challenging to obtain data for robotic learning. So how should we tackle this issue? In this talk, we start with a retrospective on how ImageNet and other large-scale datasets incentivized the deep learning revolution in the past decade...
Chris Choy - 3D Perception with Sparse Tensor
287 views · 4 years ago
Title: 3D Perception with Sparse Tensor Invited Speaker: Chris Choy (NVIDIA Research) Bio: Chris Choy is a research scientist on the NVIDIA Research AI/ML team. Before joining NVIDIA, he received a Ph.D. from Stanford University. He worked on navigation, 2D feature learning, 2D scene graphs, 3D perception, 3D reconstruction, building 3D datasets, and 4D perception. Before joining the lab, he als...
Jeannette Bohg - On Perceptual Representations and How They Interact with Decision-Making
96 views · 4 years ago
Title: On Perceptual Representations and How They Interact with Decision-Making Invited Speaker: Jeannette Bohg (Stanford University) Bio: Jeannette Bohg is an Assistant Professor of Computer Science at Stanford University. She was a group leader at the Autonomous Motion Department (AMD) of the MPI for Intelligent Systems until September 2017. Before joining AMD in January 2012, Jeannette Bohg ...
Yunfei Bai - How to Solve Sim2Real for Robot Grasping with GAN
456 views · 4 years ago
Title: How to Solve Sim2Real for Robot Grasping with GAN Invited Speaker: Yunfei Bai (X Inc) Bio: Yunfei Bai leads the robot simulation team at X (formerly Google [x]). His research at X focuses on physics-based simulation and deep learning for robotics. In particular, he is interested in learning robot manipulation skills using deep reinforcement learning and learning from demonstration, by levera...
Adaptive Procedural Task Generation for Hard-Exploration Problems
302 views · 4 years ago
We introduce Adaptive Procedural Task Generation (APT-Gen), an approach for progressively generating a sequence of tasks as curricula to facilitate reinforcement learning in hard-exploration problems. At the heart of our approach, a task generator learns to create tasks via a black-box procedural generation module by adaptively sampling from the parameterized task space. To enable curriculum le...
Learning Task-Oriented Grasping for Tool Manipulation from Simulated Self-Supervision
1.3K views · 6 years ago
Tool manipulation is vital for enabling robots to complete challenging task goals. It requires reasoning about the desired effect of the task and thus properly grasping and manipulating the tool to achieve the task. Task-agnostic grasping optimizes for grasp robustness while ignoring crucial task-specific constraints. In this paper, we propose the Task-Oriented Grasping Network (TOG-Net) t...
Recurrent Autoregressive Networks for Online Multi-Object Tracking
646 views · 6 years ago
The main challenge of online multi-object tracking is to reliably associate object trajectories with detections in each video frame based on their tracking history. In this work, we propose the Recurrent Autoregressive Network (RAN), a temporal generative modeling framework to characterize the appearance and motion dynamics of multiple objects over time. The RAN couples an external memory and a...
Multi-Task Domain Adaptation for Deep Learning of Instance Grasping from Simulation
594 views · 6 years ago
Learning-based approaches to robotic manipulation are limited by the scalability of data collection and accessibility of labels. In this work, we present a multi-task domain adaptation framework for instance grasping in cluttered scenes by utilizing simulated robot experiments. Our neural network takes monocular RGB images and the instance segmentation mask of a specified target object as input...

Comments

  • @yutao3419 · 4 years ago

    great work!