Shubodh
Joined 30 Oct 2011
ECCV 2024 - Revisit Anything: Visual Place Recognition via Image Segment Retrieval
Check out more at revisit-anything.github.io/!
Accurately recognizing a revisited place is crucial for embodied agents to localize and navigate. This requires visual representations to be distinct, despite strong variations in camera viewpoint and scene appearance. Existing visual place recognition pipelines encode the whole image and search for matches. This poses a fundamental challenge in matching two images of the same place captured from different camera viewpoints: the similarity of what overlaps can be dominated by the dissimilarity of what does not overlap. We address this by encoding and searching for image segments instead of the whole images. We propose to use open-set image segmentation to decompose an image into ‘meaningful’ entities (i.e., things and stuff). This enables us to create a novel image representation as a collection of multiple overlapping subgraphs connecting a segment with its neighboring segments, dubbed SuperSegment. Furthermore, to efficiently encode these SuperSegments into compact vector representations, we propose a novel factorized representation of feature aggregation. We show that retrieving these partial representations leads to significantly higher recognition recall than the typical whole image based retrieval. Our segments-based approach, dubbed SegVLAD, sets a new state-of-the-art in place recognition on a diverse selection of benchmark datasets, while being applicable to both generic and task specialized image encoders. Finally, we demonstrate the potential of our method to “revisit anything” by evaluating our method on an object instance retrieval task, which bridges the two disparate areas of research: visual place recognition and object-goal navigation, through their common aim of recognizing goal objects specific to a place.
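The abstract above argues that retrieving per-segment descriptors, rather than one whole-image descriptor, keeps overlapping content from being drowned out by non-overlapping content. A minimal toy sketch of that idea (not the actual SegVLAD pipeline — the function, its aggregation scheme, and the voting rule here are illustrative assumptions) might look like:

```python
import numpy as np

def retrieve_by_segments(query_segs, db_seg_feats, db_seg_to_image, k=1):
    """Toy segment-level retrieval (illustrative, NOT the SegVLAD method):
    each database image contributes several unit-norm segment descriptors,
    and a query image votes for database images through its segments'
    nearest neighbours instead of one whole-image comparison."""
    # Cosine similarity between every query segment and every DB segment.
    sims = query_segs @ db_seg_feats.T          # shape (Q_segments, DB_segments)
    votes = {}
    for row in sims:
        # Each query segment votes for the image owning its best-matching segment.
        img = db_seg_to_image[int(np.argmax(row))]
        votes[img] = votes.get(img, 0.0) + float(np.max(row))
    # Rank database images by accumulated segment votes.
    return sorted(votes, key=votes.get, reverse=True)[:k]
```

Because a match only needs some segments to agree, an image sharing half its content with the query can still win the vote even if the other half is completely different.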
Views: 216
Videos
LIP-Loc: LiDAR Image Pretraining for Cross-Modal Localization
100 views · 8 months ago
LIP-Loc: LiDAR Image Pretraining for Cross-Modal Localization
3 Lectures on SLAM: E03 - The Big Picture and SOTA Methods
227 views · 4 years ago
C07 S03 SLAM of the IIITH RRC Summer School Series. This is part 3 of 3 lectures on SLAM. Find the remaining two here: th-cam.com/play/PLi59bVTQX5Rpr4umjFfbXeWjfqBwMgLql.html Slides-cum-notes: www.notion.so/saishubodh/SLAM-The-Big-Picture-from-Linear-Algebra-to-Optimization-Frontend-to-Backend-f65a96226a87459b9a573784cffb92e5 (Email p.saishubodh@gmail.com for access to notes) github.com/RoboticsIIITH/...
3 Lectures on SLAM: E02 - SLAM Frontend & Backend - From feature matching to pose graph optimization
548 views · 4 years ago
C07 S03 SLAM of the IIITH RRC Summer School Series. github.com/RoboticsIIITH/summer-sessions-2020 (see end of description for notes access) This is part 2 of 3 lectures on SLAM. Find the remaining two here: th-cam.com/play/PLi59bVTQX5Rpr4umjFfbXeWjfqBwMgLql.html Ignore the video title you see at the start of the video. Topics covered in this video: 1. SLAM Frontend: Visual Odometry (2D-2D, 3D-2D, 3D-3...
3 Lectures on SLAM: E01 - Least Squares Optimization, ICP, Loop Closure
525 views · 4 years ago
Slides-cum-notes: www.notion.so/saishubodh/SLAM-The-Big-Picture-from-Linear-Algebra-to-Optimization-Frontend-to-Backend-f65a96226a87459b9a573784cffb92e5 (Email p.saishubodh@gmail.com for access to notes) This is part 1 of 3 lectures on SLAM. Find the remaining two here: th-cam.com/play/PLi59bVTQX5Rpr4umjFfbXeWjfqBwMgLql.html Newton's method is clearly explained again at around 1:19:00. github.com/Rob...
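This lecture covers least-squares optimization and ICP. A minimal sketch of the standard point-to-point ICP building blocks (the closed-form SVD/Kabsch alignment and the alternation with nearest-neighbour matching — a generic textbook version, not code from the lecture) could be:

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Closed-form least-squares rigid transform (Kabsch/SVD) aligning
    matched point sets src -> dst; this is the inner step of one ICP iteration."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)           # cross-covariance of centred sets
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                    # reject a reflection, keep a rotation
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s
    return R, t

def icp(src, dst, iters=20):
    """Minimal point-to-point ICP: alternate brute-force nearest-neighbour
    matching with the closed-form alignment above."""
    cur = src.copy()
    for _ in range(iters):
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matched = dst[d2.argmin(axis=1)]        # nearest dst point per src point
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
    return cur
```

ICP is a local method: it converges when the initial misalignment is small relative to the point spacing, which is why SLAM pipelines seed it with an odometry estimate.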
Hi Shubodh. Could you provide access to the notes?
Hi Veera, please email me at p.saishubodh@gmail.com for the access.
Hello. Thanks for the video. Could you enable the Notion notebooks for public viewing? Thanks.
Hey, please shoot an email to p.saishubodh@gmail.com for access.
Three things to note here: 1. Read the video description. 2. Newton's method is clearly explained again at around 1:19:00 if it was unclear before. 3. The intuition I gave to explain the condition that any descent method must satisfy is partially incorrect. The step Δx (\Delta x) that you take when updating your current estimate is always along the x-axis (the domain axis). So the arrows I was drawing should instead have pointed along the x-direction; I should have used level curves to give the intuition. Refer to Stephen Boyd's Convex Optimization for more clarity.
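The correction above is about where the update step Δx lives: it is a step in the domain, found by solving the Newton system, not an arrow on the function's graph. A generic illustrative sketch of Newton's method for unconstrained minimization (not code from the lecture; `grad`, `hess`, and the stopping scheme are assumptions):

```python
import numpy as np

def newton_minimize(grad, hess, x0, iters=20):
    """Newton's method for unconstrained minimization: at each step solve
    H(x) dx = -g(x) and update x <- x + dx. Note dx is a step in the
    domain (the x-axis), which is exactly the point the comment clarifies."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        dx = np.linalg.solve(hess(x), -grad(x))   # Newton step in the domain
        # When H is positive definite, g(x)^T dx < 0, so dx is a descent direction.
        x = x + dx
    return x
```

On a quadratic objective the Hessian is constant, so a single Newton step lands exactly on the minimizer; for general functions the method converges quadratically near the optimum.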