Computer Vision Meetup: Intrinsic Self-Supervision for Data Quality Audits
- Published on Feb 9, 2025
- Benchmark datasets in computer vision often contain issues such as off-topic samples, near-duplicates, and label errors, compromising model evaluation accuracy. This talk will discuss SelfClean, a data-cleaning framework that leverages self-supervised representation learning and distance-based indicators to detect these issues effectively.
By framing the task as a ranking or scoring problem, SelfClean minimizes human effort while outperforming competing methods in identifying synthetic and natural contamination across natural and medical domains. With this methodology, we identified up to 16% of problematic samples in current benchmark datasets and enhanced the reliability of model performance evaluation.
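To make the ranking idea concrete, here is a minimal sketch of a distance-based indicator for near-duplicate candidates. It is not the SelfClean implementation: it assumes embeddings have already been produced by some self-supervised encoder, it uses cosine distance as one plausible choice of metric, and the function names and toy vectors are hypothetical, made up for illustration.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rank_near_duplicate_candidates(embeddings):
    """Rank all sample pairs from most to least likely near-duplicate.

    A smaller distance is more suspicious, so pairs are sorted ascending.
    Returns a list of (distance, i, j) tuples with i < j.
    """
    pairs = [
        (cosine_distance(embeddings[i], embeddings[j]), i, j)
        for i, j in combinations(range(len(embeddings)), 2)
    ]
    return sorted(pairs)


# Toy embeddings (assumed, not real data); samples 0 and 2 are nearly identical.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.99, 0.01, 0.0],
    [0.0, 0.0, 1.0],
]

ranking = rank_near_duplicate_candidates(toy_embeddings)
top_distance, top_i, top_j = ranking[0]
print(f"Most suspicious pair: ({top_i}, {top_j}), distance={top_distance:.4f}")
```

Because the output is a ranking rather than a hard yes/no decision, a reviewer can inspect candidates from the top and stop once the pairs no longer look problematic, which is how a ranking formulation can reduce human effort. The all-pairs loop is quadratic in the number of samples, so this sketch only suits small collections.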
Read the paper: arxiv.org/abs/...
About the Speaker:
Fabian Gröger is a second-year PhD student at the University of Basel, supervised by Alexander A. Navarini and Marc Pouly. His research interests include self-supervised learning, data-centric machine learning, and medical imaging.