#227

  • Published on 4 Feb 2025
  • Current state-of-the-art document retrieval solutions mainly follow an index-retrieve paradigm, where the index is hard to optimize directly for the final retrieval target. This paper aims to show that an end-to-end deep neural network unifying the training and indexing stages can significantly improve the recall performance of traditional methods. Neural Corpus Indexer (NCI) is a sequence-to-sequence network that generates relevant document identifiers directly for a designated query. To optimize recall performance, NCI has a prefix-aware weight-adaptive decoder architecture and leverages tailored techniques including query generation, semantic document identifiers, and consistency-based regularization. Empirical studies demonstrated the superiority of NCI on two commonly used academic benchmarks, achieving relative improvements of +21.4% for Recall@1 on the NQ320k dataset and +16.8% for R-Precision on the TriviaQA dataset, compared to the best baseline method.
    In this video, I talk about the following: What is the NCI model’s architecture? How does NCI perform?
    For more details, please look at arxiv.org/pdf/...
    Wang, Yujing, Yingyan Hou, Haonan Wang, Ziming Miao, Shibin Wu, Qi Chen, Yuqing Xia et al. "A neural corpus indexer for document retrieval." Advances in Neural Information Processing Systems 35 (2022): 25600-25614.
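    The abstract reports its results as relative improvements in Recall@1 and R-Precision. As a rough illustration of what those quantities mean, here is a minimal Python sketch. It is not code from the paper: the function names and the toy document IDs are hypothetical, and the definitions assume each document ID appears at most once in a ranked list.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    hits = sum(1 for doc_id in ranked[:r] if doc_id in relevant)
    return hits / r


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Toy data (hypothetical IDs, not taken from NQ320k or TriviaQA).
ranked_ids = ["d3", "d1", "d7"]   # identifiers a retriever returned, best first
relevant_ids = {"d1", "d7"}       # ground-truth relevant documents

print(recall_at_k(ranked_ids, relevant_ids, k=1))
print(r_precision(ranked_ids, relevant_ids))
print(relative_improvement(new=0.75, baseline=0.5))
```

    A relative improvement of +21.4% therefore means the new score is 1.214 times the baseline score, not that the score rose by 21.4 percentage points.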