#236
- Published on Feb 4, 2025
- Pre-trained language models are increasingly important components across multiple information retrieval (IR) paradigms. Late interaction, introduced with the ColBERT model and recently refined in ColBERTv2, is a popular paradigm that holds state-of-the-art status across many benchmarks. The Performance-optimized Late Interaction Driver (PLAID) dramatically speeds up the search latency of late interaction. Without impacting quality, PLAID swiftly eliminates low-scoring passages using a novel centroid interaction mechanism that treats every passage as a lightweight bag of centroids. PLAID uses centroid interaction as well as centroid pruning, a mechanism for sparsifying the bag of centroids, within a highly optimized engine to reduce late interaction search latency by up to 7× on a GPU and 45× on a CPU against vanilla ColBERTv2, while continuing to deliver state-of-the-art retrieval quality. This allows the PLAID engine with ColBERTv2 to achieve latency of tens of milliseconds on a GPU and tens to a few hundred milliseconds on a CPU, even at a scale of 140M passages.
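The centroid interaction and centroid pruning ideas can be illustrated with a short sketch. This is only a minimal illustration under simplifying assumptions, not the paper's implementation: the function name, the `prune_threshold` value, and the plain-NumPy data layout are hypothetical, and the real PLAID engine operates over compressed residual codes with optimized kernels.

```python
import numpy as np

def centroid_interaction_scores(query_emb, centroids, passage_centroid_ids,
                                prune_threshold=0.45):
    """Illustrative sketch of centroid interaction with centroid pruning.

    query_emb:            (num_query_tokens, dim) query token embeddings
    centroids:            (num_centroids, dim) k-means centroids from indexing
    passage_centroid_ids: list of 1-D int arrays; each passage is a "bag of
                          centroid IDs" (one ID per passage token)
    prune_threshold:      illustrative cutoff for centroid pruning (hypothetical)
    """
    # Score every centroid against every query token once, up front.
    # shape: (num_query_tokens, num_centroids)
    q_c_sims = query_emb @ centroids.T

    # Centroid pruning: centroids whose best query-token similarity is low
    # contribute little to MaxSim, so they are dropped from every bag.
    keep = q_c_sims.max(axis=0) >= prune_threshold

    scores = []
    for ids in passage_centroid_ids:
        ids = ids[keep[ids]]  # sparsify this passage's bag of centroids
        if ids.size == 0:
            scores.append(0.0)
            continue
        # Late-interaction MaxSim, but over centroid similarities instead of
        # full token embeddings: max over the passage's centroids per query
        # token, summed over query tokens.
        scores.append(q_c_sims[:, ids].max(axis=1).sum())
    return np.asarray(scores)
```

Because the query-to-centroid similarities are computed once per query, scoring each candidate passage reduces to a gather plus a max over a small bag of centroid IDs, which is what makes this stage cheap enough to prune low-scoring passages before any full late-interaction scoring.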
In this video, I talk about the following:
- How does ColBERTv2 work?
- How does PLAID (Performance-optimized Late Interaction Driver) differ from ColBERTv2?
- How does PLAID work?
- How does PLAID perform?
For more details, please see arxiv.org/pdf/...
Santhanam, Keshav, Omar Khattab, Christopher Potts, and Matei Zaharia. "PLAID: An Efficient Engine for Late Interaction Retrieval." In Proceedings of the 31st ACM International Conference on Information and Knowledge Management, pp. 1747-1756. 2022.