Sease
  • 36
  • 8 579
Blazing-Fast Serverless MapReduce Indexer for Apache Solr | Daniele Antuzi
Welcome to the 22nd London Information Retrieval & AI Meetup, a free evening event aimed at Information Retrieval and AI enthusiasts and professionals who are curious to explore and discuss the latest trends in the field.
The second talk is from Daniele Antuzi, R&D Software Engineer @ Sease.
Title: "Blazing-Fast Serverless MapReduce Indexer for Apache Solr"
Abstract: "Indexing data from databases to Apache Solr has always been an open problem: for a while, the data import handler was used even if it was not recommended for production environments. Traditional indexing processes often encounter scalability challenges, especially with large datasets.
In this talk, we explore the architecture and implementation of a serverless MapReduce indexer designed for Apache Solr but extendable to any search engine. By embracing a serverless approach, we can take advantage of the elasticity and scalability offered by cloud services like AWS Lambda, enabling efficient indexing without needing to manage infrastructure.
We dig into the principles of MapReduce, a programming model for processing large datasets, and discuss how it can be adapted for indexing documents into Apache Solr. Using AWS Step Functions to orchestrate multiple Lambdas, we demonstrate how to distribute indexing tasks across multiple resources, achieving parallel processing and significantly reducing indexing times.
Through practical examples, we address key considerations such as data partitioning, fault tolerance, concurrency, and cost.
We also cover integration points with other AWS services such as Amazon S3 for data storage and retrieval, as well as DynamoDB for distributed locking between the Lambda instances."
If you would like to attend the next London Information Retrieval & AI Meetup, don't forget to join our group: bit.ly/2IjSBeX
We are also accepting talks for the next meetups. If you have any talk you would like to propose, feel free to send us an abstract at talk@sease.io.
**********************************
Hosted by Sease: sease.io
Views: 35

Videos

Towards Standardization of Privacy-Preserving IR: Decentralized Algorithms with User-Centric Design
Views: 36, 28 days ago
Welcome to the 22nd London Information Retrieval & AI Meetup, a free evening event aimed at Information Retrieval and AI enthusiasts and professionals who are curious to explore and discuss the latest trends in the field. The first talk is from Mohammad Bahrani, Research Associate @ University of Southampton. Title: "Towards Standardization of Privacy-Preserving IR: Decentralized Algorithms wit...
Hybrid Search With Apache Solr Reciprocal Rank Fusion | Alessandro Benedetti
Views: 90, 3 months ago
Welcome to the twenty-first London Information Retrieval & AI Meetup, a free evening event aimed at Information Retrieval and AI enthusiasts and professionals who are curious to explore and discuss the latest trends in the field. The second talk is from Alessandro Benedetti, Director at Sease. Title: "Hybrid Search With Apache Solr Reciprocal Rank Fusion" Abstract: "Vector-based search gained i...
Search UX SUX | Mark Harwood
Views: 63, 3 months ago
Welcome to the twenty-first London Information Retrieval & AI Meetup, a free evening event aimed at Information Retrieval enthusiasts and professionals who are curious to explore and discuss the latest trends in the field. The first talk is from Mark Harwood, Independent Engineer (ex Elastic) Title: "Search UX SUX: how we fail to put search engine functionality in the hands of users and what to...
Word Embeddings Compression for Neural Language Models | Amit Kumar Jaiswal
Views: 67, 3 months ago
Welcome to the twentieth London Information Retrieval Meetup, a free evening event aimed at Information Retrieval enthusiasts and professionals who are curious to explore and discuss the latest trends in the field. The second talk is from Amit Kumar Jaiswal, Postdoc at the University of Surrey and Honorary Research Fellow, UCL Title: "Word Embeddings Compression for Neural Language Models" Abst...
Large Language Models for Information Extraction and Information Retrieval
Views: 201, 3 months ago
Welcome to the twentieth London Information Retrieval Meetup, a free evening event aimed at Information Retrieval enthusiasts and professionals who are curious to explore and discuss the latest trends in the field. The first talk is from Stuart Middleton, Associate Professor @ the University of Southampton Title: "Large Language Models for Information Extraction and Information Retrieval" Abstr...
Dataset Discovery in Data Spaces - Luis-Daniel Ibáñez
Views: 40, 5 months ago
Welcome to the nineteenth London Information Retrieval Meetup, a free evening event aimed at Information Retrieval enthusiasts and professionals who are curious to explore and discuss the latest trends in the field. The second talk is from Luis-Daniel Ibáñez, Lecturer in the Web and Internet Science team @ University of Southampton Title: "Dataset Discovery in Data Spaces" Abstract: "A common t...
User-Centred Interactive Information Access | Haiming Liu
Views: 32, 7 months ago
Welcome to the nineteenth London Information Retrieval Meetup, a free evening event aimed at Information Retrieval enthusiasts and professionals who are curious to explore and discuss the latest trends in the field. The first talk is from Haiming Liu, Director at the CMI (Centre for Machine Intelligence). Title: "User-Centred Interactive Information Access" Abstract: "Human-centric information ...
A Deep Dive Into Personalized Information Retrieval | Pranav Kasela
Views: 136, 10 months ago
Welcome to the eighteenth London Information Retrieval Meetup, a free evening event aimed at Information Retrieval enthusiasts and professionals who are curious to explore and discuss the latest trends in the field. The second talk is from Pranav Kasela, PhD Student at the University of Milano-Bicocca. Title: "A Deep Dive Into Personalized Information Retrieval" Abstract: "Personalization in In...
GenerativeAI with Apache Solr and LangStream.ai | Enrico Olivelli
Views: 1K, a year ago
Welcome to the eighteenth London Information Retrieval Meetup, a free evening event aimed at Information Retrieval enthusiasts and professionals who are curious to explore and discuss the latest trends in the field. The second talk is from Enrico Olivelli, Senior Software Engineer at Datastax. Title: "GenerativeAI with Apache Solr and LangStream.ai" Abstract: "Everybody is talking about Generat...
Open Source Large Language Models in Search | Alessandro Benedetti
Views: 261, a year ago
Welcome to the eighteenth London Information Retrieval Meetup, a free evening event aimed at Information Retrieval enthusiasts and professionals who are curious to explore and discuss the latest trends in the field. The first talk is from Alessandro Benedetti, Director @seaseltd Title: "Open Source Large Language Models in Search" Abstract: "Large Language Models (LLMs) are becoming ubiquitous:...
Autoregressive Knowledge Retrieval | Fabio Petroni
Views: 186, a year ago
Welcome to our seventeenth London Information Retrieval Meetup, a free evening meetup aimed at Information Retrieval passionates and professionals who are curious to explore and discuss the latest trends in the field. The first talk of this meetup was "Autoregressive Knowledge Retrieval" by Fabio Petroni, Co-Founder & CTO at Samaya AI. If you are willing to attend the next London Information Re...
Word2Vec Model To Generate Synonyms on the Fly in Apache Lucene: Story of a Contribution
Views: 107, a year ago
Welcome to our seventeenth London Information Retrieval Meetup, a free evening meetup aimed at Information Retrieval passionates and professionals who are curious to explore and discuss the latest trends in the field. The first talk of this meetup was "Word2Vec Model To Generate Synonyms on the Fly in Apache Lucene: Story of a Contribution" by Daniele Antuzi, R&D Software Engineer at Sease. If ...
How #ChatGPT works: an #InformationRetrieval Perspective | Alessandro Benedetti - Director at Sease
Views: 139, a year ago
Welcome to our sixteenth London Information Retrieval Meetup, a free evening meetup aimed at Information Retrieval passionates and professionals who are curious to explore and discuss the latest trends in the field. The first talk of this meetup was "How ChatGPT works: an Information Retrieval Perspective" by Alessandro Benedetti, Director at Sease. If you are willing to attend the next London ...
Retrieving Key Events From 1 Billion News Stories | Daniel Staff
Views: 89, a year ago
Welcome to our fifteenth London Information Retrieval Meetup, a free evening meetup aimed at Information Retrieval passionates and professionals who are curious to explore and discuss the latest trends in the field. The second talk of this meetup was "Retrieving Key Events From 1 Billion News Stories " by Daniel Staff, Senior Data Scientist at Signal AI. If you are willing to attend the next Lo...
How to Implement Your Online Search Quality Evaluation With Kibana | Anna Ruggero + Ilaria Petreti
Views: 86, 2 years ago
[ITA] Share-VDE: indicizzazione su vasta scala
Views: 71, 2 years ago
[ITA] La Ricerca Neurale arriva in Apache Solr: Approximate Nearest Neighbor, BERT e altro ancora!
Views: 175, 2 years ago
Players in Vector Search: algorithms, software and use cases - Dmitry Kan
Views: 3.6K, 2 years ago
Building an open-source online Learn-to-rank engine | Roman Grebennikov + Vsevolod Goloviznin
Views: 105, 2 years ago
How to cache your searches: an open-source implementation | Daniele Antuzi
Views: 65, 2 years ago
Taking the neural search paradigm shift to production
Views: 409, 2 years ago
Query Performance Prediction by Means of Intent-Aware Metrics in Systematic Reviews
Views: 106, 3 years ago
Rated Ranking Evaluator Enterprise: the next generation of free Search Quality Evaluation Tools
Views: 153, 3 years ago
Gathering Multiple Ratings with Quepid | Eric Pugh + Dmitry Kan
Views: 95, 3 years ago
Lucene-grep | Dainius Jocas - Staff Engineer at Vinted
Views: 134, 3 years ago
The Split Between Apache Lucene and Apache Solr - Jan Høydal and Michael Sokolov
Views: 242, 3 years ago
Explainability for Learning to Rank - Ilaria Petreti (IR/ML Engineer) & Anna Ruggero (R&D Engineer)
Views: 252, 3 years ago
Interactive Q&A: Learning to Rank Libraries
Views: 69, 3 years ago
Interactive Q&A: Natural Language Search, Language Modelling and Neural Search
Views: 53, 3 years ago

Comments

  • @kagan770
    @kagan770 7 months ago

    excellent talk on search quality, only wish I'd get to it about 3 months earlier - cause it was related to my direct work.

  • @JuanUys
    @JuanUys a year ago

    Possible to please link the slides in the description?

  • @venkat1195
    @venkat1195 a year ago

    amazing! Thank you for sharing this knowledge!

  • @billykotsos4642
    @billykotsos4642 2 years ago

    kickass presentation

  • @VectorPodcast
    @VectorPodcast 2 years ago

    Thanks for having me! // Dmitry

  • @jonathanm.3253
    @jonathanm.3253 3 years ago

    Hello, great video! Is the code publicly available somewhere?

    • @ilariapetreti6592
      @ilariapetreti6592 3 years ago

      Hi Jonathan! Thank you... You can find the repository of the project here: github.com/SeaseLtd/learning-to-rank-spotify. The code related to this talk is in the 'part1' folder. If you are curious, there are also references to the following parts of the project.

    • @jonathanm.3253
      @jonathanm.3253 3 years ago

      Thank you :)