Real-Time ML Model Monitoring with Datasketches and Apache Pinot at Uber | RTA Summit 2024

  • Published on 6 Feb 2025
  • Join: stree.ai/slack | Sub: stree.ai/sub | In this tech talk, we present an overview of how Uber leverages Pinot as a datasketch store for the ML model monitoring use case. We also highlight the power of Apache Pinot's aggregation system and the capabilities of its recently introduced datasketch functions.
    Problem and solution: The machine learning life cycle operates on large-scale data spread across different storage systems, both online and offline. ML practitioners need to monitor and debug the data quality and distribution of each dataset, and also to cross-compare different data sources. However, it is challenging to build a monitoring application that works directly with raw data from all of these systems. As a solution, we profile those datasets and store the profiling results as data sketches inside Pinot. We leverage Pinot for scalable storage and low-latency queries over the sketches, which enables both continuous ML monitoring and an ad hoc, UI-based debugging experience. To make this possible, we integrated Apache DataSketches into Pinot, focusing in particular on the KLL, CPC, and Frequent Items sketches, and this approach shows clear performance advantages over alternative storage solutions such as Druid and MySQL.
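The key property that makes storing profiles as sketches work is mergeability: a small summary is built per partition, and the store combines summaries at query time instead of rescanning raw rows. The sketch below illustrates this with a classic Misra-Gries frequent-items summary. It is a simplified stand-in, not the Apache DataSketches Frequent Items implementation, not Pinot's query functions, and not Uber's pipeline. The class name, the partition data, and the choice of k are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple


class MisraGries:
    """Hypothetical frequent-items summary holding at most k counters.

    Each stored count underestimates the true count by at most n / (k + 1),
    where n is the total weight seen, and any item with true count above
    n / (k + 1) is guaranteed to be kept.
    """

    def __init__(self, k: int) -> None:
        self.k = k
        self.n = 0
        self.counters: Dict[Hashable, int] = {}

    def update(self, item: Hashable, weight: int = 1) -> None:
        self.n += weight
        self.counters[item] = self.counters.get(item, 0) + weight
        self._prune()

    def _prune(self) -> None:
        # Keep at most k counters: subtract the (k+1)-th largest count from
        # every counter and drop those that reach zero.
        if len(self.counters) <= self.k:
            return
        threshold = sorted(self.counters.values(), reverse=True)[self.k]
        self.counters = {
            item: c - threshold
            for item, c in self.counters.items()
            if c - threshold > 0
        }

    def merge(self, other: "MisraGries") -> "MisraGries":
        # Summaries from different partitions combine without the raw data.
        merged = MisraGries(self.k)
        merged.n = self.n + other.n
        merged.counters = dict(Counter(self.counters) + Counter(other.counters))
        merged._prune()
        return merged

    def top(self, m: int) -> List[Tuple[Hashable, int]]:
        return sorted(self.counters.items(), key=lambda kv: (-kv[1], str(kv[0])))[:m]


# Two hypothetical partitions of one categorical feature column.
part_a = ["US"] * 50 + ["BR"] * 30 + ["IN"] * 10 + ["FR", "DE", "JP"] * 2
part_b = ["US"] * 40 + ["IN"] * 35 + ["MX"] * 20 + ["FR", "DE"] * 3

sketch_a = MisraGries(k=4)
for value in part_a:
    sketch_a.update(value)

sketch_b = MisraGries(k=4)
for value in part_b:
    sketch_b.update(value)

merged = sketch_a.merge(sketch_b)
print(merged.top(2))  # → [('US', 84), ('IN', 39)]
```

The merged estimates (84 for "US", whose true count is 90) stay within the n / (k + 1) error bound while each summary holds only k counters. Production sketches such as KLL (quantiles) and CPC (distinct counts) offer the same merge-then-query pattern with their own error guarantees.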
    Join us for a deep dive into optimizing real-time analytics in ML applications with Apache Pinot.
    LEARN MORE
    ► Learn about Apache Pinot: dev.startree.ai
    CONNECT
    Subscribe on TH-cam: th-cam.com/users/c...
    Community Slack: stree.ai/slack
    Twitter: / startreedata
    Linkedin: / startreedata
    GitHub: github.com/sta...
    Site: startree.ai
    ABOUT STARTREE
    StarTree, powered by Apache Pinot: real-time analytics for user-facing apps. Discover the latest technical developments, trends, and use cases for real-time analytics straight from our engineers and open source developers, and your fellow industry colleagues and practitioners.
    StarTree is uniquely positioned to bring you fast, reliable, and fresh perspectives on the shift from batch to real-time analytics, and from internal-facing to user-facing applications. Subscribe to our channel if you are designing systems that need to scale to petabytes of information and millions of users while returning results with sub-second latency.
    #apachepinot #realtimeanalytics #startree
