Anna Veronika Dorogush: Mastering gradient boosting with CatBoost | PyData London 2019

  • Published on 29 Jul 2024
  • Gradient boosting is a powerful machine-learning technique that achieves state-of-the-art results in a variety of practical tasks. This tutorial explains the details of using gradient boosting in practice: we will solve a classification problem using the popular GBDT library CatBoost.
    www.pydata.org
    PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
    PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
    0:00 - Introduction
    1:49 - Intro to CatBoost
    2:08 - Overview of the Presentation
    2:39 - Intro to Gradient Boosting
    6:08 - Numerical and Categorical Data with CatBoost
    7:26 - Advantages of CatBoost
    9:00 - Library Comparison (Quality)
    9:45 - Speed
    10:11 - Benchmarking (CPU & GPU)
    11:55 - CPU vs GPU
    12:50 - Prediction Time
    13:24 - Tutorial
    15:15 - Problem Statement
    15:38 - CatBoost Library (Imports and related issues)
    16:22 - Reading and Intro to the Data
    18:17 - Exploring the data
    19:36 - Training the Model with default parameters
    22:16 - Creating the Pool Object
    23:12 - Splitting the data (Train & Validation)
    24:16 - Selecting the objective function
    25:11 - STDOUT of training
    28:32 - Plotting metrics while training
    30:33 - Model Comparison (plotting after training)
    32:39 - Finding the best model
    35:05 - Cross-Validation
    41:30 - Grid Search
    44:40 - Overfitting Detector
    49:18 - Overfitting Detector with eval metric
    51:31 - Model Predictions
    57:10 - Select Decision Boundary
    1:01:04 - Model Evaluation (new dataset)
    1:03:06 - Feature Importance
    1:03:37 - Prediction Values Change
    1:04:50 - Loss Function Change
    1:07:49 - Shap Values
    1:16:05 - Snapshotting
    1:17:45 - Saving the Model
    1:18:36 - Hyperparameters Tuning
    1:23:07 - Speeding up Training and Reducing Model Size
    1:23:35 - Additional Details about CatBoost Community
    1:25:50 - Future Scope of CatBoost
    1:26:22 - Questions and Suggestions
    S/o to github.com/theProcrastinatr for the video timestamps!
    Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: github.com/numfocus/TH-camVi...
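
For readers who want a feel for the idea covered in the "Intro to Gradient Boosting" chapter before watching, here is a minimal sketch of gradient boosting for binary classification in plain Python. It is not CatBoost's algorithm or API: it assumes a single numeric feature, one-split decision stumps, log loss, and leaf values equal to the mean residual. The toy data and all function names (`fit_stump`, `fit_boosting`, `raw_score`) are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(ys, scores):
    """Mean binary log loss for raw (pre-sigmoid) scores."""
    total = 0.0
    for y, s in zip(ys, scores):
        p = sigmoid(s)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(ys)


def fit_stump(xs, residuals):
    """Least-squares fit of a one-split stump to the residuals."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def raw_score(model, x):
    """Sum the base score and the shrunken output of every stump."""
    base, learning_rate, stumps = model
    score = base
    for threshold, left_value, right_value in stumps:
        score += learning_rate * (left_value if x <= threshold else right_value)
    return score


def fit_boosting(xs, ys, n_trees=50, learning_rate=0.3):
    """Each round fits a stump to the negative gradient of the log loss."""
    positive_rate = sum(ys) / len(ys)
    base = math.log(positive_rate / (1.0 - positive_rate))
    scores = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        # For log loss, the negative gradient w.r.t. the raw score is y - p.
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        scores = [
            s + learning_rate * (left_value if x <= threshold else right_value)
            for x, s in zip(xs, scores)
        ]
    return base, learning_rate, stumps


# Toy data (invented): larger x tends to mean class 1, with one noisy pair.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

model = fit_boosting(xs, ys)
initial_loss = log_loss(ys, [model[0]] * len(xs))
final_loss = log_loss(ys, [raw_score(model, x) for x in xs])
print(f"log loss before boosting: {initial_loss:.3f}, after: {final_loss:.3f}")
```

Each round adds a small correction in the direction that reduces the loss, which is the core loop the talk builds on; real libraries such as CatBoost add deeper trees, categorical-feature handling, and regularization on top of it.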
  • Science & Technology

Comments • 15

  • @ihgnmah 2 years ago +14

    02:39 Intro to Gradient Boosting
    07:25 CatBoost Advantages
    14:47 Tutorial Starts
    19:37 Training The First Model
    22:16 Working with Pool - CatBoost's Data Container
    24:16 Objective Function & Standard Output
    28:41 Metrics & Plotting
    30:37 Model Comparison & Best Iteration
    35:03 Cross-Validation
    41:30 Using CatBoost with Sklearn's Grid Search
    44:42 Overfitting Detector
    51:29 Making Predictions with CatBoost
    56:49 Select Decision Boundary
    01:01:03 Metric Evaluation On New Data Sets
    01:03:05 Feature Importance
    01:16:06 Snapshotting & Saving Model
    01:18:36 Hyperparameter Tuning
    01:23:35 Outro
    01:26:22 Q&A

  • @seant7907 2 years ago +1

    Thank you for uploading this video. I think it's quite useful for beginners to understand more about modeling using boosted models.

  • @roeiamos4491 2 years ago

    Great video. It simplified the complete training flow with CatBoost and showed many cool features in this library. Thanks :)

  • @braudelan 2 years ago +4

    Thanks for the video, nice presentation of CatBoost. It'd be great if you could post the Jupyter notebook as well.

    • @parousiathelast9464 2 years ago +2

      Yes, the links shown at the start of the video don't seem to work anymore. A refresh to working links would be great.

  • @hamedkalantari9589 2 months ago

    Thanks. It's useful

  • @pomborlz 1 year ago +2

    The notebook link doesn't work anymore 😢

  • @masster_yoda 1 year ago

    Wonderful talk!

  • @petroskoulouris3225 2 years ago +3

    Does anyone have the link to the code?

  • @belikk1986 1 year ago

    Thank you, Anna!

  • @RAHUDAS 1 year ago +1

    Where is the notebook?

  • @wexwexexort 4 months ago

    I'm in love.

  • @antonseledkov1536 1 year ago +1

    So is it Anya or Vera?