Anna Veronika Dorogush: Mastering gradient boosting with CatBoost | PyData London 2019
- Published on 29 Jul 2024
- Gradient boosting is a powerful machine-learning technique that achieves state-of-the-art results in a variety of practical tasks. This tutorial explains the details of using gradient boosting in practice by solving a classification problem with the popular GBDT library CatBoost.
www.pydata.org
PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
0:00 - Introduction
1:49 - Intro to CatBoost
2:08 - Overview of the Presentation
2:39 - Intro to Gradient Boosting
6:08 - Numerical and Categorical Data with CatBoost
7:26 - Advantages of CatBoost
9:00 - Library Comparison (Quality)
9:45 - Speed
10:11 - Benchmarking (CPU & GPU)
11:55 - CPU vs GPU
12:50 - Prediction Time
13:24 - Tutorial
15:15 - Problem Statement
15:38 - CatBoost Library (Imports and related issues)
16:22 - Reading and Intro to the Data
18:17 - Exploring the data
19:36 - Training the Model with default parameters
22:16 - Creating the Pool Object
23:12 - Splitting the data (Train & Validation)
24:16 - Selecting the objective function
25:11 - STDOUT of training
28:32 - Plotting metrics while training
30:33 - Model Comparison (plotting after training)
32:39 - Finding the best model
35:05 - Cross-Validation
41:30 - Grid Search
44:40 - Overfitting Detector
49:18 - Overfitting Detector with eval metric
51:31 - Model Predictions
57:10 - Select Decision Boundary
1:01:04 - Model Evaluation (new dataset)
1:03:06 - Feature Importance
1:03:37 - Prediction Values Change
1:04:50 - Loss Function Change
1:07:49 - Shap Values
1:16:05 - Snapshotting
1:17:45 - Saving the Model
1:18:36 - Hyperparameters Tuning
1:23:07 - Speeding up Training and Reducing Model Size
1:23:35 - Additional Details about CatBoost Community
1:25:50 - Future Scope of CatBoost
1:26:22 - Questions and Suggestions
S/o to github.com/theProcrastinatr for the video timestamps!
Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: github.com/numfocus/TH-camVi...
- Science and Technology
02:39 Intro to Gradient Boosting
07:25 CatBoost Advantages
14:47 Tutorial Starts
19:37 Training The First Model
22:16 Working with Pool - CatBoost's Data Container
24:16 Objective Function & Standard Output
28:41 Metrics & Plotting
30:37 Model Comparison & Best Iteration
35:03 Cross-Validation
41:30 Using CatBoost with Sklearn's Grid Search
44:42 Overfitting Detector
51:29 Making Predictions with CatBoost
56:49 Select Decision Boundary
01:01:03 Metric Evaluation On New Data Sets
01:03:05 Feature Importance
01:16:06 Snapshotting & Saving Model
01:18:36 Hyperparameter Tuning
01:23:35 Outro
01:26:22 Q&A
Thank you for uploading this video. I think it's quite useful for beginners to understand more about modeling using boosted models.
Great video. It simplifies the complete training flow with CatBoost and shows many cool features of this library. Thanks :)
Thanks for the video, nice presentation of CatBoost. It'd be great if you could post the Jupyter notebook as well.
Yes, the links shown at the start of the video don't seem to work any more. A refresh to working links would be great.
Thanks. It's useful
The notebook link doesn't work anymore 😢
Wonderful talk!
Does anyone have the link to code?
Thank you Anna!
Where is the notebook?
I'm in love.
So is it Anya or Vera?