Why the MinHashEncoder is great for boosted trees

Boosted tree models often don't support sparse matrices, which might make you think they struggle with text data. There are, however, encoding techniques that work well without resorting to sparse output. The MinHash encoder is one such technique, and this video explains why it is a great choice for many pipelines.
00:00 Introduction
01:03 Hashing text
05:16 Plotting hashes
06:22 Code demo
09:46 Pipelines
The notebooks for the dirty category series can be found here:
github.com/probabl-ai/youtube-appendix/tree/main/15-dirtycat
Website: probabl.ai/
LinkedIn: www.linkedin.com/company/probabl
Twitter: x.com/probabl_ai
Discord: discord.probabl.ai
We also host a podcast called Sample Space, which you can find on your favourite podcast player. All the links can be found here:
rss.com/podcasts/sample-space/

Views: 678
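The min-hash idea from the description can be sketched in plain Python. This is a minimal illustration under our own assumptions, not skrub's `MinHashEncoder` implementation: we assume lower-cased character 3-grams, use `hashlib.blake2b` with a per-component salt as the family of hash functions, and scale each minimum hash to [0, 1). The names `char_ngrams`, `minhash_encode` and `agreement` are hypothetical. Every output column is a dense float, which is exactly what a boosted tree can split on.

```python
import hashlib


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of a lower-cased, space-padded string."""
    padded = f" {text.lower()} "
    # Very short strings still yield one (shorter) gram, so the set is never empty.
    return {padded[i:i + n] for i in range(max(len(padded) - n + 1, 1))}


def minhash_encode(text: str, n_components: int = 30, n: int = 3) -> list[float]:
    """Encode a string as n_components dense min-hash values in [0, 1).

    Component k hashes every n-gram with a hash function seeded by k and keeps
    the smallest value. Two strings agree on component k with probability
    approximately equal to the Jaccard similarity of their n-gram sets.
    """
    grams = char_ngrams(text, n)
    features = []
    for k in range(n_components):
        salt = k.to_bytes(16, "little")  # blake2b accepts salts of up to 16 bytes
        smallest = min(
            int.from_bytes(
                hashlib.blake2b(g.encode("utf-8"), digest_size=4, salt=salt).digest(),
                "little",
            )
            for g in grams
        )
        features.append(smallest / 2**32)  # 32-bit hash scaled into [0, 1)
    return features


def agreement(a: list[float], b: list[float]) -> float:
    """Fraction of components on which two encodings agree (a Jaccard estimate)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    london = minhash_encode("London")
    print(agreement(london, minhash_encode("London, UK")))  # overlapping 3-grams
    print(agreement(london, minhash_encode("Paris")))       # no shared 3-grams
```

Similar strings end up with many identical columns, so a tree split learned on one spelling tends to carry over to its variants, without ever building a sparse matrix.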

Videos

12:56

How the HashingVectorizer works

379 views · days ago

You can use the CountVectorizer in scikit-learn to encode text into a sparse array that a machine learning model can use. This works well, but because it creates one column per unique token, the result can get *huge* in width. An alternative is the HashingVectorizer, which we discuss in this video. The notebooks for the dirty category series can be found here: github.com/probabl-ai/youtube-appendix/tree/main/15-dirtycat Website...
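To see why hashing keeps the width fixed, here is a plain-Python sketch of the hashing trick. It is an illustration, not scikit-learn's implementation: `HashingVectorizer` uses MurmurHash3, returns a sparse matrix, by default flips the sign of some entries (`alternate_sign=True`) to offset collisions and normalises rows (`norm="l2"`), whereas this sketch uses `zlib.crc32`, plain counts and dense lists. The token pattern matches scikit-learn's default; the function names are hypothetical.

```python
import re
import zlib

# Same default token pattern as scikit-learn's CountVectorizer/HashingVectorizer.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(doc: str) -> list[str]:
    """Lower-case the document and keep tokens of two or more word characters."""
    return TOKEN_PATTERN.findall(doc.lower())


def vocabulary_width(docs: list[str]) -> int:
    """Width a CountVectorizer-style encoding needs: one column per unique token."""
    return len({tok for doc in docs for tok in tokenize(doc)})


def hashing_vectorize(docs: list[str], n_features: int = 16) -> list[list[int]]:
    """Count tokens into a fixed number of columns chosen by a hash of the token.

    No vocabulary is stored, so the width stays n_features however many unique
    tokens appear; the price is that distinct tokens may collide in one column.
    """
    rows = []
    for doc in docs:
        row = [0] * n_features
        for tok in tokenize(doc):
            row[zlib.crc32(tok.encode("utf-8")) % n_features] += 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat on the mat"]
    print(vocabulary_width(docs))              # → 6, grows with unique tokens
    print(len(hashing_vectorize(docs, 8)[0]))  # → 8, always n_features
```

Because nothing is learned during fitting, a hashing encoder can also be applied to new text in a stream without refitting.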

1:07:17

You want to be in control of your own Copilot with Ty Dunn - founder of Continue.dev

251 views · 14 days ago

There are many LLMs you can use for programming these days, and some even live inside your editor, like Cursor or GitHub Copilot. But what if you want to tweak these LLMs to do what you want? Instead of being stuck with the tools a vendor gives you, Continue.dev aims to let you customise this yourself. In this podcast we talk to Ty Dunn, co-founder of the project, to learn m...

55:02

What it is like to maintain the scikit-learn docs with David Arturo Amor Quiroz, docs maintainer

341 views · 21 days ago

Scikit-learn's documentation pages are celebrated. But not everyone is aware that the project actually has somebody on payroll to take care of it. In this episode we talk to Arturo about stories from the scikit-learn documentation. In particular, the docs have a recommender that few folks are aware of. People just assume that it is manually curated, but there are a few base scikit-learn tools u...

59:21

Sqlite can totally do embeddings now with Alex Garcia, creator of sqlite-vec

1.1K views · months ago

1:12:05

How to rethink the notebook with Akshay Agrawal, co-creator of Marimo

743 views · months ago

9:00

Topics vs. embeddings

775 views · months ago

11:39

How the GapEncoder works

804 views · months ago

11:14

PCA as an embedding technique

1.2K views · 2 months ago

12:08

Feature engineering for overlapping categories

733 views · 2 months ago

1:09:11

You're always (always!) dealing with many (many!) tables - with Madelon Hulsebos

814 views · 2 months ago

11:51

Data checks for estimators

488 views · 2 months ago

11:22

Improving models via subsets

630 views · 2 months ago

1:00:54

How Narwhals has many end users ... that never use it directly. - Marco Gorelli

589 views · 3 months ago

10:13

Decayed estimators for timeseries

1.8K views · 3 months ago

12:58

More flexible models via sample weights

784 views · 3 months ago

12:47

Why ridge regression typically beats linear regression

1.4K views · 3 months ago

12:15

Understanding how the KernelDensityEstimator works

808 views · 4 months ago

1:05:39

Pragmatic data science checklists with Peter Bull

923 views · 4 months ago

13:15

Use-cases for inverted PCA

1.8K views · 4 months ago

10:48

Don't worry too much about missing data

947 views · 4 months ago

1:01:48

Model safety, that's a pickle! with Adrin Jalali - scikit-learn maintainer

420 views · 4 months ago

11:49

Boosting vs. semi-supervised learning

2.4K views · 5 months ago

14:16

Benchmarking boosted trees against overfitting

549 views · 5 months ago

12:43

Monotonic, and better, boosting

914 views · 5 months ago

57:20

Moving towards KDearestNeighbors with Leland McInnes - creator of UMAP

1.2K views · 5 months ago

10:24

Histograms for faster boosting

1.2K views · 6 months ago

12:24

Getting deeper into trees

1.2K views · 6 months ago

11:16

Why tree gradients give you a boost

2.5K views · 6 months ago

1:04:10

Talk like a DataFrame, run like SQL with Phillip Cloud - core-committer on Ibis

541 views · 6 months ago