RecSys 2020 Tutorial: Feature Engineering for Recommender Systems

  • Published on Jan 30, 2025

Comments

  • @SeanKearney-g7d 3 months ago

    Awesome talk

  • @trez6465 2 months ago

    Feature Engineering Techniques Used in the Tutorial
    Categorical Feature Encoding:
      Target Encoding: Encodes categorical features using the mean target value per category.
      Smoothing: Reduces overfitting by blending the global mean with the category mean based on observation count.
      K-Fold Target Encoding: Avoids data leakage by encoding using out-of-fold statistics.
      Categorify: Converts categorical variables into integers, optionally grouping low-frequency categories as "others."
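To make the target-encoding items above concrete, here is a minimal sketch of smoothed, K-fold target encoding in plain Python. It is not the tutorial's cuDF code: the function names, the smoothing weight `smoothing`, and the fold assignment by row index (which assumes the rows are already shuffled) are all illustrative assumptions.

```python
from collections import defaultdict


def smoothed_means(categories, targets, smoothing):
    """Return ({category: smoothed target mean}, global mean).

    Smoothed mean = (category_sum + smoothing * global_mean) / (category_count + smoothing),
    so categories with few rows are pulled toward the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, target in zip(categories, targets):
        sums[cat] += target
        counts[cat] += 1
    encoding = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return encoding, global_mean


def kfold_target_encode(categories, targets, n_folds=5, smoothing=20.0):
    """Encode each row using statistics from the other folds only (needs n_folds >= 2)."""
    n = len(categories)
    encoded = [0.0] * n
    for fold in range(n_folds):
        # Assumption: row i belongs to fold i % n_folds (rows already shuffled).
        train_idx = [i for i in range(n) if i % n_folds != fold]
        valid_idx = [i for i in range(n) if i % n_folds == fold]
        encoding, global_mean = smoothed_means(
            [categories[i] for i in train_idx],
            [targets[i] for i in train_idx],
            smoothing,
        )
        for i in valid_idx:
            # Categories unseen in the training folds fall back to the global mean.
            encoded[i] = encoding.get(categories[i], global_mean)
    return encoded
```

Because each row's own target never contributes to its encoded value, the encoding cannot leak the label of that row into its feature.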
    Combining Features:
      Feature Combination: Creates new features by concatenating values of two or more categorical columns.
      Group-by Aggregations: Calculates counts or other statistics for combined features.
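As a rough illustration of the two items above, the sketch below concatenates two categorical columns and then count-encodes the combined value. The helper names and the `user`/`lang` columns are hypothetical; with cuDF or pandas the same idea would typically be a string concatenation followed by a group-by count.

```python
from collections import Counter


def combine(rows, cols, sep="_"):
    """Build one combined categorical value per row from the given columns."""
    return [sep.join(str(row[c]) for c in cols) for row in rows]


def count_encode(values):
    """Replace each value with the number of rows sharing that value."""
    counts = Counter(values)
    return [counts[v] for v in values]


# Hypothetical interaction log with two categorical columns.
rows = [
    {"user": "u1", "lang": "en"},
    {"user": "u1", "lang": "en"},
    {"user": "u2", "lang": "en"},
]
user_lang = combine(rows, ["user", "lang"])   # ["u1_en", "u1_en", "u2_en"]
user_lang_count = count_encode(user_lang)     # [2, 2, 1]
```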
    Numerical Feature Transformations:
      Normalization:
        Standard Scaling: Standardizes features to have a mean of 0 and a standard deviation of 1.
        Log Transformation: Reduces skew in long-tailed data by applying a logarithm.
        Min-Max Scaling: Scales features to a 0-1 range.
        Gauss Rank Transformation: Maps an arbitrary distribution to an approximately Gaussian one by ranking the values and applying an inverse normal transform to the ranks.
      Binning: Groups continuous variables into discrete bins, either fixed or category-specific.
        Example: Price binning by quantiles for different product categories.
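The Gauss rank idea is easier to see in code. This is a minimal sketch of one common variant, assuming ranks are converted to mid-point quantiles and passed through the inverse standard normal CDF; the tutorial's own implementation may scale the ranks differently. The function name is illustrative.

```python
from statistics import NormalDist


def gauss_rank(values):
    """Map values to approximately standard-normal scores based on their rank."""
    n = len(values)
    # Indices sorted by value; ties keep their original order (sorted is stable).
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    out = [0.0] * n
    for rank, i in enumerate(order):
        # Mid-point quantile (rank + 0.5) / n stays strictly inside (0, 1),
        # so the inverse CDF is always finite.
        out[i] = std_normal.inv_cdf((rank + 0.5) / n)
    return out


# Heavily skewed input: only the ordering matters, not the magnitudes.
scores = gauss_rank([10, 1000, 1, 100, 5])
```

Only the order of the inputs affects the result, which is why the transform is robust to extreme outliers.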
    Time Series Feature Engineering:
      Rolling Window Features: Aggregates historical data within a specified time window (e.g., 3 days, 7 days).
      Difference Features: Computes differences between current and historical values (e.g., price changes over time).
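A small sketch of the two time-series features above, assuming a single item's observations sorted by time. The function names and the `days`/`prices` data are made up for illustration. The rolling mean deliberately uses only earlier rows so the current value does not leak into its own feature.

```python
def rolling_mean(timestamps, values, window):
    """Mean of earlier rows at most `window` time units before each row.

    Assumes timestamps are sorted ascending; returns None where there is no history.
    """
    out = []
    for i, t in enumerate(timestamps):
        past = [values[j] for j in range(i) if t - timestamps[j] <= window]
        out.append(sum(past) / len(past) if past else None)
    return out


def diff_feature(values, lag=1):
    """Difference between each value and the value `lag` rows earlier."""
    return [None if i < lag else values[i] - values[i - lag] for i in range(len(values))]


# Hypothetical price history for one product (day number, price).
days = [1, 2, 3, 7, 8]
prices = [10.0, 12.0, 11.0, 15.0, 14.0]
mean_3d = rolling_mean(days, prices, window=3)  # [None, 10.0, 11.0, None, 15.0]
price_change = diff_feature(prices)             # [None, 2.0, -1.0, 4.0, -1.0]
```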
    Sparse Feature Handling:
      Handling Missing Data: Fills missing categorical values with a placeholder ("unknown") or numerical values with mean/median.
      Low-Frequency Categories: Groups rare categories into a single "other" category.
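The missing-value and rare-category handling above can be combined with integer encoding in a few lines. This is a simplified sketch, not the NVTabular `Categorify` operator: the function name, the `min_count` threshold, and the id assignment by first appearance are assumptions made for illustration.

```python
from collections import Counter


def categorify(values, min_count=2, missing="unknown", other="other"):
    """Fill missing values, group rare categories, and map categories to integer ids."""
    # Step 1: replace missing entries with a placeholder category.
    filled = [missing if v is None else v for v in values]
    # Step 2: fold categories seen fewer than min_count times into one bucket.
    counts = Counter(filled)
    grouped = [v if counts[v] >= min_count else other for v in filled]
    # Step 3: assign integer ids in order of first appearance.
    ids = {}
    encoded = [ids.setdefault(v, len(ids)) for v in grouped]
    return encoded, ids


codes, mapping = categorify(["en", "en", None, "fr", "en", "de", None])
# codes   -> [0, 0, 1, 2, 0, 2, 1]
# mapping -> {"en": 0, "unknown": 1, "other": 2}
```

In practice the counts and the id mapping would be fitted on the training data only and then reused for validation and test data.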
    Performance Optimization:
      GPU Acceleration: Leverages RAPIDS libraries (cuDF, dask-cuDF) for faster computation on GPUs.
      Distributed Computation: Scales workflows using Dask for parallelism across large datasets.
    Integration with Frameworks:
      NVIDIA NVTabular: Streamlines feature engineering pipelines for recommender systems. Provides pre-built operators for feature transformations and supports large-scale data processing.
      Data Loaders: Optimized data feeding into training frameworks like TensorFlow, PyTorch, or XGBoost.
    This systematic and GPU-accelerated approach enables fast experimentation and scalability to production-level recommender systems.

  • @chensi3275 3 years ago +10

    Great tutorial. May I know where the source code can be downloaded?

    • @fransdananadeak2524 3 years ago +1

      Up

    • @silversnow111 2 years ago +3

      It's inside the talk: github.com/rapidsai/deeplearning/tree/main/RecSys2020Tutorial

    • @chensi3275 2 years ago

      @silversnow111 You are the best. 🙏