How to Remove Outliers From Data (Spotify Song Popularity Prediction) - Data Every Day

  • Published on 3 Feb 2025

Comments • 14

  • @arthurrey3494 4 years ago +2

    Great video! Keep it up

    • @gcdatkin 4 years ago +1

      Thanks, Arthur!

  • @Mohamm-ed 4 years ago +1

    Great tutorial, thanks for sharing!

  • @guptaaman411 3 years ago

    Hi,
    How can we remove multivariate outliers? Please help with this.

  • @letaendashew3466 4 years ago +1

    Could you please show how to extract the original data (attributes) from Spotify for popularity prediction?
    Thanks for the tutorial!

    • @gcdatkin 4 years ago

      Hi Leta, what do you mean by extract? Do you mean actually scraping the data from Spotify's servers?

    • @letaendashew3466 4 years ago

      @gcdatkin Yes!

    • @gcdatkin 4 years ago +1

      Hi Leta, sorry I didn't get back to you sooner. I looked into it and it looks like Spotify has a pretty nice API for getting data from their servers. Here is a full list of the kind of data you can get:
      developer.spotify.com/documentation/web-api/reference/object-model/

  • @adarshjamwal3448 3 years ago

    Nice video, but I have a question: can we handle outliers in categorical data? If so, how? Do you have a solution?

    • @gcdatkin 3 years ago +1

      Outliers in categorical data can be found by checking how often each value occurs:
      df[column].value_counts()
      You can interpret the results yourself and decide whether to remove the values that occur infrequently.
      To remove the occurrences of a specific value, first get the indices of those rows (here value stands for the infrequent value you chose):
      indices_to_drop = df[df[column] == value].index
      Then drop those rows:
      df = df.drop(indices_to_drop, axis=0)
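
The steps in this reply can be put together as a small runnable sketch. The toy DataFrame, the "genre" column name, and the min_count threshold are all assumptions made for illustration; they do not come from the video. Instead of one hand-picked value, the sketch drops every category that occurs fewer than min_count times, which is one possible way to automate the same idea.

```python
import pandas as pd

# Hypothetical toy data; "genre" and "popularity" are illustrative column names
df = pd.DataFrame({
    "genre": ["pop", "pop", "rock", "pop", "rock", "polka", "pop", "rock"],
    "popularity": [70, 65, 55, 80, 60, 5, 75, 50],
})

column = "genre"
min_count = 2  # assumed cutoff: categories seen fewer times than this count as rare

# Step 1: count how often each category occurs
counts = df[column].value_counts()

# Step 2: pick the categories that fall below the cutoff
rare_values = counts[counts < min_count].index

# Step 3: collect the row indices holding those categories and drop them
indices_to_drop = df[df[column].isin(rare_values)].index
df_clean = df.drop(indices_to_drop, axis=0)

print(counts)
print(df_clean[column].unique())
```

Whether a rare category is really an outlier is still a judgment call; the cutoff only makes that decision explicit and repeatable.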

  • @jamespaladin607 4 years ago

    How do you think z-scores compare with compressive scalers, such as log-based ones, in enhancing performance? Surely the scaler approach is a lot less work.
    Your tutorial was instructive, but perhaps too much so; I now have a severe headache :)

    • @gcdatkin 4 years ago

      Hahaha, sorry if it was hard to follow!
      I am not sure there is a hard-and-fast rule about this. I think log-based scalers are probably fine, but they don't truly remove the outliers; they only compress them.
      I would use the two methods on a case-by-case basis and see in practice which one improves performance the most.
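
To make the difference between the two approaches concrete, here is a minimal sketch on made-up data. The "duration_ms" column, the values, and the |z| < 3 cutoff are assumptions chosen for illustration, not details taken from the video. Z-score filtering deletes the extreme row, while the log transform keeps every row and only shrinks the gap.

```python
import numpy as np
import pandas as pd

# Hypothetical skewed feature: 20 typical tracks and one extreme value
df = pd.DataFrame(
    {"duration_ms": [200_000.0] * 10 + [210_000.0] * 10 + [5_000_000.0]}
)

# Approach 1: z-score filtering, which removes rows far from the mean
col = df["duration_ms"]
z = (col - col.mean()) / col.std()  # pandas uses the sample std (ddof=1)
df_z = df[z.abs() < 3]              # assumed cutoff of 3 standard deviations

# Approach 2: log scaling, which keeps every row but compresses large values
df_log = df.assign(duration_log=np.log1p(df["duration_ms"]))

print(len(df), len(df_z), len(df_log))
print(df_log["duration_log"].describe())
```

Either version can then be fed to the same model to check which one helps more on a given dataset, as suggested in the reply.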

    • @gcdatkin 4 years ago

      Is there anything I can help clarify about the video?

    • @jamespaladin607 4 years ago +1

      @gcdatkin No, it is fine. Just a lot to absorb early in the morning.