Great video! Keep it up
Thanks, Arthur!
Great tutorial thanks for sharing
Hi,
How can we remove multivariate outliers? Please help with this.
Could you please show how to extract the original data (attributes) from Spotify for popularity prediction?
Thanks for the tutorial!
Hi Leta, what do you mean by extract? Do you mean actually scraping the data from Spotify's servers?
@@gcdatkin Yes!
Hi Leta, sorry I didn't get back to you sooner. I looked into it and it looks like Spotify has a pretty nice API for getting data from their servers. Here is a full list of the kind of data you can get:
developer.spotify.com/documentation/web-api/reference/object-model/
Nice video, but I have a question: can we handle outliers in categorical data? If yes, how? Do you have any solution?
Outliers in categorical data can be found just by checking
df[column].value_counts()
You can interpret the results yourself and decide if you should remove the values that occur infrequently.
To remove the occurrences of a specific value, first get the indices of those rows (where value is the infrequent category you want to remove):
indices_to_drop = df[df[column] == value].index
Then drop the indices:
df = df.drop(indices_to_drop, axis=0)
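The steps above can be sketched end-to-end on a small toy DataFrame (the column name "genre" and the frequency threshold of 2 are just illustrative choices, not from the video):

```python
import pandas as pd

# Toy data: "jazz" appears only once, so we treat it as an infrequent category
df = pd.DataFrame({"genre": ["pop", "pop", "rock", "rock", "rock", "jazz"]})

# Inspect category frequencies
counts = df["genre"].value_counts()

# Flag categories that appear fewer than 2 times
rare_values = counts[counts < 2].index

# Get the indices of the rows holding a rare value, then drop them
indices_to_drop = df[df["genre"].isin(rare_values)].index
df = df.drop(indices_to_drop, axis=0)
```

Using isin lets you drop all infrequent categories in one pass instead of repeating the == comparison per value.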
How do you think z-scores compare, in terms of enhancing performance, with compressive scalers like log-based ones? Surely the scaler approach is a lot less work.
Your tutorial was instructive, but perhaps too much so; I now have a severe headache :)
Hahaha, sorry if it was hard to follow!
I am not sure if there is a hard and fast rule about this. I think log-based scalers are probably fine, but they don't truly remove the outliers.
I would use the two methods on a case by case basis, and see in practice which method improves performance the most.
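To make the difference concrete, here is a minimal sketch (the data is hypothetical; the |z| <= 3 cutoff and log1p are just common choices) showing that the z-score method actually removes the outlying row, while the log transform keeps it but compresses the scale:

```python
import numpy as np
import pandas as pd

# Hypothetical skewed feature: mostly small values plus one extreme outlier
s = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 2, 3, 2, 3, 4, 2, 1000])

# Method 1: z-scores -- rows beyond 3 standard deviations are dropped entirely
z = (s - s.mean()) / s.std()
kept = s[z.abs() <= 3]

# Method 2: log transform -- every row is kept, but the outlier is pulled
# much closer to the rest of the distribution
logged = np.log1p(s)
```

Note that with very few data points a single outlier can inflate the standard deviation enough that its own z-score stays under 3, so the threshold (and whether z-scores work at all) depends on the data, which is another reason to compare the two methods case by case.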
Is there anything I can help clarify about the video?
@@gcdatkin No it is fine. Just a lot to absorb early in the morning