This video is just so engaging. Couldn't understand a few things, but great.
This was a great lecture with very surprising results. I had always assumed that overfit large models would be worse to use, since the training data is finite. Never thought the opposite could be the case. Great work.
Am I missing something, or is it just "ridgeless" regression with an appropriate penalty (z) being really good?
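To make the "ridgeless with penalty z" wording concrete, here is a minimal sketch, assuming "ridgeless" simply means ridge regression in the limit where the penalty z goes to zero; the data and variable names are made up for illustration, not taken from the lecture.

```python
# Minimal sketch: ridge regression in closed form, and the z -> 0 ("ridgeless") limit.
import numpy as np

def ridge_fit(X, y, z):
    """Closed-form ridge solution: beta = (X'X + z*I)^{-1} X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + z * np.eye(d), X.T @ y)

def ridgeless_fit(X, y):
    """The z -> 0 limit: minimum-norm least squares via the pseudoinverse."""
    return np.linalg.pinv(X) @ y

# Over-parameterized toy regime: far more features than observations.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 500))
y = rng.standard_normal(50)
beta_small_z = ridge_fit(X, y, z=1e-6)   # shrinks the fit only slightly
beta_ridgeless = ridgeless_fit(X, y)     # interpolates the training data exactly
```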
@@brhnkh From what I understand from the lecture, one of the key insights is that larger models should be able to perform better out of sample. The general problem with using ML on time series data is that models are easy to overfit due to the limited training set and feature representation. However, with a diverse feature set, larger models should be able to generalize better, which is intuitive to me. The reason many experts recommend against bigger models is that they are much easier to overfit to the training data, but that may just be because they were training with a small feature set that did not have much predictive content. When that is the case, a model will learn the dataset itself in order to achieve the reward, without finding patterns that repeat in out-of-sample data.
@@Khari99 Right. The last sentence is what's novel, I think. The model gets better on out-of-sample data because it is appropriately penalized while being trained with a very large number of parameters. Apparently they figured it out using random matrix theory, so the heavy lifting lies there, I guess.
@@brhnkh The reward function is always the most difficult part of ML. It took me a while to figure out how to write mine for my architecture. Simply using max profit is not enough (a model could learn to buy and hold forever, for instance), and neither is accuracy (high accuracy != profitability). You have to reward and penalize it in a similar way that you would a human, based on the metrics it's able to achieve on a trade-by-trade and portfolio basis.
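As a hypothetical sketch of that idea (my own illustration, not the commenter's actual function): score a strategy on risk-adjusted, trade-by-trade returns plus a portfolio-level drawdown penalty, rather than on raw profit or accuracy alone.

```python
# Hypothetical reward sketch: trade-level risk-adjusted return minus a
# portfolio-level drawdown penalty. All numbers and weights are made up.
import numpy as np

def reward(trade_returns, risk_free=0.0):
    r = np.asarray(trade_returns, dtype=float)
    if r.size < 2 or r.std() == 0:
        return 0.0                                      # a single buy-and-hold "trade" earns nothing here
    sharpe_like = (r.mean() - risk_free) / r.std()      # risk-adjusted return per trade
    equity = np.cumprod(1.0 + r)                        # portfolio equity curve
    drawdown = 1.0 - equity / np.maximum.accumulate(equity)
    return sharpe_like - 2.0 * drawdown.max()           # penalize the worst peak-to-trough loss

print(reward([0.01, -0.02, 0.03, 0.01]))
```

A buy-and-hold-forever policy, or a high-accuracy but low-profit policy, scores poorly under something like this, which is the point the comment is making.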
@@brhnkh I thought the definition of overfitting was when validation error starts to increase rapidly after reaching its minimum during training, while training error continues to decrease. It is therefore not clear to me why you would want a model to overfit at all, finance or not. I'm only 4 minutes into the video; perhaps he will explain it.
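For what it's worth, a tiny illustration of that classical definition, with made-up error numbers:

```python
# Flag overfitting when validation error keeps rising past its minimum
# while training error keeps falling. Illustrative numbers only.
train_err = [0.90, 0.60, 0.40, 0.30, 0.25, 0.22, 0.20]
val_err   = [1.00, 0.80, 0.70, 0.65, 0.70, 0.80, 0.95]

best_epoch = min(range(len(val_err)), key=val_err.__getitem__)
overfitting = (val_err[-1] > val_err[best_epoch]           # validation error rose after its minimum
               and train_err[-1] < train_err[best_epoch])  # while training error kept decreasing
print(best_epoch, overfitting)  # 3 True
```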
Quantitative Finance is really exhausting; many different authors, books, and articles contradict one another. Complexity is usually viewed with disapproval by the industry.
Stochastic discount factors?
So dump your predictors into a net with a huge hidden layer and feed the final layer through a ridge regression?
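A rough sketch of that recipe as I read it (not the paper's exact construction): one wide random hidden layer that is never trained, then ridge regression on the hidden features as the "final layer". The predictors and returns below are made up.

```python
# Random wide hidden layer (fixed weights) + ridge regression on the readout.
import numpy as np

def random_hidden_layer(X, width, seed=0):
    """Fixed random weights plus a ReLU; only the readout below is fitted."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], width)) / np.sqrt(X.shape[1])
    return np.maximum(X @ W, 0.0)

def ridge_readout(H, y, z=1e-3):
    """Ridge regression on the hidden features."""
    return np.linalg.solve(H.T @ H + z * np.eye(H.shape[1]), H.T @ y)

rng = np.random.default_rng(1)
X_train, y_train = rng.standard_normal((200, 15)), rng.standard_normal(200)
X_test = rng.standard_normal((50, 15))

H_train = random_hidden_layer(X_train, width=2000)      # far more features than observations
beta = ridge_readout(H_train, y_train)
y_hat = random_hidden_layer(X_test, width=2000) @ beta  # out-of-sample prediction
```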
My eyes wide open - if Creed knew ML
This is hardly something new. SVMs have been designed on the same principle.
You're mistaken; this is completely different from SVMs. Therefore, an SVM would yield different results. Read their paper.
You misunderstood my point. I know it's different from an SVM; I said the principles are the same. You need to read more, especially the works by Vapnik on the fundamentals of machine learning. An SVM, like a DNN, can achieve zero training error and still achieve good out-of-sample performance; this is due to controlling the learning capacity of the machine via regularization.
@@traveleurope5756 Well, that is completely different from your initial statement and actually insightful. So you are saying that the same is achieved here through ridgeless regression?