This Machine Learning Model outperformed the S&P500 [FULL GUIDE]

Published 14 Dec 2024

Comments • 90

  • @Algovibes · 1 year ago · +5

    Hi all, please read:
    There are a lot of interesting and good thoughts in the comments so far, but also some wrong claims.
    The valid criticism raised by some of you concerns a simplifying assumption I made: whenever I get a prediction, I take that day's return as my strategy return.
    This return is based on the previous day's Close and today's Close. Taking this return assumes I could buy at the previous day's Close, which is not possible.
    The fully correct approach is to calculate an intraday return instead (the return between that day's Open and Close), since I can only buy at the Open. I do exactly that in a lot of my previous videos but left it out here.
    If you do this, the S&P 500 outperforms the strategy when the model is overfitted (and therefore trained incorrectly); interestingly, however, with a train/test split the model still outperforms the S&P 500.
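
The correction described in the pinned comment can be sketched in pandas as follows. This is a minimal illustration, not the video's code: the column names Open, Close and pred, the helper strategy_returns, and the use of simple (rather than log) returns are all assumptions.

```python
import pandas as pd


def strategy_returns(df: pd.DataFrame, pred_col: str = "pred") -> pd.DataFrame:
    """Compare close-to-close and open-to-close returns for a long/flat signal.

    Assumes 'Open' and 'Close' columns and a 0/1 prediction per day that was
    computed only from data available before that day's Open.
    """
    out = df.copy()
    # Close-to-close: implicitly assumes a fill at the previous day's Close.
    out["ret_cc"] = out["Close"].pct_change()
    # Open-to-close (intraday): buy at the Open, sell at the same day's Close.
    out["ret_oc"] = out["Close"] / out["Open"] - 1
    out["strat_cc"] = out[pred_col] * out["ret_cc"]
    out["strat_oc"] = out[pred_col] * out["ret_oc"]
    return out


prices = pd.DataFrame(
    {"Open": [100.0, 102.0, 101.0, 103.0],
     "Close": [101.0, 101.5, 102.5, 104.0],
     "pred": [0, 1, 1, 0]},
    index=pd.date_range("2024-01-01", periods=4, freq="B"),
)
res = strategy_returns(prices)
print(res[["strat_cc", "strat_oc"]].sum())
```

On these toy numbers the second day looks like a gain close-to-close (about +0.49%) but is a loss intraday (about -0.49%), because the overnight gap up to the Open is not capturable once you can only buy at the Open.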

    • @itzfastbreak8885 · 1 year ago

      Would it be fair to use a shift(-1) return column to calculate the returns, then? I mean, you enter the trade when you get the signal and see the next day whether the trade ends up being profitable or not.

    • @bryan-9742 · 1 year ago

      Potential update.

    • @motib4184 · 1 year ago

      Thanks a lot for your videos! Is the whole point of the lags to create more data points to train the model? In that case, could volume also be a data point?

  • @ageens · 1 year ago · +1

    Hmm, it works! 😀 Somewhere I saw a TensorFlow model that basically just carried today's up/down move forward to the next day and got roughly a 60% success rate, because "up" or "down" days often come more than once in a row => win! If the market went up one day and down the next, every day => loss 😅
    A problem I see could be overtrading: small daily moves do not cover fees/expenses.
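
The effect described in this comment (a model that merely repeats yesterday's direction) is easy to measure directly. The sketch below is illustrative only: the function name persistence_hit_rate and the toy return series are assumptions, and zero returns are counted as down days.

```python
import numpy as np


def persistence_hit_rate(returns: np.ndarray) -> float:
    """Share of days where 'tomorrow moves like today' was right (zero counts as down)."""
    up = np.asarray(returns) > 0
    # The prediction for day t is the direction of day t-1.
    return float(np.mean(up[1:] == up[:-1]))


trending = np.array([1, 1, 1, -1, -1, -1, 1, 1], dtype=float)
alternating = np.array([1, -1, 1, -1, 1, -1], dtype=float)
print(persistence_hit_rate(trending))     # 5 of 7 day-to-day pairs match, about 0.714
print(persistence_hit_rate(alternating))  # 0.0
```

Any learned model that scores close to this naive baseline has probably learned little beyond short-term persistence.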

  • @bryan-9742 · 1 year ago · +1

    This is a really important video. Incorporating the ROC curve, some more relevant features (for example a technical indicator) and a balanced-classification methodology would be very useful. Thanks again for the post. I pay for the subscription as a thank-you for the videos.
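
A minimal sketch of both suggestions with scikit-learn, on synthetic data: class_weight="balanced" reweights each class by n_samples / (n_classes * count of that class), and roc_auc_score / roc_curve evaluate the predicted probabilities rather than hard 0/1 labels. The feature matrix, the label rule and the 400-row split are assumptions standing in for the video's lagged returns.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))  # stand-in for 3 lagged-return features
# Imbalanced 0/1 labels (roughly three quarters "up"), weakly tied to the first feature.
y = (0.5 * X[:, 0] + rng.normal(size=500) > -0.8).astype(int)

split = 400  # chronological: first 400 rows train, last 100 rows test
model = LogisticRegression(class_weight="balanced")
model.fit(X[:split], y[:split])

proba = model.predict_proba(X[split:])[:, 1]  # predicted P(up) per test row
auc = roc_auc_score(y[split:], proba)
fpr, tpr, thresholds = roc_curve(y[split:], proba)
print(f"test ROC AUC: {auc:.3f}")
```

Plotting tpr against fpr gives the ROC curve itself; an AUC near 0.5 means the probabilities rank up days no better than chance, whatever the raw accuracy looks like.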

    • @Algovibes · 1 year ago

      Thanks a lot Bryan for your support!

  • @sergecornushov3111 · 1 year ago · +1

    An idea for a video: use fundamental data to buy great businesses at a fair/discounted price instead of trying to time the market.

  • @literap4748 · 1 year ago · +1

    Keep up your great work!

    • @Algovibes · 1 year ago

      I will :-) Thank you!

  • @mr.nonederful3314 · 1 year ago

    I think to get around the leakage we can add df['target'] = df.ret.shift(-1), then df['direction'] = np.where(df.target > 0, 1, 0). Performance of the model does appear to drop when this is done. That said, it's still a good introduction to basic ML prediction.
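
One consistent way to line up features with a next-day target, sketched under assumptions (a close price Series, simple returns, and a hypothetical helper make_next_day_dataset). If the target is ret.shift(-1), the features should include today's return; pairing a shift(-1) target with lags that start at 1 leaves today's return out, so the model is effectively predicting two days ahead.

```python
import numpy as np
import pandas as pd


def make_next_day_dataset(close: pd.Series, n_lags: int = 3) -> pd.DataFrame:
    """Features known at today's Close; label is tomorrow's direction.

    Row t holds r_t, r_{t-1}, ..., r_{t-n_lags+1} as features and the sign of
    r_{t+1} as the label, so every feature is known before the labelled day starts.
    """
    df = pd.DataFrame({"ret": close.pct_change()})
    for k in range(n_lags):
        df[f"lag_{k}"] = df["ret"].shift(k)  # lag_0 is today's return r_t
    df["target"] = df["ret"].shift(-1)  # tomorrow's return r_{t+1}
    df = df.dropna().copy()  # drop warm-up rows and the final row (no tomorrow yet)
    df["direction"] = np.where(df["target"] > 0, 1, 0)
    return df


close = pd.Series([100.0, 101.0, 99.0, 102.0, 103.0, 101.0])
ds = make_next_day_dataset(close, n_lags=2)
print(ds[["lag_0", "lag_1", "target", "direction"]])
```

The realised strategy return still has to reflect a fill you could actually get on the predicted day (for example that day's Open-to-Close return).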

  • @Shreendg · 1 year ago · +4

    Applying regression models to time-series data is actually problematic. Time-series data tends to exhibit autocorrelation and heteroscedasticity, so without taking that into account, the accuracy of the forecast tends to be worse.

    • @bryan-9742 · 1 year ago · +3

      I take videos like this of his as coding tutorials, not as mathematically sound analysis. From a coding perspective these videos are incredible.

    • @Algovibes · 1 year ago

      Thanks for sharing your thoughts!
      The probabilistic model takes autocorrelation into account to a certain degree. If returns have short-term autocorrelation, the model should predict a higher probability of an up move the longer or the more precise the series of lags. If you increase the lags to, say, 30, you actually see insane performance; however, it drops significantly with a train/test split.
      The interesting question indeed is: what is the relevant autocorrelation period?

    • @Shreendg · 1 year ago

      @Algovibes You can check the autocorrelation period by plotting the PACF (partial autocorrelation function). Also, increasing the lags would make the problem of multicollinearity worse. I'm not sure whether ARIMA or GARCH models are good for classification; they're good for forecasting. Maybe we could get the Boolean output by looking at the sign of the forecast?
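
The PACF check suggested here is usually done with statsmodels (statsmodels.tsa.stattools.pacf or statsmodels.graphics.tsaplots.plot_pacf). To show what it measures without that dependency, here is a NumPy-only sketch using successive OLS regressions; the function name pacf_ols and the simulated AR(1) series are assumptions. Lags whose value falls outside roughly ±1.96/sqrt(n) are the candidates for a meaningful lag length.

```python
import numpy as np


def pacf_ols(x: np.ndarray, max_lag: int) -> np.ndarray:
    """Partial autocorrelation via successive OLS regressions.

    For each k, regress x_t on a constant and x_{t-1}, ..., x_{t-k}; the
    coefficient on x_{t-k} is the partial autocorrelation at lag k.
    """
    x = np.asarray(x, dtype=float)
    out = np.empty(max_lag)
    for k in range(1, max_lag + 1):
        y = x[k:]
        # Column j-1 holds x_{t-j} for j = 1..k, aligned with y = x_t.
        lags = np.column_stack([x[k - j : len(x) - j] for j in range(1, k + 1)])
        design = np.column_stack([np.ones(len(y)), lags])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        out[k - 1] = coef[-1]
    return out


# Simulated AR(1) with coefficient 0.6: only lag 1 should stand out.
rng = np.random.default_rng(42)
n = 5000
e = rng.normal(size=n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + e[t]

p = pacf_ols(x, max_lag=5)
band = 1.96 / np.sqrt(n)
print(np.round(p, 3), "approx. 95% band: +/-", round(band, 3))
```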

    • @oskarfransson3152 · 1 year ago

      Actually, sklearn's LogisticRegression trains with an L2-norm penalty by default, which addresses the assumptions made in classical logistic regression. Autocorrelation is no problem in @Algovibes' example, since that information is exactly what the model trades on. I might agree on heteroskedasticity, though; one could argue that the series is still non-stationary, hence the out-of-sample results might be statistically flawed.

  • @xntumrfo9ivrnwf · 1 year ago · +1

    You have a good way of explaining things practically! I guess this basically performs like a trend-following strategy; no surprise it did well in the post-COVID period when CTAs outperformed.

    • @Algovibes · 1 year ago

      Thanks mate, happy to read!

  • @georgezii1081 · 6 months ago · +1

    I personally prefer using direction * ret.shift(1) when calculating rets.

    • @Algovibes · 6 months ago

      whatever gets the job done :-)

  • @SweetMusical · 7 months ago · +1

    Thank you for sharing this amazing, knowledge-packed video

    • @Algovibes · 7 months ago

      Thanks for watching mate! Appreciate it :-)

  • @lerivas77 · 1 year ago · +2

    Very nice work. An important question: in real-time trading, how can I handle repainted signals? With some backtests it works perfectly, but when trading in real time some signals disappear and the backtest does not match the real trades...

    • @Shreendg · 1 year ago

      Nor as fast

  • @MyLeon99 · 1 year ago · +1

    Hi. For educational purposes, how would one take a model like that, connect it to a trading platform and have it trade by itself? Do you have some content on that next step?

    • @Algovibes · 1 year ago

      Hi, yeah, sure. I have a whole playlist on that. Feel free to check out the cryptobot playlist.

  • @navketan1965 · 3 months ago

    Sir, for a grid setup where you buy and sell equal volume at each grid level, what pip distance do you suggest for different currency pairs and crosses? Say a pair's daily ATR is 100 pips: should the grid distance be 50 or 70 pips? Drawdown has to be kept under control. And how about grid trading for indices, as indices are more mean-reverting after all? Any robots?

  • @nandoribas · 1 year ago · +1

    Suggestion: several channels that cover code/trading make the code available for download on a drive. This really helps those who are learning/studying.

    • @Algovibes · 1 year ago

      Hi mate, the code is available for Tier-3 members. Feel free to check out that option!

  • @alic690 · 1 year ago · +1

    Good video. I'd like to see more on ML topics, especially outside 2020 (which was the outperformance period).
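
One way to check whether the results hold outside 2020 is to compound the daily returns separately for each calendar year. A minimal pandas sketch, assuming simple daily returns ret on a DatetimeIndex and a 0/1 position aligned with them; yearly_performance is a hypothetical helper and the numbers are made up.

```python
import pandas as pd


def yearly_performance(ret: pd.Series, position: pd.Series) -> pd.DataFrame:
    """Compound buy-and-hold and strategy returns within each calendar year."""
    df = pd.DataFrame({"market": ret, "strategy": position * ret}).dropna()
    # Product of (1 + r) compounds simple daily returns inside each year.
    return (1 + df).groupby(df.index.year).prod() - 1


idx = pd.to_datetime(["2019-12-30", "2019-12-31", "2020-01-02", "2020-01-03"])
ret = pd.Series([0.1, -0.1, 0.2, 0.1], index=idx)
position = pd.Series([1, 0, 1, 1], index=idx)
perf = yearly_performance(ret, position)
without_2020 = perf.drop(index=2020, errors="ignore")  # inspect the other years
print(perf)
```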

    • @Algovibes · 1 year ago

      Thanks for the feedback Alex 🙂

  • @hannes6116 · 1 year ago · +1

    Hi, is it really necessary to split the data the way you do, which you say is because it is a time series?
    All inputs are independent of the time ordering, since each row contains both the input and the output; it does not use data from rows above or below.
    Sorry, I am just confused and asking because I don't know.
    But once more, thanks for sharing, I like it a lot.
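
One reason the chronological split matters even though each row looks self-contained: with lagged features, the return that is the label of row t reappears as lag_1 in row t+1, lag_2 in row t+2, and so on. A shuffled split therefore puts almost every test label into the training features, while a chronological split does not. The NumPy sketch below counts these overlaps using time indices rather than values; the helper names and sizes are assumptions.

```python
import numpy as np


def lag_times(n_obs: int, n_lags: int):
    """Time index of each row's target and of each of its lagged features."""
    t_idx = np.arange(n_lags, n_obs)  # row i predicts the return at time t_idx[i]
    lags = np.column_stack([t_idx - k for k in range(1, n_lags + 1)])  # times t-1..t-n
    return t_idx, lags


def leaked_targets(train_rows, test_rows, t_idx, lags) -> int:
    """Count test rows whose target return is also a feature in some training row."""
    seen_as_feature = set(lags[train_rows].ravel().tolist())
    return sum(t in seen_as_feature for t in t_idx[test_rows].tolist())


t_idx, lags = lag_times(n_obs=300, n_lags=5)
rows = np.arange(len(t_idx))
cut = int(0.8 * len(rows))

chrono = leaked_targets(rows[:cut], rows[cut:], t_idx, lags)
shuffled_rows = np.random.default_rng(0).permutation(rows)
shuffled = leaked_targets(shuffled_rows[:cut], shuffled_rows[cut:], t_idx, lags)
print(chrono, shuffled, len(rows) - cut)  # leaks (chronological), leaks (shuffled), test size
```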

  • @prashantdani8527 · 1 year ago · +1

    OK, thanks, nice video.
    However, please advise how to predict whether the next working day will be up or down.

    • @Algovibes · 1 year ago

      Thanks mate! Just provide the features (last 3-5 (or n) days) and predict the move.
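
Following the author's answer, a minimal sketch of predicting the next session: fit on lagged returns, then feed the most recent n returns as lag_1..lag_n. It assumes scikit-learn's LogisticRegression, a ret Series of daily returns and n = 5; the helpers fit_direction_model and predict_next_day are hypothetical, not the video's code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

N_LAGS = 5  # assumption: the model uses the last 5 daily returns


def fit_direction_model(ret: pd.Series, n_lags: int = N_LAGS) -> LogisticRegression:
    """Train on rows where lag_1..lag_n predict the direction of the same day."""
    X = pd.concat({f"lag_{k}": ret.shift(k) for k in range(1, n_lags + 1)}, axis=1)
    data = X.assign(direction=(ret > 0).astype(int)).dropna()
    model = LogisticRegression()
    model.fit(data.drop(columns="direction").values, data["direction"].values)
    return model


def predict_next_day(model: LogisticRegression, ret: pd.Series, n_lags: int = N_LAGS) -> int:
    """Most recent return becomes lag_1, the one before it lag_2, and so on."""
    latest = ret.dropna().iloc[-n_lags:][::-1].to_numpy().reshape(1, -1)
    return int(model.predict(latest)[0])


rng = np.random.default_rng(3)
ret = pd.Series(rng.normal(0, 0.01, 300))  # stand-in for real daily returns
model = fit_direction_model(ret)
print("next session:", "up" if predict_next_day(model, ret) == 1 else "down/flat")
```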

  • @victorbartolo287 · 1 year ago · +3

    Hello, and thank you for an excellent presentation. I am not sure how this would work out in real life, because you seem to be calculating the day's success using a prediction that is based on the Close price of the same day. Am I correct to say that (for example) today's predicted direction will be used for the next day's return?

    • @osylphx · 1 year ago

      I realized this too. The prediction is only useful if you can use it to enter a position, but if the prediction is based on the day's close, you can't enter that day, since the market is already closed. This video needs an update in which the prediction is used to enter the position on the NEXT day and the cumsum is calculated from the gains/losses made there. Also, when a prediction says the market will go down, you could enter a short position. I would not build a live trading strategy using machine learning on so few features; you'll want to take at least other indicators' movements into consideration.

    • @Algovibes · 1 year ago

      @osylphx Yep, that's a simplifying assumption. Please read the pinned 📌 comment; I have addressed it there.
      Cheers, and thanks a lot for watching!

    • @victorbartolo287 · 1 year ago

      I think you are correct, because the independent variables are 'historical' data, so the prediction generated is 'logically' valid. However, the reference (P/L) is the change between today's close and yesterday's close, so maybe today's close price should be added as a feature. Perhaps the prediction can be used to trade at the next day's open. Another user mentioned there may be too many trades, so, cautiously, I suggest seeing what happens if the market is traded short when the prediction is negative, rather than just closing the long trade.
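
The long/short variant suggested in this thread differs from the video's long/flat setup only in how a 0 prediction is mapped to a position. A minimal pandas sketch; the helper name and toy numbers are assumptions, and ret stands for whatever return the trade can actually earn (for example the Open-to-Close return discussed in the pinned comment).

```python
import pandas as pd


def positions_from_predictions(pred: pd.Series, allow_short: bool) -> pd.Series:
    """Map 0/1 predictions to positions: 1 stays long; 0 becomes flat (0) or short (-1)."""
    return pred.where(pred == 1, -1 if allow_short else 0)


pred = pd.Series([1, 0, 1, 0])
ret = pd.Series([0.01, -0.02, -0.01, 0.03])  # assumed: returns the trade can actually earn
long_flat = positions_from_predictions(pred, allow_short=False) * ret
long_short = positions_from_predictions(pred, allow_short=True) * ret
print(long_flat.sum(), long_short.sum())
```

Note that flipping from long (+1) to short (-1) is a position change of 2 rather than 1, so the long/short version roughly doubles turnover and the associated costs.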

  • @volbuyer · 1 year ago · +2

    Can you make a quick video on how we would implement it for crypto and how to deploy it so it works?

    • @Algovibes · 1 year ago · +1

      I would, but for some reason the crypto videos are not getting any traction anymore :-(

  • @jerrywang3225 · 1 year ago · +1

    It's really hard to predict the market with logistic regression. I have tried different indicators and previous-day returns; the best prediction accuracy I can come up with is below 60%. Anyway, the video is super helpful as usual. Thanks.
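
When judging an accuracy figure like this, it helps to compare it with the trivial baseline of always predicting the more frequent class (for an index that rises on most days, that is simply "always long"). A minimal NumPy sketch with made-up labels; accuracy_vs_baseline is a hypothetical helper.

```python
import numpy as np


def accuracy_vs_baseline(y_true, y_pred):
    """Return (model accuracy, accuracy of always predicting the majority class)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accuracy = float(np.mean(y_true == y_pred))
    up_share = float(np.mean(y_true))
    baseline = max(up_share, 1 - up_share)
    return accuracy, baseline


y_true = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]  # 7 of 10 days up
y_pred = [1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
acc, base = accuracy_vs_baseline(y_true, y_pred)
print(acc, base)
```

Here the model's 70% accuracy is exactly what "always predict up" achieves on the same days, so it adds nothing by this measure.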

    • @Algovibes · 1 year ago

      It is indeed! Thanks mate :-) Appreciate your comment!

  • @rraul · 1 year ago · +1

    I loved it! Thanks and keep motivated❤

    • @Algovibes · 1 year ago

      Thanks buddy :-)

  • @absimaldata · 1 year ago · +2

    Do you think logistic regression is enough to predict a market? I don't think so.
    I made a double LSTM model with dropout layers myself; that is also questionable and doesn't work that well.

    • @ageens · 1 year ago

      You can add more factors/parameters/oscillators, but yes: if it worked that easily, you would hear about billionaires made by multiplying their own savings.
      The first thing in a machine learning class: everyone tries stocks, because the data is easily available and mistakes are easy to spot when someone claims to have created a supermodel 😆🤣

    • @voltageinc9590 · 1 year ago

      No, it is not enough. And LSTM models are also not accurate at all most of the time. One important thing to point out: you should always try to predict the price difference between today and the next day, not the next day's closing price... If you predict the next day's closing price, the model seems to be very accurate but in fact is not at all. Machine learning (deep learning in this case) is a very complex thing and not as easy as shown in this video. Like AlgoVibes has said, "Take it with a grain of salt". You will not make any consistent profits with this model, or any profits at all (incl. commission etc.)...
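
The trap described here is easy to reproduce. On a random walk, where nothing is predictable by construction, a naive "tomorrow's Close equals today's Close" forecast scores an R² close to 1 on price levels, yet judged on the price change it implies (always zero) it scores R² of at most 0. A NumPy sketch; the simulated series and the r_squared helper are assumptions.

```python
import numpy as np


def r_squared(actual: np.ndarray, forecast: np.ndarray) -> float:
    """Coefficient of determination of a forecast against actual values."""
    ss_res = np.sum((actual - forecast) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return float(1 - ss_res / ss_tot)


rng = np.random.default_rng(1)
prices = 100 + np.cumsum(rng.normal(0, 1, size=1000))  # random walk: unpredictable by construction

# Naive forecast on price levels: tomorrow's Close = today's Close.
level_r2 = r_squared(prices[1:], prices[:-1])
# The same forecast judged on the change it implies (always 0).
diffs = np.diff(prices)
diff_r2 = r_squared(diffs, np.zeros_like(diffs))
print(f"R² on levels: {level_r2:.3f}, R² on differences: {diff_r2:.3f}")
```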

    • @Algovibes · 1 year ago · +2

      Good comment from Voltage in general, but I would put it a bit more lightly. It really depends on what you feed into the model. In most cases the model itself is not the most relevant part; what you choose to feed into it is (and that is not trivial at all).

  • @Gingeey23 · 1 year ago · +2

    Great video. I am currently doing something similar on the hourly timeframe using ANNs (with mixed results), but for these binary classification problems I'm steering towards random forests. Do you aim to cover more ML topics in the future? Cheers!

    • @Algovibes · 1 year ago · +1

      I would, but unfortunately the videos are performing quite badly. Thanks for watching and leaving a comment, Charles. Appreciate it!

  • @bonadio60 · 1 year ago · +1

    When a strategy seems too good to be true, something is wrong. I think the error is that you are summing only the "correct" returns: when you are wrong, you sum 0, but you should sum the negative return, because that day you lost money.

    • @Algovibes · 1 year ago · +3

      First statement: most probably, agreed. Second statement: no. I am only in the S&P when my model predicts 1. If the S&P drops on that day (a wrong prediction), the negative return is counted.
      The problem with the strategy is rather the turnover, which I address at the end of the video. BTW, thanks a lot for watching!
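
The accounting described in this reply (losses on wrong long calls are included, flat days contribute nothing) plus the turnover problem can be sketched as below. The proportional cost_per_trade and the helper net_strategy_returns are assumptions for illustration, not values or code from the video.

```python
import pandas as pd


def net_strategy_returns(position: pd.Series, ret: pd.Series, cost_per_trade: float) -> pd.Series:
    """Daily strategy return net of a proportional cost on each position change.

    A wrong long call (position 1 on a down day) contributes its negative return;
    flat days (position 0) contribute 0 before costs.
    """
    gross = position * ret
    # Turnover: absolute change in position; the first day counts as entering from flat.
    turnover = position.diff().abs().fillna(position.abs())
    return gross - cost_per_trade * turnover


position = pd.Series([1, 1, 0, 1])
ret = pd.Series([0.01, -0.02, 0.03, 0.01])
net = net_strategy_returns(position, ret, cost_per_trade=0.001)
print(net.tolist())
```

Day 2 shows the wrong long call counted as a -2% loss, and every entry or exit pays the cost, which is why a signal that flips often can lose its edge after fees.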

    • @bonadio60 · 1 year ago · +1

      @Algovibes OK, that is correct, my bad.

  • @tomsetberg4746 · 7 months ago

    Maybe it happens without my noticing, but when you split the data into train and test, shouldn't you keep retraining the model as each day passes? Maybe that is just too time-consuming to put into the video.
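
What this comment describes is usually called walk-forward (expanding-window) retraining. A minimal scikit-learn sketch, assuming X already contains only lagged features (row t uses information from before day t) and y holds the 0/1 direction of day t; walk_forward_predictions and refit_every are hypothetical names. scikit-learn's TimeSeriesSplit provides a similar expanding-window scheme for cross-validation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def walk_forward_predictions(X, y, initial_train: int, refit_every: int = 1) -> np.ndarray:
    """Expanding-window walk-forward: refit on rows before t, then predict row t.

    Rows before initial_train get -1, meaning "no out-of-sample prediction".
    """
    preds = np.full(len(y), -1)
    model = None
    for t in range(initial_train, len(y)):
        if model is None or (t - initial_train) % refit_every == 0:
            model = LogisticRegression().fit(X[:t], y[:t])  # training uses rows < t only
        preds[t] = model.predict(X[t : t + 1])[0]
    return preds


rng = np.random.default_rng(7)
X = rng.normal(size=(300, 3))  # stand-in for lagged-return features
y = (X[:, 0] + rng.normal(size=300) > 0).astype(int)
preds = walk_forward_predictions(X, y, initial_train=200, refit_every=20)
hit_rate = float(np.mean(preds[200:] == y[200:]))
print(f"out-of-sample hit rate: {hit_rate:.3f}")
```

Refitting every day is the most faithful simulation but also the slowest; refit_every trades some realism for speed.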

  • @Elvis00026 · 1 year ago · +1

    That's good, but I prefer the momentum strategy.

    • @Algovibes · 1 year ago · +1

      There will surely be some more videos on Momentum strategies! Thanks for your feedback man :-)

    • @Elvis00026 · 1 year ago

      @Algovibes A suggestion: an all-time-high stock-buying strategy with a 10 ATR (average true range) stop. There is a paper somewhere showing it was profitable in the '90s and '00s.
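
For anyone wanting to try this suggestion, here is a minimal pandas sketch of its building blocks. It assumes High, Low and Close columns, uses a simple rolling mean of the true range (Wilder's original ATR uses recursive smoothing instead), and places the stop atr_mult ATRs below the Close on the signal day. Whether the referenced paper used a fixed or trailing stop is not stated in the comment, and atr / ath_signals are hypothetical names.

```python
import pandas as pd


def atr(df: pd.DataFrame, window: int = 14) -> pd.Series:
    """Average true range as a simple rolling mean of the true range.

    Wilder's original ATR smooths recursively instead; the rolling mean keeps this short.
    """
    prev_close = df["Close"].shift(1)
    true_range = pd.concat(
        [df["High"] - df["Low"],
         (df["High"] - prev_close).abs(),
         (df["Low"] - prev_close).abs()],
        axis=1,
    ).max(axis=1)  # NaN from the missing first prev_close is skipped
    return true_range.rolling(window).mean()


def ath_signals(df: pd.DataFrame, atr_mult: float = 10.0, window: int = 14) -> pd.DataFrame:
    """Flag new all-time-high closes and an ATR-based stop level (hypothetical rule)."""
    out = df.copy()
    out["atr"] = atr(out, window)
    out["new_ath"] = out["Close"] >= out["Close"].cummax()
    out["stop"] = out["Close"] - atr_mult * out["atr"]
    return out


bars = pd.DataFrame({
    "High": [10.5, 11.5, 11.0, 12.5, 12.0],
    "Low": [9.5, 10.5, 10.0, 11.0, 11.0],
    "Close": [10.0, 11.0, 10.5, 12.0, 11.5],
})
print(ath_signals(bars, window=2))
```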

  • @andrewmachief8637 · 1 year ago · +1

    Another great video ❤
    Do you have any suggestions on where I should go to deploy a bot to my paper account?

    • @Algovibes · 1 year ago · +1

      Thanks a lot, Andrew!
      I am currently using a simple Linux VM on GCP and have some videos on that. Feel free to check them out!

    • @andrewmachief8637 · 1 year ago · +1

      @Algovibes Thanks a lot, mate, I'll check your videos out now 🫡

  • @philipfiguerres6132 · 1 year ago · +1

    Nice video as always. Very good explanation of how you implemented the code. How does logistic regression compare to random forest and XGBoost? Comparing or combining the different methods would be a nice video suggestion. Many thanks,

    • @Algovibes · 1 year ago

      Thanks Philip!
      I have covered RF in a video roughly 1 year ago. But thanks a lot for the suggestion anyway.

  • @GeorgeN678 · 9 months ago · +1

    Amazing work, man! Been a fan for a few years.
    Is there a way to contact you? I have a business inquiry 😊

    • @Algovibes · 9 months ago

      Hey George, happy to have you on board and I appreciate the long term support very much! Sure, you can drop me a mail. It's in the about section of my channel. Looking forward to it!
      KR

    • @GeorgeN678 · 9 months ago

      @Algovibes yoo KR, I could only find a website. I am already subscribed to the course. Is there a specific link where I can find your contact info?

  • @nitaiamir8703 · 1 year ago

    There is a dependency between X and y: you are feeding the last day's return as one of the features and trying to predict the direction of that same last day.

    • @Algovibes · 1 year ago

      Nope. I am feeding the returns of the last n days and predicting t0. Please watch the video all the way through. Thanks!
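
The author's point, that row t only sees returns from t-1 back to t-n while the label is the direction of day t, can be illustrated with a small lag builder. This is a hypothetical reconstruction: the video calls its helper Lagit, but the lagit below, its signature and the ret / lag_k column names are assumptions.

```python
import numpy as np
import pandas as pd


def lagit(df: pd.DataFrame, n_lags: int, col: str = "ret") -> list:
    """Add lag_1..lag_n columns (col shifted by 1..n rows) and return their names.

    Hypothetical reconstruction; the video's own Lagit function may differ.
    """
    names = []
    for k in range(1, n_lags + 1):
        name = f"lag_{k}"
        df[name] = df[col].shift(k)  # value from k days before the row's date
        names.append(name)
    return names


df = pd.DataFrame({"ret": [0.01, -0.02, 0.03, 0.04, -0.01]})
features = lagit(df, n_lags=2)
df["direction"] = np.where(df["ret"] > 0, 1, 0)  # label: same-day direction (t0)
data = df.dropna()
print(data)
```

No feature column ever contains the same-day return, which is why the features themselves do not leak the label; the remaining issue is the execution price, as discussed in the pinned comment.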

  • @sushka4122 · 1 year ago

    But the wrong predictions are not considered in the backtest, right? For example, if the model predicts a 1 (positive return) and we get a 0 (negative return)...

    • @TheEugevanz · 1 year ago · +2

      The negative returns are included in the calculation. I just have 1 question: is this an hourly chart?

    • @Algovibes · 1 year ago · +1

      They are included. No this is daily data. Thanks for watching both!

  • @bahmanjafari1826 · 1 year ago · +1

    Very good

    • @Algovibes · 1 year ago

      Thank you my man!

  • @sweealamak628 · 1 year ago

    Backtests are bound to produce good results, but in reality the market has reflexivity. I can't comment on this regression model, but many people have tried their models in real life and it got hedged away immediately. Once they stopped executing trades, the model predicted correctly. It's like a slap in the face, really. 🤷🏻‍♂️

    • @jonathanrubinov3571 · 1 year ago

      Interesting, do you mind explaining more about this?

    • @sweealamak628 · 1 year ago

      @jonathanrubinov3571 I'll try. Machine learning is about making inferences from a dataset. If the dataset is static/fixed, a model naturally works, because it is acting on that given set of variables. Take flower recognition, for example: the picture of the flower remains unchanged throughout, so a neural network can read all the pixels without interference and is thus able to categorise the type of flower.
      Now look at a stock price: the actions of market participants constantly alter its characteristics. Suppose that at 3.58pm a stock has Open-High-Low-Close (OHLC) values of (3.1, 3.9, 2.8, 3.8), a model recognises this as a strong buy signal, and it places a market buy order. The next minute, at 3.59pm, the values change to (3.1, 4.0, 2.8, 4.0) because of the executed trade, which also triggered other traders to execute. In the end, the market closes at (3.1, 4.4, 2.8, 4.4). Within just 2 minutes the stock price looks entirely different: instead of a normal-looking bullish engulfing pattern, the pattern is overstretched. The very next day, the stock gets sold off due to profit taking.
      What I describe above is very common and can be witnessed daily across the exchange. By executing trades, we alter the market data; this is the reflexivity I am referring to. So, looking back at this video's example, the model is in theory based on the market close, but in reality no trade can be guaranteed to execute at the close. And even if you try to execute at the close of the very next day, prices don't simply follow the previous day's prices, hence the gaps in charts. This topic is hardly discussed, due to a lack of understanding of it, but I have found some material on it: try the QuantInsti channel. They do touch on the subject now and then.

  • @waynehsu667 · 1 year ago

    Because you can only buy the next morning, you can't get the return calculated from the previous day's close price.

    • @Algovibes · 1 year ago

      Yep, please read the pinned comment regarding this.

  • @Whatsitwhoseitdang · 1 year ago · +1

    Comment to aid my algorithm

    • @Algovibes · 1 year ago

      Fingers crossed my man!

    • @Whatsitwhoseitdang · 1 year ago

      @Algovibes Hey bro, if you have the time to go over some tips: I'm still a beginner dev, but with the help of AI I'm able to skip a lot of coding. However, I get a lot of formatting errors. If you're willing to do a Discord call or any other type of communication, I'd be thankful.

  • @r.navarro2513 · 1 year ago

    You forgot to shift the ret when calculating df['strat'], which implies data leakage. 🧐

    • @Algovibes · 1 year ago

      No that would be incorrect, but I have addressed this in the pinned comment.

    • @r.navarro2513 · 1 year ago

      @Algovibes No, you have to shift the ret, because if your prediction is for today, you can only get today's return if you bought yesterday and sell today. That is how it works on the daily timeframe. The gap between yesterday's Close and today's Open can be huge. You did it right many times; I've seen your videos. If you want to use intraday returns that's OK, but then it's a different approach. 🧐

    • @Algovibes · 1 year ago

      @r.navarro2513 The prediction is based on shifted returns, which is done in the Lagit function. The only thing that in fact has to be done is buying at the Open of the predicted day (the prediction being based on the shifted returns). Shifting the return column itself is wrong.

  • @Master_of_Chess_Shorts · 1 year ago

    I do not know what you are doing. You are training on X and then predicting on X... Linear regression is a bias-based model: we are looking for a straight line that fits the data. You should have held out data that the model has never seen. You are making predictions on data the model has already seen, because it was trained on it. This approach will not scale and will certainly be unable to perform equally well on new data.

    • @Algovibes · 1 year ago · +1

      I really don't mean this offensively, but your comment is totally wrong. Do me a favor and watch the video again, all the way through. A lot of your questions should be answered then (some of your claims are wrong, as they do not apply to the content presented in the video). Thanks!

    • @markgamache6377 · 1 year ago

      Dude. He uses out of sample data.

  • @epicmonckey25001 · 1 year ago · +1

    I was having issues when trying to fit the data to the model: I kept getting the error 'X value is infinite or NaN'. I managed to fix this by essentially creating a new dataframe that drops all the weird numbers. So I went: df_new = df[np.isfinite(df).all(axis=1)], and then anywhere in his video that references df, I simply wrote df_new. Example: X = df_new[features]. ALSO: **Make sure you import numpy for this to work.** Hope this helps someone :)
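
A slightly more defensive version of this fix, sketched under assumptions: it checks only the model's feature columns, because np.isfinite applied to a whole dataframe may raise a TypeError if the frame also holds non-numeric columns. The helper drop_non_finite and the toy data are hypothetical. Infinite values typically come from operations such as taking the log of zero or dividing by zero.

```python
import numpy as np
import pandas as pd


def drop_non_finite(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Keep rows where every listed column is finite (drops NaN, inf and -inf)."""
    mask = np.isfinite(df[cols]).all(axis=1)
    return df[mask]


df = pd.DataFrame({
    "ret": [0.1, np.inf, np.nan, -0.2],
    "lag_1": [np.nan, 0.1, 0.2, 0.3],
    "ticker": ["SPY"] * 4,  # non-numeric column: excluded from the check
})
features = ["ret", "lag_1"]
df_new = drop_non_finite(df, features)
# Equivalent alternative: replace infinities with NaN, then drop NaN rows.
df_alt = df.replace([np.inf, -np.inf], np.nan).dropna(subset=features)
print(df_new)
```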

    • @Algovibes · 1 year ago

      Thanks Alex for sharing your solution!

  • @popovichrush · 1 year ago

    Hello sir, please: I have a strategy, and I want you to build a machine learning model for me that learns my strategy and then always alerts me when it sees what I want... Please help me, or if you can't, direct me to someone who can, OK... please