Man, you did a great job digging into the code details and also put in your own thoughts. I usually don't leave a comment, but your video is way, way better than those that claim to teach something complicated in 10 or 15 mins with random visualizations. One suggestion: maybe you could do a video on a code analysis of Meta AI's Omnivore and OmniMAE; they are extensions of Swin Transformer that support both video and images.
I'm so glad that you liked the video. Thanks for suggesting these two papers. I'll definitely look into those.
The thing is, I'm recording two videos on an entirely different topic, so it might take me a while to get back to vision transformers.
You said the 200-epoch test you ran is not a proper experiment to judge the quality of this transformer architecture. So other than increasing the C value back to 96, what other things should I look into to experiment and get the best performance out of this architecture?
The settings I used in the video were kept simple, just to get a taste of this transformer. In my opinion, a proper experiment would be replicating the results in the paper on the ImageNet-1K dataset (the ones in Table 1). That way we can judge the model and then look for improvements.
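For reference, here is a rough sketch of how that Table 1 configuration could be instantiated. It is not from the video's notebook; it assumes the timm library, where 'swin_tiny_patch4_window7_224' corresponds to the Swin-T setting (C = 96, depths {2, 2, 6, 2}, window size 7):

# Sketch using timm (not the video's notebook): Swin-T as listed in Table 1.
import timm
import torch

model = timm.create_model(
    'swin_tiny_patch4_window7_224',  # timm's Swin-T configuration
    pretrained=False,                # train from scratch for a replication
    num_classes=1000,                # ImageNet-1K classes
)

x = torch.randn(1, 3, 224, 224)      # one dummy ImageNet-sized image
print(model(x).shape)                # expected: torch.Size([1, 1000])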
@@mashaan14 I'm sorry, which table?
(Also, big thanks, your videos and replies have been a big help. However, any chance I can ask somewhere more convenient than YouTube comments?)
Table 1 on page 6 of the Swin Transformer paper.
You can message me on Twitter or LinkedIn, whichever is more convenient for you.
twitter.com/mashaan_14
linkedin.com/in/mashaan
@@mashaan14 Okay thanks
Is the notebook where you test the model with C = 48 and 200 epochs available somewhere? I would really like to check it out.
here you go:
github.com/mashaan14/TH-cam-channel/blob/main/notebooks/2024_08_19_swin_transformer.ipynb
@@mashaan14 thankssss
Hi sir,
I'm a student studying this topic.
I would like to use Swin Transformer for object detection in my project.
How can I accomplish this?
Thank you, sir.
Usually, an image classification model is used at the beginning of an object detection pipeline, where it's called the backbone. Most object detection pipelines use ResNet as the backbone.
I assume that you want to replace ResNet with Swin, just like what they did in the paper (section 4.2). If that's the case, your best option is the MMDetection library. They already include Swin as a backbone on their GitHub:
github.com/open-mmlab/mmdetection/blob/cfd5d3a985b0249de009b67d04f37263e11cdf3d/mmdet/models/backbones/swin.py
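As a rough sketch (not from the video, and key names may differ slightly between MMDetection releases), a config that swaps the ResNet backbone of a Mask R-CNN setup for Swin-T could look roughly like this; the base config path and checkpoint path are placeholders you would adapt to your install:

# Hedged sketch of an MMDetection config swapping ResNet for Swin-T.
# Field names follow the official Swin configs in MMDetection; exact names
# can vary between versions, so treat this as a starting point.
_base_ = './mask-rcnn_r50_fpn_1x_coco.py'   # placeholder: your base detector config

model = dict(
    backbone=dict(
        _delete_=True,                 # drop the ResNet settings inherited from _base_
        type='SwinTransformer',
        embed_dims=96,                 # C = 96, the Swin-T setting
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        mlp_ratio=4,
        qkv_bias=True,
        drop_path_rate=0.2,
        patch_norm=True,
        out_indices=(0, 1, 2, 3),      # feed all four stages to the FPN neck
        convert_weights=True,
        init_cfg=dict(type='Pretrained',
                      checkpoint='swin_tiny_patch4_window7_224.pth')),  # placeholder weights
    neck=dict(in_channels=[96, 192, 384, 768]))  # channel dims of the four Swin-T stages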