Swin Transformer Code

  • Published Nov 7, 2024

Comments • 12

  • @honglu679 • months ago

    Man, you did a great job digging into the code details and also putting in your own thoughts. I usually don't leave a comment, but your video is way, way better than those that claim to teach something complicated in 10 or 15 minutes with random visualizations. One suggestion: maybe you could do a video on the code analysis of Meta AI's Omnivore and OmniMAE. They are extensions of Swin Transformer but support both video and images.

    • @mashaan14 • months ago

      I'm so glad that you liked the video. Thanks for suggesting these two papers. I'll definitely look into those.
      The thing is, I'm recording two videos on an entirely different topic, so it might take me a while to get back to vision transformers.

  • @pradyumagarwal3978 • months ago

    You said the 200-epoch test you ran is not a proper experiment for judging the quality of this transformer architecture. So, other than increasing the C value back to 96, what other things should I look into to experiment with and get the best performance out of this architecture?

    • @mashaan14 • months ago

      The settings I used in the video were kept simple, just to get a taste of this transformer. In my opinion, a proper experiment would replicate the results in the paper on the ImageNet-1K dataset (the ones in Table 1). That way we can judge the model and then look for improvements.

    • @pradyumagarwal3978 • months ago

      @mashaan14 I'm sorry, which table?
      (Also, big thanks: your videos and replies have been a big help. However, any chance I can ask somewhere more convenient than YouTube comments?)

    • @mashaan14 • months ago

      Table 1 on page 6 of the Swin Transformer paper.
      You can message me on Twitter or LinkedIn, whichever is more convenient for you:
      twitter.com/mashaan_14
      linkedin.com/in/mashaan

    • @pradyumagarwal3978 • months ago

      @mashaan14 Okay, thanks
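A side note on the C value discussed above: Swin builds its hierarchy by doubling the channel width at each patch-merging step, so the embedding dimension C fixes all four stage widths (C, 2C, 4C, 8C). A minimal sketch of that relationship; the function name is mine, not from the video or the paper:

```python
# Sketch: how the embedding dimension C sets the channel width of each Swin stage.
# Swin doubles the channels at every patch-merging step, so the four stages
# have widths C, 2C, 4C, 8C. Swin-T uses C = 96; the video's quick test used C = 48.

def swin_stage_dims(C, num_stages=4):
    """Return the channel width of each stage for a given embedding dimension C."""
    return [C * (2 ** i) for i in range(num_stages)]

print(swin_stage_dims(48))  # [48, 96, 192, 384]
print(swin_stage_dims(96))  # [96, 192, 384, 768]
```

Halving C from 96 to 48 shrinks every stage (and roughly quarters the parameter count of the linear layers), which is why results at C = 48 are not comparable to the paper's Table 1 numbers.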

  • @pradyumagarwal3978 • 2 months ago

    Is the notebook where you test the model with C = 48 and 200 epochs available somewhere? I would really like to check it out.

    • @mashaan14 • 2 months ago • +1

      here you go:
      github.com/mashaan14/TH-cam-channel/blob/main/notebooks/2024_08_19_swin_transformer.ipynb

    • @pradyumagarwal3978 • months ago

      @mashaan14 thankssss

  • @0兒-y4c • months ago

    Hi sir,
    I'm a student studying this topic. I would like to use Swin Transformer for object detection in my project. How can I accomplish that?
    Thank you, sir.

    • @mashaan14 • months ago

      Usually, an image classification model is used at the beginning of an object detection pipeline; it's called the backbone. Most object detection pipelines use ResNet as the backbone.
      I assume you want to replace ResNet with Swin, just like they did in the paper (Section 4.2). If that's the case, your best option is the MMDetection library. They have already included Swin as a backbone on their GitHub:
      github.com/open-mmlab/mmdetection/blob/cfd5d3a985b0249de009b67d04f37263e11cdf3d/mmdet/models/backbones/swin.py
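To make the backbone swap concrete, here is a hedged sketch of what the override roughly looks like in an MMDetection-style Python config. The field names (`embed_dims`, `depths`, `num_heads`, `window_size`, `out_indices`) follow the mmdet Swin backbone linked above, but they can differ across library versions, so treat this as an assumption to verify, not a drop-in config:

```python
# Rough MMDetection-style config fragment (an assumption, not copied from the repo):
# swap the detector's ResNet backbone for Swin-T and match the FPN neck widths.
backbone = dict(
    type='SwinTransformer',
    embed_dims=96,               # C for Swin-T
    depths=[2, 2, 6, 2],         # number of blocks in each of the four stages
    num_heads=[3, 6, 12, 24],    # attention heads per stage
    window_size=7,               # 7x7 local attention windows
    out_indices=(0, 1, 2, 3),    # expose all four stages as feature maps
)
# The neck must accept the stage widths C, 2C, 4C, 8C produced by the backbone.
neck = dict(
    type='FPN',
    in_channels=[96, 192, 384, 768],
    out_channels=256,
    num_outs=5,
)
```

The key detail is the last line of each dict: whatever C you pick, the neck's `in_channels` must equal `[C, 2C, 4C, 8C]`, or the feature maps won't line up with the FPN.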