2024 updated single-cell guide - Part 1: RNA preprocessing and quality control

  • Published on 21 Jun 2024
  • This is a comprehensive tutorial on the most up-to-date recommendations for single-cell sequencing. This is part 1 of a multi-part series. Here I download a dataset, remove background RNA, perform quality control, and remove low-quality cells.
    Part 2 will cover dimension reduction and cell annotation. We will eventually get to in-depth analysis and scATAC analysis.
    Notebook:
    github.com/mousepixels/sanbom...
    Paper/dataset:
    www.cell.com/cancer-cell/full...
    Reference:
    www.sc-best-practices.org/pre...
    0:00 Intro
    0:27 Setup
    12:08 Cellbender
    18:20 QC
    28:05 Preprocessing
    39:42 Conclusions
  • Science & Technology
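
A minimal sketch of the "remove low-quality cells" step described above, assuming the MAD-based outlier rule (median absolute deviation) that the sc-best-practices reference describes for QC metrics. Whether the video uses exactly this rule or this threshold is an assumption, and the counts below are made up for illustration; in practice the same idea is applied to metrics such as log1p_total_counts in an AnnData object.

```python
import math
import statistics


def mad_outliers(values, n_mads=5.0):
    """Flag values lying more than n_mads median absolute deviations
    from the median. Returns one boolean per input value."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    # Note: if mad is 0, every value that differs from the median is flagged.
    return [abs(v - med) > n_mads * mad for v in values]


# Hypothetical per-cell total counts: seven ordinary cells, one very large
# barcode (possible multiplet) and one near-empty droplet.
total_counts = [4200, 3900, 4500, 4100, 3800, 4300, 4000, 60000, 35]
log1p_counts = [math.log1p(c) for c in total_counts]

flags = mad_outliers(log1p_counts, n_mads=5.0)
kept = [c for c, is_bad in zip(total_counts, flags) if not is_bad]
print(kept)
```

Working on the log1p scale keeps the rule from being dominated by a few very large barcodes; the threshold of 5 MADs is a permissive choice and may need tuning per dataset.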

Comments • 68

  • @007ZK (2 months ago, +7)

    Amazing series idea. I hope they keep coming.

    • @sanbomics (2 months ago, +3)

      Hopefully next week!

  • @lly6115 (2 months ago, +6)

    Good to see you back😊 and thank you for your update

    • @sanbomics (2 months ago)

      Yeah sorry I have been busy! Shouldn't be as long between the next few videos.

  • @0996Winglet-mq4on (2 months ago, +5)

    Really appreciate your videos🎉❤ Cannot wait to see a spatial omics tutorial in the future😊

    • @sanbomics (2 months ago)

      Right now I am eagerly awaiting some interesting datasets from newer, higher-resolution technology than Visium.

  • @ykoy1577 (2 months ago, +3)

    I was waiting for your video. Your videos are so helpful for a beginner like me. Thank you so much for sharing your knowledge and experience.

  • @MrJordi94 (2 months ago, +1)

    You truly are an inspiration for RNA-seq! Love your videos and your communication skills. Hope to see the rest of the 2024 tutorial soon :D

    • @sanbomics (a month ago)

      Thank you

  • @caspase888 (2 months ago, +1)

    I look forward to your videos. Your grasp on the subject and the ability to teach are amazing. Thanks a lot 👍🏻

    • @sanbomics (2 months ago)

      Thank you! :)

  • @supakornpongpakdee1544 (2 months ago, +1)

    Thank you very much for creating this tutorial! Looking forward to the next lessons!😊❤

  • @piroDYMSUS (2 months ago, +3)

    Amazing work, hope we will see the second part soon.

    • @sanbomics (2 months ago, +1)

      Trying to release in the next week or two!

  • @yaseminsucu416 (2 months ago)

    You rock! Thank you for doing this, looking forward to following this series!

  • @babyfriedrice4878 (2 months ago, +5)

    i love sanbomics so much!!!!!!!!!!!!!!!!!!!

    • @sanbomics (2 months ago, +1)

      I love you too!

  • @DuqueVJ (2 months ago)

    Amazing! Thanks very much for the tutorial, I'm learning a lot!

  • @jonathanback5731 (2 months ago)

    Your work is fantastic, great content!

  • @dardas15 (a month ago)

    This is fantastic and really helps people with a limited bioinformatics background to analyze data independently. Thanks so much for making these videos; I've been using them with Python ever since you shared them a few years ago!

  • @jackmineeechen4380 (2 months ago)

    I started with the video comparing different integration methods. That one really helped me! I eventually chose scanorama for my dataset, which worked out. Looking forward to this series! I appreciate your videos!

  • @avp300 (2 months ago)

    This is brilliant! Can't wait for part two!! The ridge plots look awesome! Thank you Mark! :-)

    • @sanbomics (a month ago, +1)

      Tomorrow hopefully!

  • @laloulymounia9266 (2 months ago)

    Thx for the update !

  • @alexeyryzhenkov7579 (a month ago)

    Thank you for your work!

    • @sanbomics (a month ago)

      Thank you so much!!! Really appreciate it! :)

  • @moonmoun2983 (a month ago)

    Waiting impatiently for the next part

    • @sanbomics (a month ago)

      Wait no further! :)

  • @jianhuacao7180 (2 months ago)

    welcome back, bro. Your channel is better than before.

    • @sanbomics (a month ago)

      Thanks! I am trying to continually improve the quality and make videos people are actually interested in.

  • @brunovinagre427 (2 months ago)

    Grateful, Mark!!

  • @user-ne7vm7fb3y (2 months ago)

    You were great.

  • @taoufikbensellak9274 (2 months ago)

    I just started your sc guide and I really enjoy it. Just for some clarifications about the tools, I use mamba (conda) with python 3.8 and a lower version of pandas (

    • @sanbomics (22 days ago)

      I'll be doing DE using a different approach this time which should give people fewer issues. Diffxpy can be a struggle so I don't really use it anymore

  • @MinnnWang-uv8bn (2 months ago)

    🎉🎉🎉thanks!

  • @gerolduntergasser4000 (2 months ago)

    cool
    good job😁

  • @frutitadelosmares (a day ago)

    Hi! Thanks so much for such a great tutorial!
    I have a naïve question from someone who just started in this world: when raw data is not available, for example when you can only download normalised, filtered values, do you skip the pre-processing step? Is it correct to pre-process normalised values, let's say TMM?
    Again, thanks so much for all the videos!

  • @islemgammoudi842 (2 months ago)

    Thanks for the Videos. Currently, I'm embarking on the journey of analyzing single-cell RNA sequencing (scRNA-seq) data combined with CITE-seq data. However, I'm facing challenges related to duplicate discrimination and assigning sub-samples via hashtags.
    Given your expertise in this area, I was hoping you could provide some guidance and advice on how to navigate these challenges effectively.

  • @kristifourie8427 (2 months ago)

    best page ever

    • @sanbomics (a month ago)

      Thank you :)

  • @moonmoun2983 (a month ago)

    I would like to thank you immensely because yours is one of the few bioinfo channels I can follow along with. I have a question regarding a result I obtained from following the previous full scRNA-seq walkthrough you posted a year ago. I tried applying the code to a before-and-after chemotherapy treatment dataset. Everything worked perfectly until I got to the DEG analysis part with heat maps. With the 25 top upregulated and downregulated genes and the filtering code, it didn't yield more than 12 DEGs, so I had to relax the filtering and kept genes with a significant fold change above 0.05. I ended up with more differentially expressed genes; however, in both cases my heat map was devoid of pattern, and both the condition and the control looked mostly downregulated. Should I conclude that there are no DEGs or expression signatures in the cancer samples before and after treatment? I ask because the original paper I took my data from didn't do a DEG analysis for the whole dataset but selected 4 patients out of 12 to create a DEG heatmap with fewer than 10 genes. Thank you, I'd highly appreciate your insight on my results.

    • @sanbomics (a month ago)

      It's really hard to say without knowing more and actually getting a feel for the data. You can try a pseudobulk approach and see if you have any DEGs. I have a video on that, but will also be covering it soon in the new tutorial series.
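
For readers unfamiliar with the pseudobulk idea mentioned in this reply, here is a minimal sketch of the aggregation step only: raw counts from all cells of the same sample are summed, so that samples rather than individual cells act as replicates in the downstream test. The function name, gene names, barcodes, and sample labels are hypothetical, and this is not necessarily how the author's own pseudobulk video implements it.

```python
from collections import defaultdict


def pseudobulk(counts, sample_of_cell):
    """Sum raw counts over all cells that belong to the same sample.

    counts:         dict of cell barcode -> dict of gene -> raw count
    sample_of_cell: dict of cell barcode -> sample label
    Returns a dict of sample label -> dict of gene -> summed count.
    """
    bulk = defaultdict(lambda: defaultdict(int))
    for cell, gene_counts in counts.items():
        sample = sample_of_cell[cell]
        for gene, n in gene_counts.items():
            bulk[sample][gene] += n
    return {sample: dict(genes) for sample, genes in bulk.items()}


# Hypothetical toy data: three cells from one patient, before and after treatment.
counts = {
    "cell1": {"GENE_A": 3, "GENE_B": 0},
    "cell2": {"GENE_A": 1, "GENE_B": 2},
    "cell3": {"GENE_A": 0, "GENE_B": 5},
}
sample_of_cell = {
    "cell1": "patient1_pre",
    "cell2": "patient1_pre",
    "cell3": "patient1_post",
}

bulk = pseudobulk(counts, sample_of_cell)
print(bulk)
```

The summed per-sample table is then typically handed to a bulk differential expression tool; that step is outside this sketch.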

  • @fsh9134 (20 days ago)

    Thanks for making very useful videos. I was wondering if you would like to make a video on single-cell analysis using Julius AI, a data analysis AI.

  • @mehdiraouine2979 (a month ago)

    Amazing work as always! On a side note, if I were to download FASTQ data from GEO and the paper doesn't specify whether the adapters were removed, how should I check in Python whether they were?

    • @sanbomics (a month ago)

      I wouldn't use Python for it, only because there are several command-line tools, like cutadapt, that are much faster and can do the same thing.
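
For the original question, a quick look is still possible in plain Python, even though a dedicated tool is the better choice for actual trimming. The sketch below counts how many of the first reads contain an adapter prefix. It assumes the common Illumina adapter prefix AGATCGGAAGAGC, which may not be the adapter used for a given dataset, and the function name and toy reads are hypothetical.

```python
import io

# Assumption: the dataset used the common Illumina adapter, whose first
# 13 bases are given here. Other library preps use different adapters.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def adapter_fraction(handle, adapter=ADAPTER_PREFIX, max_reads=100_000):
    """Return the fraction of the first max_reads reads containing adapter."""
    seen = 0
    hits = 0
    for i, line in enumerate(handle):
        if i % 4 != 1:  # the sequence is the 2nd line of each 4-line record
            continue
        seen += 1
        if adapter in line:
            hits += 1
        if seen >= max_reads:
            break
    return hits / seen if seen else 0.0


# Toy FASTQ with two reads; only the first one carries the adapter.
records = [
    ("read1", "ACGTACGTAGATCGGAAGAGCACAC"),
    ("read2", "ACGTACGTACGTACGTACGTACGTA"),
]
fastq_text = "".join(
    f"@{name}\n{seq}\n+\n{'I' * len(seq)}\n" for name, seq in records
)

fraction = adapter_fraction(io.StringIO(fastq_text))
print(fraction)
```

For a real gzipped file, the handle could come from gzip.open(path, "rt"). A fraction near zero suggests the reads were already trimmed (or that a different adapter was used).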

  • @caspase888 (10 days ago)

    Your videos are amazing. Thanks a lot.
    Could I use 3050 with 64 GB RAM for this kind of analysis?
    Thanks a lot.

  • @555gong9 (2 months ago)

    Thank you for such a great video. Which is better for removing doublets, doubletdetection or the previous SCVI method?

    • @sanbomics (2 months ago)

      I haven't done or seen a comparison between the two. The best would probably be to run both and see how they overlap. All I can say is that doubletdetection is easier and faster.

    • @555gong9 (2 months ago)

      Thank you for your advice, I will try it next, thank you very much, my superhero.

  • @goddyhong (2 months ago)

    Thx for sharing! If I use a filtered matrix for analysis, do I still need to remove the background RNA? I ask since I don't have a 4090🤣

    • @sanbomics (a month ago, +1)

      If you have a filtered matrix you can't remove background RNA. But if it's just a time thing, you can use your CPUs with SoupX. I have another video on that. If you only have filtered counts, you are stuck with what you have!

  • @mehdiraouine2979 (a month ago)

    Another question: if you were to choose between the scVI model for detecting doublets and this clf doubletdetection method, which one is more straightforward? I feel like this method needs some tinkering depending on the specific dataset.

    • @sanbomics (a month ago)

      The best method would be to use multiple methods. They will all give you slightly different results but hopefully have significant overlap. The reason I used doubletdetection here is because it is fast/simple and I already have multiple video tutorials on SOLO (scVI). It's hard to say which is more accurate. Changing parameters in scvi/SOLO will likely change the results a lot too just like what happened here.
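
The "run several methods and compare" advice can be sketched as a simple set comparison. This assumes each method has already produced a list of barcodes it calls doublets; the function name and the barcodes below are hypothetical placeholders, not output from doubletdetection or SOLO.

```python
def compare_calls(calls_a, calls_b):
    """Summarise agreement between two sets of doublet barcodes."""
    a, b = set(calls_a), set(calls_b)
    both = a & b
    union = a | b
    return {
        "both": sorted(both),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # Jaccard index: shared calls divided by all calls from either method.
        "jaccard": len(both) / len(union) if union else 1.0,
    }


# Hypothetical doublet calls from two methods on the same sample.
method_a_calls = ["AAAC", "AAAG", "AACT", "AAGG"]
method_b_calls = ["AAAG", "AACT", "ACCA"]

summary = compare_calls(method_a_calls, method_b_calls)
print(summary)
```

Barcodes flagged by both methods are the safest to remove; how to treat those flagged by only one is a judgment call that depends on the dataset.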

  • @CaveCrack (2 months ago)

    Thanks for the great video and series. I have a question at around 36:40 on how to interpret the graph. If the experiment had loaded say 14000 cells it appears that around 8000 would be recovered which I assume we would interpret as the number called by cellranger... For 14000 cells loaded the multiplet rate appears to be 6%, 6% of 14000 being 840 expected multiplets. However, all the blue recovery dots are aligned around 4.5%. 4.5% of 8000 would be only 360 expected multiplets. The document from which the graph is extracted says "Generally an increased number of cells per sample will increase the doublet rate". I've not been able to find clarification. Thank you

    • @CaveCrack (2 months ago)

      Also, I am wondering if your low number of detected doublets at 1e-16 was due to the previous QC step where you exclude cells with the highest log1p_total_counts and log1p_n_genes_by_counts, as these could filter a lot of doublets.

    • @sanbomics (a month ago)

      I think in this case just ignore the blue line. The more cells you load the higher multiplet rate and more total multiplets you will have

    • @sanbomics (a month ago)

      Exactly, it's hard to say exactly what percent the multiplets are because of the first step. I think I mention it in the video briefly... or at least I thought about it.
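
The arithmetic in the question above can be reproduced in a few lines. The sketch only restates the commenter's two readings of the graph (rates and cell numbers are taken from the comment, not from any vendor table) and does not settle which denominator is the right one.

```python
def expected_multiplets(n_cells, multiplet_rate):
    """Expected number of multiplets for a cell count and a rate (0 to 1)."""
    return round(n_cells * multiplet_rate)


# Reading 1: apply the 6% rate to the 14,000 cells loaded.
loaded_estimate = expected_multiplets(14_000, 0.06)

# Reading 2: apply the 4.5% rate to the 8,000 cells recovered.
recovered_estimate = expected_multiplets(8_000, 0.045)

print(loaded_estimate, recovered_estimate)  # 840 360
```

The two readings differ by 480 expected multiplets, which is why the choice of rate and denominator matters when setting an expected doublet fraction, especially after an earlier QC step has already removed some of the largest barcodes.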

  • @abellopez8017 (2 months ago, +1)

    Hello! Thanks for the Video, I will begin my PhD in Bioinformatics in August, what computer do you have?

    • @sanbomics (2 months ago)

      Well... at home I have 32 vCPUs, 128 GB RAM, and an RTX 4090. At work I have 64 CPUs, 256 GB RAM, and an RTX 4090. Sometimes I have to use AWS when I need more than that. Depending on what you plan to do, it can vary a lot.

  • @AP-vo7gp (2 months ago)

    Sir, I have a count matrix and want to generate an annotation matrix from it, then do batch correction and then DGA. Please help me with the process, as I am not getting suitable results.

    • @sanbomics (22 days ago, +1)

      Hi it is hard for me to help without knowing more specifics and what the issue you are having is

    • @AP-vo7gp (21 days ago)

      @sanbomics thanks a lot sir, I was able to do it :)

  • @pinchos90 (a month ago)

    Are you still going to develop workflows for R, or are you sticking with Python?

    • @sanbomics (a month ago)

      I prefer python, but even this tutorial series will have some R in it because it is unavoidable. So I will have more R videos in the future

  • @ghujka (a month ago)

    Have a beer on me bro🍺

    • @sanbomics (a month ago)

      Thank you!!! I can do that ;)

  • @charlieintampa6769 (2 months ago)

    F%(k. Seems super useful but you could have been speaking any random language and I would have understood about the same.