2024 updated single-cell guide - Part 1: RNA preprocessing and quality control
- Published on 21 Jun 2024
- This is a comprehensive tutorial on the most up-to-date recommendations for single-cell sequencing. This is Part 1 of a multi-part series. Here I download a dataset, remove background RNA, perform quality control, and remove low-quality cells.
Part 2 will cover dimension reduction and cell annotation. We will eventually get to in-depth analysis and scATAC analysis.
Notebook:
github.com/mousepixels/sanbom...
Paper/dataset:
www.cell.com/cancer-cell/full...
Reference:
www.sc-best-practices.org/pre...
0:00 Intro
0:27 Setup
12:08 CellBender
18:20 QC
28:05 Preprocessing
39:42 Conclusions
- Science & Technology
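The QC step described above follows the linked sc-best-practices reference, which flags a cell as low quality when a QC metric lies more than a chosen number of median absolute deviations (MADs) from the median. Here is a minimal NumPy sketch of that rule. It assumes per-cell metrics such as log1p_total_counts have already been computed. The toy counts and the threshold of 5 MADs are illustrative assumptions, not values taken from the notebook.

```python
import numpy as np

def is_outlier(values, nmads=5.0):
    """Flag entries lying more than `nmads` MADs away from the median."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    # Unscaled median absolute deviation of the metric
    mad = np.median(np.abs(values - median))
    return (values < median - nmads * mad) | (values > median + nmads * mad)

# Hypothetical per-cell total UMI counts; real values would come from adata.obs
total_counts = np.array([2100, 2400, 1900, 2600, 2300, 150, 40000, 2200])
log1p_total = np.log1p(total_counts)

outliers = is_outlier(log1p_total, nmads=5)  # near-empty and extreme cells are flagged
keep = ~outliers                             # boolean mask of cells to retain
print(outliers)
```

The linked reference combines flags like this across several metrics (for example log1p_n_genes_by_counts and the percentage of counts in the top 20 genes) with a logical OR before filtering. Check the notebook for the exact metrics and thresholds used in the video.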
Amazing series idea. I hope they keep coming.
Hopefully next week!
Good to see you back😊 and thank you for your update
Yeah sorry I have been busy! Shouldn't be as long between the next few videos.
Really appreciate your videos 🎉❤ Cannot wait to see a spatial omics tutorial in the future 😊
Right now I am eagerly waiting for some interesting datasets from newer, higher-resolution technology than Visium.
I was waiting for your video. Your videos are so helpful for beginners like me. Thank you so much for sharing your knowledge and experience.
You truly are an inspiration for RNA-seq! Love your videos and your communication skills. Hope to see the rest of the 2024 tutorial soon :D
Thank you
I look forward to your videos. Your grasp on the subject and the ability to teach are amazing. Thanks a lot 👍🏻
Thank you! :)
Thank you very much for creating this tutorial! Looking forward to the next lessons!😊❤
Amazing work, hope we will see second part soon
Trying to release in the next week or two!
You rock! Thank you for doing this, looking forward to following this series!
i love sanbomics so much!!!!!!!!!!!!!!!!!!!
I love you too!
Amazing! Thanks very much for the tutorial, I'm learning a lot!
Your work is fantastic, great content!
This is fantastic and really helps people with a limited bioinformatics background to analyze data independently. Thanks so much for making these videos; I've been using them with Python ever since you shared them a few years ago!
I started with the video comparing different integration methods. That one really helped me! I eventually chose Scanorama for my dataset, which worked out. Looking forward to this series! I appreciate your videos!
This is brilliant! Can't wait for Part 2!! The ridge plot looks awesome! Thank you, Mark! :-)
Tomorrow hopefully!
Thx for the update!
Thank you for your work!
Thank you so much!!! Really appreciate it! :)
Waiting impatiently for the next part
Wait no further! :)
Welcome back, bro. Your channel is better than before.
Thanks! I am trying to continually improve the quality and make videos people are actually interested in.
Grateful, Mark!!
You were great.
I just started your sc guide and I really enjoy it. Just for some clarifications about the tools, I use mamba (conda) with python 3.8 and a lower version of pandas (
I'll be doing DE using a different approach this time which should give people fewer issues. Diffxpy can be a struggle so I don't really use it anymore
🎉🎉🎉thanks!
cool
good job😁
Hi! Thanks so much for such a great tutorial!
I have a naïve question from someone who just started in this field: when raw data is not available (for example, you can only download normalized, filtered values), do you skip the preprocessing step? Is it correct to preprocess normalized values, say TMM-normalized ones?
Again, thanks so much for all the videos!
Thanks for the Videos. Currently, I'm embarking on the journey of analyzing single-cell RNA sequencing (scRNA-seq) data combined with CITE-seq data. However, I'm facing challenges related to duplicate discrimination and assigning sub-samples via hashtags.
Given your expertise in this area, I was hoping you could provide some guidance and advice on how to navigate these challenges effectively.
best page ever
Thank you :)
I would like to thank you immensely, because yours is one of the few bioinformatics channels I can follow along with. I have a question about a result I obtained from following the previous full scRNA-seq walkthrough you posted a year ago. I applied the code to samples from before and after chemotherapy treatment. Everything worked perfectly until I got to the DEG analysis with heatmaps: with the top 25 upregulated and downregulated genes and the filtering code, it didn't yield more than 12 DEGs, so I relaxed the filtering and kept genes with a significant fold change above 0.05. I ended up with more differentially expressed genes; however, in both cases my heatmap showed no pattern, and both the condition and the control looked mostly downregulated. Should I conclude that there are no DEGs or expression signatures between the cancer samples before and after treatment? The original paper I took my data from didn't do a DEG analysis on the whole dataset, but selected 4 of the 12 patients to create a DEG heatmap with fewer than 10 genes. Thank you, I'd highly appreciate your insight on my results.
It's really hard to say without knowing more and actually getting a feel for the data. You can try a pseudobulk approach and see if you have any DEGs. I have a video on that, but I will also be covering it soon in the new tutorial series.
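A pseudobulk approach sums the raw counts of all cells that belong to the same sample (usually within a single cell type). Each biological replicate then becomes one bulk-like profile, which can be tested with a bulk DE tool such as DESeq2 or its Python port PyDESeq2. Here is a minimal NumPy sketch of the aggregation step only. It assumes a cells × genes matrix of raw, unnormalized counts and one sample label per cell. The function name `pseudobulk` and the toy data are hypothetical.

```python
import numpy as np

def pseudobulk(counts, sample_labels):
    """Sum raw counts over all cells sharing a sample label.

    counts: cells x genes array of raw (unnormalized) counts.
    Returns the sorted unique labels and a samples x genes array.
    """
    labels = np.asarray(sample_labels)
    samples = sorted(set(labels.tolist()))
    # One row per sample: column-wise sum over that sample's cells
    summed = np.vstack([counts[labels == s].sum(axis=0) for s in samples])
    return samples, summed

# Hypothetical toy data: 4 cells x 3 genes from two samples
counts = np.array([[1, 0, 3],
                   [2, 1, 0],
                   [0, 5, 1],
                   [4, 0, 0]])
labels = ["pre", "pre", "post", "post"]
samples, summed = pseudobulk(counts, labels)
```

With real data, you would aggregate per patient and time point, so that each before/after sample contributes one profile, rather than per cell.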
Thanks for making very useful videos. I was wondering if you would make a video on single-cell analysis using Julius AI, a data-analysis AI tool.
Amazing work as always! On a side note, if I download FASTQ data from GEO and the paper doesn't specify whether the adapters were removed, how should I check whether they were removed in Python?
I wouldn't use Python for that, only because there are several command-line tools, like cutadapt, that can do the same thing much faster.
Your videos are amazing. Thanks a lot.
Could I use an RTX 3050 with 64 GB RAM for this kind of analysis?
Thanks a lot.
Thank you for such a great video. Which is better for removing doublets, doubletdetection or the previous SCVI method?
I haven't done or seen a comparison between the two. The best approach would probably be to run both and see how they overlap. All I can say is that doubletdetection is easier and faster.
Thank you for your advice, I will try it next, thank you very much, my superhero.
Thanks for sharing! If I use a filtered matrix for analysis, do I still need to remove the background RNA? Since I don't have a 4090 🤣
If you have a filtered matrix, you can't remove background RNA. But if it's just a time issue, you can use your CPUs with SoupX; I have another video on that. If you only have filtered counts, you are stuck with what you have!
Another question: if you were to choose between the scVI model for detecting doublets and this clf doubletdetection method, which one is more straightforward? I feel like this method needs some tinkering depending on the specific dataset.
The best method would be to use multiple methods. They will all give you slightly different results but hopefully have significant overlap. The reason I used doubletdetection here is because it is fast/simple and I already have multiple video tutorials on SOLO (scVI). It's hard to say which is more accurate. Changing parameters in scvi/SOLO will likely change the results a lot too just like what happened here.
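Following the advice to run more than one doublet caller and check how the calls overlap, here is a minimal sketch that compares two boolean per-cell doublet calls. For example, one could come from doubletdetection and the other from SOLO in scVI. The function name `doublet_overlap` and the toy arrays are hypothetical. In practice the calls would be columns in `adata.obs`.

```python
import numpy as np

def doublet_overlap(calls_a, calls_b):
    """Summarize agreement between two boolean per-cell doublet calls."""
    a = np.asarray(calls_a, dtype=bool)
    b = np.asarray(calls_b, dtype=bool)
    both = int(np.sum(a & b))      # flagged by both methods
    either = int(np.sum(a | b))    # flagged by at least one method
    return {
        "only_a": int(np.sum(a & ~b)),
        "only_b": int(np.sum(~a & b)),
        "both": both,
        # Jaccard index: shared calls divided by all calls
        "jaccard": both / either if either else 0.0,
    }

# Hypothetical calls for six cells from two methods
result = doublet_overlap([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```

Cells flagged by both methods are the most confident doublets. Whether to remove only that intersection or the union of both call sets is a judgment call that depends on the dataset.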
Thanks for the great video and series. I have a question at around 36:40 on how to interpret the graph. If the experiment had loaded say 14000 cells it appears that around 8000 would be recovered which I assume we would interpret as the number called by cellranger... For 14000 cells loaded the multiplet rate appears to be 6%, 6% of 14000 being 840 expected multiplets. However, all the blue recovery dots are aligned around 4.5%. 4.5% of 8000 would be only 360 expected multiplets. The document from which the graph is extracted says "Generally an increased number of cells per sample will increase the doublet rate". I've not been able to find clarification. Thank you
Also, I am wondering whether your low number of detected doublets at 1e-16 was due to the previous QC step, where you excluded cells with the highest log1p_total_counts and log1p_n_genes_by_counts, as that could filter out a lot of doublets.
I think in this case just ignore the blue line. The more cells you load the higher multiplet rate and more total multiplets you will have
Exactly, it's hard to say what percentage of cells are multiplets because of the first step. I think I mention it briefly in the video... or at least I thought I did.
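The arithmetic in the multiplet question above is simply expected multiplets = multiplet rate × number of cells. The small sketch below reproduces the commenter's two figures: 6% of 14,000 cells and 4.5% of 8,000 cells. The rates are the commenter's readings of the chart, not official values. The result also depends on which cell count (loaded or recovered) the rate is applied to. The function name is hypothetical.

```python
def expected_multiplets(multiplet_rate, n_cells):
    """Expected number of multiplets, rounded to the nearest whole cell."""
    return round(multiplet_rate * n_cells)

high = expected_multiplets(0.06, 14000)  # rate applied to 14,000 cells
low = expected_multiplets(0.045, 8000)   # rate applied to 8,000 cells
print(high, low)
```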
Hello! Thanks for the Video, I will begin my PhD in Bioinformatics in August, what computer do you have?
Well... at home I have a 32-vCPU machine with 128 GB RAM and an RTX 4090. At work I have a 64-CPU machine with 256 GB RAM and an RTX 4090. Sometimes I have to use AWS when I need more than that. Depending on what you plan to do, it can vary a lot.
Sir, I have a count matrix and want to generate an annotation matrix from it, then do batch correction and then DGA. Please help with the process, as I am not getting suitable results.
Hi, it is hard for me to help without knowing more specifics about the issue you are having.
@sanbomics Thanks a lot, sir, I was able to do it :)
Are you still going to develop workflows for R, or are you sticking with Python?
I prefer Python, but even this tutorial series will have some R in it because it is unavoidable. So I will have more R videos in the future.
Have a beer on me bro🍺
Thank you!!! I can do that ;)
F%(k. Seems super useful but you could have been speaking any random language and I would have understood about the same.