Really great tutorial!
Thank you very much for this workflow. It's really helpful to understand the process and steps involved in the doubletfinder. I appreciate your efforts to educate the researcher through this activity.
You're awesome keep up the amazing work!
Thank you :)
I do really enjoy your channel 🤠 I am doing same analysis and it is very kind of you that you share your approach and code! Many thanks 👍
I am glad to hear my videos have been helpful! Thank you for your kind words :)
This was very useful. It was different from our analyst strategy. Small request, instead of terminal bash, it would be helpful if you can route through save folders and files [setwd> ]. Thanks!
Thank you for the suggestion, I am more comfortable in maneuvering through the folders via terminal. However, I shall try to do it via R next time :)
A really great video!!! Thank you very much !!!
Thank you for this video !!
I think it's important that you explain why we assume a 7.5% doublet rate in our data. I know it has something to do with the number of droplets captured. But how do we determine the number of droplets captured (in order to infer the estimated % of real doublets)? Thank you!
Thank you very much. Can you please do a tutorial on how to use DropletUtils library
you are amazing
thank you so much
Hello Khusbu,
when I run "> sweep.res.list
Hello Khusbu, I'm working with a publicly available dataset, GSE193688, where individual .h5 files are provided for every sample. I'm trying to run DoubletFinder on it, but since you mentioned that it's preferable not to run it on merged samples, should I run it for each one separately? I have a total of 18 files for individual biopsy samples. Is there a faster method?
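For many samples, the per-sample runs can be scripted rather than done by hand. A minimal sketch, assuming the .h5 files sit in a local `data/` folder and that `run_doubletfinder()` stands in for your own QC + DoubletFinder steps (both the folder name and that helper are hypothetical placeholders):

```r
library(Seurat)

# List all 18 .h5 files (path is an assumption; adjust to your setup)
h5_files <- list.files("data", pattern = "\\.h5$", full.names = TRUE)

# Read each file into its own Seurat object and process it individually
seurat_list <- lapply(h5_files, function(f) {
  counts <- Read10X_h5(f)
  obj <- CreateSeuratObject(counts = counts, project = basename(f))
  # run QC filtering, normalization and DoubletFinder on 'obj' here,
  # e.g. via your own wrapper such as run_doubletfinder(obj)
  obj
})
```

Each element of `seurat_list` can then be subset to singlets before merging or integrating.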
THANK YOU!!! This was a life saver. Quick question: I plan to use Tabula Muris Senis, the mega mouse single-cell dataset, and I was able to maneuver through selecting the ages/organs I wanted to use. BUT I believe they have datasets per mouse and per organ... if that's the case, do I still have to run DoubletFinder on each mouse, or do you think I can use the selected age/organ, with the assumption that the preparation process was similar enough that batch effects would likely be minimal..... I have 15 mice on Tabula Muris I plan to use and an additional 15 mice I have to filter 🥲
I suggest you first process your data with all 15 mice at once as a merged object and visualize it. Look for batch effects. If you don't find any, then run DoubletFinder on the merged object. If you do find batch effects in your data, then you will have to run DoubletFinder for each individual mouse.
So in the end, if I would like to filter out those doublets and continue with the rest of my analysis, what should I do to filter them out?
Thank you for your tutorial. Could you please tell me whether the paper says how to mark doublets in the raw data?
Please, we need an application of NMF (non-negative matrix factorization) in scRNA-seq for finding expression programs.
I'll consider making a video on this soon :) Thanks for the suggestion.
22:25 I want to remove the rows classified as doublets in the DF.classification column of the metadata table. What command can I use to do that? I want to remove the doublets and then integrate all the samples.
Hi, thank you for the very detailed tutorial!! May I know how I can get the cell identity from demuxlet data after I get all the singlets? Thank you.
Amazing job! Can you paste your code for how you subset and recluster singlets after finishing DoubletFinder? Or can you confirm whether you did exactly the same as the following steps? Thanks!
singlet
Yes, I would run the steps you ran to recluster my cells after removing doublets from my data.
Thank you for the suggestions for video topics, I have them in my pipeline :)
Thank you so much for this tutorial it's very informative. I was wondering if you knew how to find the expected number of doublets for icell8 sequencing data? Thank you in advance
Thank you for this video, but the question is whether the detection and removal of doublets should be carried out before data merging and QC. In your previous video on data integration, you merged 7 samples. Does that mean we need to clean the data 7 times before merging? Hope for your reply.
What I mean is: when we need to integrate several datasets, before which step should we perform doublet detection? Before merging the datasets? If doublet detection should be done before the merge() function, is it necessary to perform QC and the standard pre-processing workflow for each dataset separately?
Yes, it is recommended to perform doublet removal and QC for each dataset individually before integrating datasets. It can, however, be run on merged data. The standard workflow steps just help identify and remove clusters of cells with low UMI counts or high mitochondrial %. These low-quality cells must be filtered out before running a doublet prediction algorithm and before integrating and moving ahead with further downstream analysis.
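The per-sample order described above can be sketched as follows; the QC thresholds (nFeature_RNA > 200, percent.mt < 10) are assumptions and should be tuned to each dataset:

```r
library(Seurat)

qc_before_doublets <- function(obj) {
  # Basic QC: flag mitochondrial content, then filter low-quality cells
  obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^MT-")
  obj <- subset(obj, subset = nFeature_RNA > 200 & percent.mt < 10)

  # Standard pre-processing steps required before DoubletFinder
  obj <- NormalizeData(obj)
  obj <- FindVariableFeatures(obj)
  obj <- ScaleData(obj)
  obj <- RunPCA(obj)
  obj <- RunUMAP(obj, dims = 1:20)

  # ... run DoubletFinder here, then subset to singlets before merging
  obj
}
```

Running this on each sample, then keeping only the singlets, gives clean inputs for merge() or integration.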
Can I still run DoubletFinder on 'SCTransform normalised' sample?
If yes, is it as simple as setting 'sct = TRUE' in 'sweep.res.list_pbmc
DoubletFinder can be used on a Seurat object that has been processed with SCTransform during the pre-processing steps. And yes, it is as simple as setting sct = TRUE.
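A minimal sketch of where sct = TRUE goes, assuming a DoubletFinder version that exports paramSweep_v3/doubletFinder_v3 (function names differ slightly between package versions) and that `sobj` is an SCTransform-normalized, PCA/UMAP-processed object:

```r
library(Seurat)
library(DoubletFinder)

# Parameter sweep on the SCTransform-normalized object
sweep.res   <- paramSweep_v3(sobj, PCs = 1:20, sct = TRUE)
sweep.stats <- summarizeSweep(sweep.res, GT = FALSE)
bcmvn       <- find.pK(sweep.stats)

# nExp_poi is a placeholder for the expected number of doublets,
# computed from your multiplet rate as shown in the video
sobj <- doubletFinder_v3(sobj, PCs = 1:20, pN = 0.25, pK = 0.09,
                         nExp = nExp_poi, sct = TRUE)
```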
Hello, Thanks for the great tutorial! I have one question, maybe I missed it, but - why do you use the nsclc data when calculating the pK value (starting from line 47) rather than pbmc that you used in the steps before that? Thank you!
@anaarsenijevic3207, she used the pbmc seurat object only in line 47. Only the name of the list she created has the nsclc name, you can name it anything you want.
Thank you so much for this helpful video. I have a question. At the last step, after we detect and remove the doublets, how can we go back to the first step to do integration? Not sure how to transfer the needed assay to the data.
You should use the "integrated" assay (if you used the CCA method to integrate), and move forward with the steps just as you would process data in the 'RNA' slot of a Seurat object.
@@Bioinformagician When I run DoubletFinder, the integration still needs to be done. I mean, after subsetting out doublets from every individual sample, what approach should I take? Should I move forward with the subsetted samples and integrate them? Thanks.
Thank you for the tutorial. I ran the pK identification code and got pK = 0.2. The number of doublets is the same, but the shape of the graph is different. I wonder if I can move on to the next step or if I need to fix this issue. Thank you!
Did you use the strategies for pK optimization? Did you find your optimal pK to be 0.2?
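For reference, a common way to pick the pK is to take the value that maximizes the BCmetric returned by find.pK(); a sketch, assuming `bcmvn` is the data frame that find.pK() returned:

```r
library(dplyr)

# pK is stored as a factor in the bcmvn table, so convert via character first
optimal.pk <- bcmvn %>%
  filter(BCmetric == max(BCmetric)) %>%
  pull(pK) %>%
  as.character() %>%
  as.numeric()
```

Note that calling as.numeric() on the pK factor directly returns the factor level index (e.g. 20) rather than the actual pK value, which is why the as.character() step matters.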
Thanks thanks thanks a lot
Waiting for your metagenomics and metatranscriptomics one.
I will surely consider making a video on this in the near future :)
Could you please explain how you decide that a given value is the commonly expected one? -> "Assuming a 7.5% doublet formation rate"
10X user guides provide expected multiplet rate for different protocols.
Here I have used the table on page 18 from the Chromium Next GEM Single Cell 3ʹ Reagent Kits v3.1 user guide (support.10xgenomics.com/single-cell-gene-expression/library-prep/doc/user-guide-chromium-single-cell-3-reagent-kits-user-guide-v31-chemistry) to get the doublet formation rate.
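As a rough rule of thumb read off such tables, the 10x multiplet rate scales approximately linearly at about 0.8% per 1,000 recovered cells; a sketch of the arithmetic (the 0.8%-per-1,000 slope is an approximation, not an exact table lookup, so always check your kit's own table):

```r
# Approximate expected doublets from the number of recovered cells
recovered_cells  <- 10000
multiplet_rate   <- 0.008 * (recovered_cells / 1000)   # ~0.8% per 1,000 cells
expected_doublets <- round(multiplet_rate * recovered_cells)
# With 10,000 recovered cells this gives a rate of ~8% and ~800 expected doublets
```

With ~1,100 recovered cells the same arithmetic gives a rate of roughly 0.9%, i.e. around 10 expected doublets.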
@@Bioinformagician But what if I had 10,000 cells as input and approx. 1,100 recovered cells? 🤔.. Thanks, really helpful channel 😍
@@youvikasingh7955 How did you solve that issue? Thanks!
How would you filter out the doublets?
I think you can use subset():
pbmc.seurat.filtered
That's right! You can use subset() to filter out doublets.
@@Bioinformagician How do you do this when DF.classifications_SOME_VALUE is always changing? I.e., how do you filter out the doublets in a dynamic way?
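One way to handle the changing column name is to look it up from the metadata with grep() instead of hard-coding the pN/pK/nExp suffix; a sketch, assuming the object is called `pbmc.seurat.filtered` as in the video:

```r
# Find the DF.classifications_* column regardless of its pN/pK/nExp suffix
df.col <- grep("^DF.classifications",
               colnames(pbmc.seurat.filtered@meta.data),
               value = TRUE)

# Keep only the cells labelled as singlets
singlets <- pbmc.seurat.filtered[
  , pbmc.seurat.filtered@meta.data[[df.col]] == "Singlet"
]
```

Bracket indexing is used here because subset() expects the column name to be written literally rather than held in a variable.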
I don't understand why, in a dataset of 15,000 real cells, a pN of 0.25 would represent the integration of 5,000 artificial doublets... If anyone can solve my question...
Thank you!!!
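For what it's worth, pN is defined as the proportion of artificial doublets relative to the merged (real + artificial) dataset, not relative to the real cells alone, which is where the 5,000 comes from:

```r
# pN = n_artificial / (n_real + n_artificial), solved for n_artificial:
n_real       <- 15000
pN           <- 0.25
n_artificial <- pN / (1 - pN) * n_real
# 0.25 / 0.75 * 15000 = 5000, and indeed 5000 / (15000 + 5000) = 0.25
```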
Thanks for this workflow and for sharing the code. I have one issue when I run your code at the second-to-last step.
> DimPlot(pbmc.seurat.filtered, reduction = 'umap', group.by = "DF.classifications_0.25_0.21_691")
Error in `[.data.frame`(data, , group) : undefined columns selected
In addition: Warning message:
The following requested variables were not found: DF.classifications_0.25_0.21_691
Could you please help to check it?
Thanks.
So we are only putting aside heterotypic doublets, not homotypic ones?
What do you do when running 'bcmvn_pbmc
I am unable to answer why you get NULL at find.pK step as I cannot recreate this error.
Did you sort this out? I also get the same 'NULL' when I run this, although my data is stored in the variable when I print it.
@@rahmaqadeer9178 No, didn’t manage to fix this
I am also getting this with 'bcmvn_nsclc % select(pK)'; my numeric value for the pK is 20.
@@rahmaqadeer9178
The problem is that ParamSweep cannot find your normalized RNA counts.
Here’s how to fix it:
Instead of using "NormalizeData(sobj, normalization.method = "LogNormalize", scale.factor = 10000)"
Do the following:
"sobj