Thanks for the clear showcase. 1. The scRNAseq package has many single-cell experiments, but the annotation is not always available, as with MairPBMCData(). In such cases, how can I make use of the dataset? 2. Let's say there is a publicly available single-cell dataset with annotation, and I want to use it as my reference. I can use either SingleR or FindTransferAnchors/TransferData from Seurat. Do you have an opinion on one over the other?
Annotations are required, unfortunately. I don't have a favorite between the two. I use both depending on the circumstance, and both depend on the quality of the reference and how well it matches the heterogeneity in your data.
@@sanbomics Thanks a lot for sharing your experience!
@Sanbomics Hi! Thanks for the video tutorial, it helped a lot. I watched it back in April when you uploaded it, and now I am implementing it in my scRNA-seq analysis. This is the only video available on SingleR so far. I was wondering if you had another video, or just code, showing how to use any other published dataset as a reference. Thanks in advance.
Hey! You should be able to use any dataset like I do in the later half of the video with lung_ref (as long as the dataset is annotated with the cell type).
Thank you for the great video. It is so helpful.
I am working with PBMC data and I tried three datasets from scRNAseq, but none of them has an annotation column. Any suggestions for a reference dataset?
Hmm, there should be some available by default that have the immune cells labeled. They may not have "pbmc" in the title specifically. But outside of SingleR there is a lot of labeled PBMC data. For example, the pbmc3k data that everyone uses in their tutorials.
life-saving
Thanks for the great explanation. Any suggestions for mouse or human immune cell reference?
You'll just have to pick the right reference dataset. There are a bunch of different options out there. But which reference you choose is by far the most important consideration. Wrong reference = wrong data
This is very cool. Thank you very much for sharing! :)
Glad you liked it!
Thank you very much for this video. Do you know any other scRNA-seq annotation data collections apart from "scRNAseq" and "TabulaMurisData"? Unfortunately my cancer of interest is not included there.
Thank you for explaining an effective annotation strategy. I am working on mouse single-cell data and was wondering if I could use both the Tabula and MCA reference datasets and compare the results with the same strategy (or would that be redundant?). There are many annotation tools available. Have you had a good experience with tools other than SingleR? Again, thank you very much for this video.
That is actually a good idea. Comparing the results from multiple reference datasets will allow you to catch possible mapping errors. Mapping can sometimes give you wrong results if the reference dataset does not correspond well to your dataset.
SingleR is the only one I have used in R. In Python I use scArches. But all reference mapping is only as good as the reference and how well it matches your dataset. So you always have to be careful.
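One way to do the comparison suggested above is a simple cross-tabulation of the per-cell labels from the two references. The snippet below is only a minimal sketch in plain Python: the function names, the 0.8 cutoff, and the example labels are made up for illustration and are not taken from the video or from SingleR.

```python
from collections import Counter, defaultdict


def cross_tabulate(labels_a, labels_b):
    """Count how often each label from reference A co-occurs with each label from reference B."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must describe the same cells in the same order")
    table = defaultdict(Counter)
    for a, b in zip(labels_a, labels_b):
        table[a][b] += 1
    return table


def ambiguous_labels(table, min_fraction=0.8):
    """Return labels from A whose cells are not dominated by a single label from B.

    min_fraction is an arbitrary illustrative cutoff, not a recommended value.
    """
    flagged = {}
    for a, counts in table.items():
        top_label, top_count = counts.most_common(1)[0]
        fraction = top_count / sum(counts.values())
        if fraction < min_fraction:
            flagged[a] = (top_label, round(fraction, 2))
    return flagged


# Hypothetical per-cell labels from two mouse references (made-up values).
tabula_labels = ["T cell", "T cell", "T cell", "B cell", "B cell",
                 "macrophage", "macrophage", "macrophage"]
mca_labels = ["T cell", "T cell", "T cell", "B cell", "B cell",
              "macrophage", "dendritic cell", "neutrophil"]

table = cross_tabulate(tabula_labels, mca_labels)
flagged = ambiguous_labels(table)
print(flagged)
```

Labels that end up in `flagged` are the ones where the two references disagree, so those are the cells worth checking by hand. Because the table is keyed on label pairs, it also works when the two references use different names for the same cell type.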
I really loved this one! Thank you so so much
Glad you liked it!
Thank you for this automated process. The first method worked well, but when I ran the second method so I could make a comparison, I got an error message. Any idea/hint on how to solve this?
results
Thank you so much for creating this video! Just a quick question but I was wondering what libraries you loaded in to, or if SingleR was the only one you needed to load in.
I've been following your first method and found that I was unable to use certain functions like loading in the reference dataset with celldex, unless I had loaded in celldex with library(celldex). Thanks in advance!
No problem! Hmm.. it's been a while so I'm not 100% sure without going back and checking. It's possible I had it loaded already in another notebook and made a mistake by not showing the import in the video/notebook on GitHub.
@@sanbomics I loaded in celldex, but that seemed to have been the only other library that I needed to load in! Thanks for the response and for the well made video!
What dataset did you use for the first example? The one that you prepped as Lung1/outs/filtered_feature_bc_matrix.
Hi, it's an unpublished dataset. Sorry! If you are interested I will share the GEO in the description when it is published.
@@sanbomics That would be great! Thank you, this video was very helpful
I wanted to ask whether there is a way to do automatic annotation like this in Python with scanpy. I am struggling with annotation: I have a set of highly expressed genes for each cluster, but I don't know what the next step actually is. I tried comparing with a mouse atlas dataset, but it is very general and my dataset is mesodermal lineage at a specific time point (embryo at 10 days). Do you have any suggestions or tips on how to do the annotation, or is there something I am missing? Thanks again for your amazing videos, they are really super helpful.
Yes there is a way and I actually already have a video for it: th-cam.com/video/tgk-rT_R4wk/w-d-xo.html
BUT, automatic labeling is only as good as the reference and only works if the cell types in your reference match the cell types in your dataset. I wouldn't recommend doing this unless the reference is also mesodermal lineage at a similar time point.
Labeling can be tricky and frustrating sometimes, especially if there aren't other datasets from the same context. Check out PanglaoDB, it might help for some cell types too, but it might not be great for development.
@@sanbomics Thanks a lot. For manual annotation, do I need to find and extract canonical markers per cluster, or is there a method or a database where I can give it my list of differentially expressed genes for a specific cluster and it can check what cell type it might be?
Canonical markers can be nice for verification, but for some datasets/cell types you won't have them. If you do find a database or lists of genes potentially upregulated in that cell type, you can do something like scanpy.tl.score_genes. It's better than looking at just one gene, which may or may not be present. You can also check your list of DE genes for overrepresentation of marker lists.
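The overrepresentation check mentioned above is commonly done with a one-sided hypergeometric test. Here is a minimal sketch using only the Python standard library. The gene names, marker sets, and function names are hypothetical placeholders, and a real analysis would also correct for multiple testing.

```python
from math import comb


def hypergeom_pvalue(n_background, n_markers, n_de, n_overlap):
    """P(overlap >= n_overlap) if n_de genes were drawn at random from the background."""
    upper = min(n_markers, n_de)
    total = comb(n_background, n_de)
    tail = sum(
        comb(n_markers, k) * comb(n_background - n_markers, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def overrepresentation(de_genes, marker_sets, background):
    """Rank marker sets by how strongly they are enriched among the DE genes."""
    background = set(background)
    de = set(de_genes) & background
    results = {}
    for cell_type, markers in marker_sets.items():
        markers = set(markers) & background
        overlap = de & markers
        p_value = hypergeom_pvalue(len(background), len(markers), len(de), len(overlap))
        results[cell_type] = (len(overlap), p_value)
    # Smallest p-value first.
    return sorted(results.items(), key=lambda item: item[1][1])


# Made-up gene universe and marker lists, purely for illustration.
background = [f"g{i}" for i in range(1000)]
marker_sets = {
    "cell type A": [f"g{i}" for i in range(0, 20)],
    "cell type B": [f"g{i}" for i in range(20, 40)],
}
# Hypothetical DE genes for one cluster: 10 overlap with A, none with B.
de_genes = [f"g{i}" for i in range(0, 10)] + [f"g{i}" for i in range(500, 510)]

ranked = overrepresentation(de_genes, marker_sets, background)
for cell_type, (n_overlap, p_value) in ranked:
    print(cell_type, n_overlap, p_value)
```

The background should be the genes that were actually tested for differential expression, not the whole genome, otherwise the p-values come out too optimistic.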
That's pretty good
thanks!
Thank you so much for the great tutorial. I keep getting this error which is preventing me from going forward:
> results
Great!
Thanks!
@@sanbomics NO.NO. Thank YOU!!!)) May I ask you to prepare a video about IGV, if possible?
Hi, at 2:22, when I ran `ref
Try installing and loading in celldex: bioconductor.org/packages/release/data/experiment/html/celldex.html
You're awesome!
Thanks :)
Dear Sanbomics, your lectures are great! Thanks for doing this. However, what was the reason you only used Droplet (ET1617) when you subset the data from ExperimentHub?
Hi! Both likely would have worked. But my data were droplet so I wanted to keep it consistent. However, the most important consideration when picking a reference is one with high cell-type similarity to your data.
@@sanbomics However, I wanted to filter for brain tissue instead of lung, and brain tissue seems to be missing in those annotation files. Do you have any idea?
If they don't have brain you may have to use a different dataset. I think Tabula Muris Senis has mouse brain data. You can find their figshare link and download the adata directly.
Hello, thank you for your video. If my data is from 10x, can I still use the droplet or smartseq2 ref?
Yup, this method should work across technologies. You can even use bulk data.
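Part of the reason this can work across technologies is that SingleR scores cells against the reference with Spearman (rank) correlation, which only depends on the ordering of expression values and not on their scale. The sketch below illustrates that property in plain Python. The expression values are made up, and the code is an illustration of rank correlation, not SingleR's actual implementation.

```python
from math import log1p, sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation is Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up expression of six genes in a reference profile (raw counts) ...
reference_counts = [0, 5, 120, 30, 2, 800]
# ... and the same profile on a different scale (log-transformed).
query_logged = [log1p(v) for v in reference_counts]

rho = spearman(reference_counts, query_logged)
r = pearson(reference_counts, query_logged)
print("spearman:", rho)
print("pearson:", r)
```

The rank correlation stays at 1 even though the two profiles are on very different scales, while the ordinary Pearson correlation drops. Real platform differences are of course not a clean monotone transformation, so this only explains the robustness in part.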
Thank you very much!!!!!!
You're welcome!
Thank you! Btw, I am analyzing oral cancer. When I tried to make an oral_ref (instead of lung_ref) I searched for ‘Oral’, ‘Mouth’, ‘Head’, and ‘Face’, but there was no matching reference. Could you suggest other ways to label my cells?
Cancer is going to be inherently hard to label, and I am not sure I would trust these kinds of methods to label cancer cells correctly. You will likely have to take a more manual approach.
@@sanbomics Thank you!
Hello, I am a graduate student working with single-cell data. I attempted to run the code with my data. Unfortunately, I receive an error message every time I try to run either the built-in reference or another dataset from ExperimentHub. These are the error messages:
Error in validityMethod(as(object, superClass)) : object 'CsparseMatrix_validate' not found
reason: object 'CsparseMatrix_validate' not found
Can you help me or give me advice on how to overcome these errors? I would greatly appreciate it.
I use SingleR for annotation. I only have 5 clusters, but the annotation gives up to 14 cell types. How can I do the annotation just for the clusters?
If there are only a few outlier cells labeled as a random cell type when >95% of the cluster is labeled as the same thing, you can either 1) remove those cells or 2) label them as what the other 95% are. If instead the labeling in a cluster is very mixed (e.g., no label above 50%), it likely means you need to use a different reference.
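The rule of thumb above can be written down as a small per-cluster majority vote. This is just a sketch in plain Python that uses the 95% and 50% cutoffs from the reply. The function name, the verdict strings, and the example clusters are made up for illustration.

```python
from collections import Counter, defaultdict


def summarize_clusters(clusters, labels, consensus=0.95, mixed=0.50):
    """For each cluster, report the most common label, its fraction, and a verdict."""
    per_cluster = defaultdict(Counter)
    for cluster, label in zip(clusters, labels):
        per_cluster[cluster][label] += 1

    summary = {}
    for cluster, counts in per_cluster.items():
        top_label, top_count = counts.most_common(1)[0]
        fraction = top_count / sum(counts.values())
        if fraction > consensus:
            # A few outliers: drop them or give them the majority label.
            verdict = "use majority label"
        elif fraction <= mixed:
            # No label above 50%: the reference is probably a poor match.
            verdict = "mixed, try another reference"
        else:
            verdict = "inspect manually"
        summary[cluster] = (top_label, fraction, verdict)
    return summary


# Hypothetical per-cell cluster ids and predicted labels.
clusters = [0] * 100 + [1] * 100
labels = (["T cells"] * 97 + ["NK cells"] * 3
          + ["Fibroblasts"] * 40 + ["Endothelial cells"] * 35 + ["Macrophages"] * 25)

summary = summarize_clusters(clusters, labels)
for cluster, (top_label, fraction, verdict) in summary.items():
    print(cluster, top_label, fraction, verdict)
```

Clusters that land between the two cutoffs are left as "inspect manually", since the reply does not say what to do with them.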
I need your help! I followed your process but for mammary gland tissue. I ended up with 0 cells. What could I be doing wrong???
Hi, do you mind providing just a bit more context? When you say you ended up with 0 cells what do you mean? Is that the reference dataset you were trying to use?
@Sanbomics Thanks for the video. Unfortunately the notebook link is not working anymore. Could you please provide the code?
Oops. I forgot to update it after making some changes on GitHub. Thanks for letting me know
github.com/mousepixels/sanbomics_scripts/blob/main/single_r.Rmd
@@sanbomics Thank you.
Where can I find the data Lung1/outs/filtered_feature_bc_matrix?
Hi. Sorry, but these are my unpublished data. Any sample processed by 10x cellranger will have the filtered_feature_bc_matrix directory you can open up similarly.
Hi Mark. If I want to follow along, how can I fix this:
> data
You just need to change "Lung1/" to the path of your directory
@@sanbomics So I need to have my own data, is that correct? Thank you.
Do you know any reference scRNA-seq datasets for zebrafish?
I would be surprised if there were none.. but I don't know any off the top of my head because I never work with them, sorry!