Thank you, I am glad you found these videos helpful! Coming to your questions... When trying to compare different studies, it makes sense to start from fastq files rather than count matrices. However, the following are some questions you should ask when trying to compare scRNA-Seq data from different studies: 1. Are the single-cell datasets you are trying to compare from different sequencing platforms? 2. Do they sequence 3’ end, 5’ end, or full-length transcripts? Single-end or paired-end? 3. In the case of 10X Genomics, do the datasets have the same library type? What is the experimental design for these datasets? Talking about 10X datasets, depending on the experimental design (samples from different tissue types or time points), the Cell Ranger pipeline can be used to aggregate such datasets. I found a really nice paper that performed an analysis similar to what you are asking about. They analyzed 20 scRNA-Seq datasets processed in multiple centers across different platforms from two biologically distinct cell lines. Here’s the link: www.nature.com/articles/s41597-021-00809-x I hope this helps and gives you some direction for your next steps. Good luck! :)
@@Bioinformagician Thanks a lot for answering and putting in the effort to also link a paper. Very helpful! Looking forward to more amazing videos and tutorials. All the best!
Hi Khushbu! I tried running the command that loads the NSCLC data in R. I am sure that I have given the right path while installation happened. But, for some reason, it throws an error each time stating "Error in Read10X_h5 : File not found.", and this is after I have installed the Read10X_h5. How do I resolve this issue?
Hi, thanks for the informative video! I have a question about QC filtering. How did you decide an upper limit of 2500 genes here. Because there are many cells that express more than 2500 that still fall under straight line. Just curious! Thank you!
I just went with the thresholds given in Seurat's PBMC 3K tutorial. It is recommended to set thresholds that make sense for the data you have, so please feel free to deviate from the thresholds I have been using.
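One way to set thresholds "according to the data you have" is to flag cells that are outliers relative to the dataset's own distribution. The sketch below uses a median ± 3 MAD rule on log-scaled genes-per-cell values. That rule is an assumption swapped in for illustration; it is not what the video or the PBMC 3K tutorial uses, and `n_feature` is simulated stand-in data for `nFeature_RNA`.

```r
set.seed(1)
# Simulated genes-per-cell values standing in for seurat.obj$nFeature_RNA (hypothetical data)
n_feature <- round(rlnorm(1000, meanlog = log(1500), sdlog = 0.4))

# Work on the log scale, where genes-per-cell is closer to symmetric
log_feat <- log10(n_feature)
centre <- median(log_feat)
spread <- mad(log_feat)   # median absolute deviation

# Flag cells more than 3 MADs from the median in either direction
lower <- 10^(centre - 3 * spread)
upper <- 10^(centre + 3 * spread)
keep  <- n_feature > lower & n_feature < upper

cat("lower:", round(lower), "upper:", round(upper), "cells kept:", sum(keep), "\n")
```

Whatever rule is used, it is worth checking the resulting cutoffs against the violin and scatter plots before filtering.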
Several assumptions made when analyzing bulk RNA-seq data do not always apply in the context of scRNA-seq and hence methods like DESeq2 do not effectively account for the limitations specific to scRNA-seq data. I encourage you to read these articles - www.frontiersin.org/articles/10.3389/fgene.2020.00041/full www.ncbi.nlm.nih.gov/pmc/articles/PMC5549838/
Thanks for teaching us. I want to download some Pancreatic cancer sc-RNA seq data... can you provide some database link? Since I am very new in this field I was unable to get any database.
Have you tried looking on GEO? There are a lot of single-cell datasets available there. Also, look for papers that study pancreatic cancer using single-cell RNA-Seq; you could get a lot of useful links from there as well.
Can someone please help me fix this? "Centering and scaling data matrix Error: cannot allocate vector of size 9.3 Gb" how can I fix this issue as I am using R 4.3.3 and this version doesn't support increasing memory allocation. I am using windows x86_ 64-w64-mingw32/x64 (64-bit)
Hi, I'm running your code on the same dataset as you and I bumped into an Error: vector memory exhausted (limit reached?). I'm working on a MacBook Pro 2017 with a 2.3GHz Dual-Core Intel Core i5 with 8 GB of RAM. I'm assuming that either the processor or RAM simply isn't enough, or could there be another issue? I'm aware that this dataset is quite heavy. I see you're also working on a Mac; which one would you recommend, or should I just move to a PC?
Just to complete the tutorial, use a small dataset. For your actual analyses, especially if you will integrate multiple samples/datasets, you will probably need access to an HPC.
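If a smaller dataset is not at hand, one option is to keep a random subset of cells before the scaling step. A minimal sketch, assuming a genes-by-cells sparse count matrix; the matrix here is simulated and `n_keep` is an arbitrary illustrative value.

```r
library(Matrix)
set.seed(42)

# Simulated sparse counts: 200 genes x 5000 cells (stand-in for a real matrix)
counts <- rsparsematrix(200, 5000, density = 0.05,
                        rand.x = function(n) rpois(n, 2) + 1)
colnames(counts) <- paste0("cell", seq_len(ncol(counts)))

n_keep <- 1000                                   # arbitrary; pick what fits in memory
keep_cells <- sample(colnames(counts), n_keep)   # random cells, without replacement
counts_small <- counts[, keep_cells]

dim(counts_small)
format(object.size(counts_small), units = "Kb")
```

On an existing Seurat object, the same idea should work with `subset(seurat.obj, cells = keep_cells)`.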
Great tutorial, thanks a lot for this! I was wondering if you also have experience in analysing TCR repertoire data using immunarch or other packages, and in integrating it with gene expression data using scRepertoire/Platypus. If so, could you please also make tutorials on that? Thanks again :)
Sometimes (not often), the counts matrix is provided as a .csv file (do not assume, make sure you confirm that with the authors or the ones who have generated that data). As long as you have the rows as genes, columns as cell barcodes, and values as counts, you can read it into a variable and use that to generate a Seurat object.
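A minimal sketch of that route, assuming the CSV has gene names in its first column and one column per cell barcode. The file is a tiny made-up example written to a temporary path, and the gene and barcode names are hypothetical.

```r
library(Matrix)

# Write a tiny example CSV so the sketch is self-contained (hypothetical data)
csv_path <- tempfile(fileext = ".csv")
toy <- data.frame(
  gene  = c("MT-CO1", "CD3E", "MS4A1"),
  AAAC1 = c(5, 0, 2),
  AAAC2 = c(1, 3, 0),
  AAAC3 = c(0, 0, 7),
  AAAC4 = c(2, 1, 0)
)
write.csv(toy, csv_path, row.names = FALSE)

# row.names = 1 turns the first column (gene names) into row names;
# without it the genes end up labelled 1, 2, 3, ...
df <- read.csv(csv_path, row.names = 1, check.names = FALSE)
counts <- Matrix(as.matrix(df), sparse = TRUE)

# Build the Seurat object only if Seurat is installed
if (requireNamespace("Seurat", quietly = TRUE)) {
  seurat.obj <- Seurat::CreateSeuratObject(counts = counts, project = "csv_example")
}
```

Checking `head(rownames(counts))` before creating the object is a quick way to confirm that the features are gene names rather than row numbers.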
@@Bioinformagician I am having massive problems with analyzing a CSV file... Could you maybe do a similar video about how to get to analyze .csv in this way? It would be really great.
@@Bioinformagician Yes. So I want to data-mine those results (GSM4306928) and I have trouble right from the beginning. This matrix has genes as rows, barcodes as columns, and values as counts. But when I create a Seurat object I cannot proceed any further. When I try to do the QC using mt genes, there is 0% everywhere. Feature plots show weird numbers instead of genes. As far as I understand, this should not happen.
Hi. I was just trying to do this, but I see that my RStudio is using 49.8 GiB, whereas your screen only shows 170 MiB or so. Would you happen to know what I'm doing wrong?
Yes, you can label cell names on the UMAP. If you have a column in your metadata with annotations of which cell belongs to which cell type, you can add those to the UMAP by running: Idents(seurat.obj)
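A minimal sketch of that step on a toy object. The metadata column name `cell_type` and the labels are hypothetical, and the Seurat part is skipped if Seurat is not installed.

```r
library(Matrix)
set.seed(7)

# Toy counts: 20 genes x 30 cells (hypothetical data)
counts <- Matrix(matrix(rpois(600, 1), nrow = 20,
                        dimnames = list(paste0("Gene", 1:20), paste0("cell", 1:30))),
                 sparse = TRUE)

# Hypothetical per-cell annotations, one label per barcode
cell_type <- setNames(rep(c("T cell", "B cell", "Monocyte"), each = 10),
                      colnames(counts))

if (requireNamespace("Seurat", quietly = TRUE)) {
  library(Seurat)
  seurat.obj <- CreateSeuratObject(counts = counts)
  seurat.obj$cell_type <- cell_type     # store annotations as a metadata column
  Idents(seurat.obj) <- "cell_type"     # use that column as the active identities
  print(table(Idents(seurat.obj)))
  # On a real object where RunUMAP() has been run, the labels can then be drawn with:
  # DimPlot(seurat.obj, reduction = "umap", label = TRUE)
}
```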
Hi, I would like to ask how can I create a Seurat Object that is from .txt file and how can I create a Seurat Object when I have the count table and cell information
Can this workflow be used for snRNA-Seq analysis? Can you please suggest a few websites where I can obtain raw snRNA-Seq data (preferably open source)?
You can use the same pipeline for snRNA-Seq as well, the only difference being the obvious one - you should not expect to see mitochondrial counts since we have single nuclei and not single cells, theoretically. However, from my experience I have observed mitochondrial reads in single nuclei so do not skip this QC step while processing your data. You will find many single nuclei datasets here: www.10xgenomics.com/resources/datasets
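For reference, the mitochondrial QC metric is just the share of each cell's (or nucleus's) counts that come from genes whose names match a pattern. A sketch of the calculation by hand on made-up counts; in Seurat, `PercentageFeatureSet(seurat.obj, pattern = "^MT-")` computes the same kind of value. The `MT-` prefix assumes human gene symbols.

```r
library(Matrix)

# Toy genes-by-cells counts (hypothetical values); every column sums to 100
counts <- Matrix(c(10,  0,  2,
                   30,  2,  8,
                   40, 48, 50,
                   20, 50, 40),
                 nrow = 4, byrow = TRUE, sparse = TRUE,
                 dimnames = list(c("MT-CO1", "MT-ND1", "ACTB", "GAPDH"),
                                 c("cell1", "cell2", "cell3")))

# Regular expression: gene names that start with "MT-"
is_mito <- grepl("^MT-", rownames(counts))

# Percentage of each cell's counts that come from mitochondrial genes
percent_mt <- 100 * colSums(counts[is_mito, , drop = FALSE]) / colSums(counts)
percent_mt

# Apply the < 5% cutoff used in the video
keep <- percent_mt < 5
keep
```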
When I install.packages("Seurat") it downloads fine but when I say library I got this error: > library(Seurat) Error: package or namespace load failed for ‘Seurat’ in loadNamespace(j
please never stop ! you are helping so many of us and you have no idea how thanks for such amazing content
Thank you so much for creating this series of videos. I'm learning tonnes because you explain everything so well and make it really accessible for beginners. Can't wait to watch more of the videos in the series.
I can't agree more.
Thanks a lot for this wonderful video!
I loved your video. This is the best straightforward tutorial with essential information I could find online. Thank you for sharing your skills and helping us all.
Such an amazing and straightforward tutorial. You are soooo good at explaining the content. Thank you!!!!
Heartfelt thanks to you and greetings from Russia!
I'm writing my first single-cell paper, and your videos are helping enormously!
Had bookmarked it to watch it today. Totally worth it! Very nice step-by-step explanation to some standard analysis steps in scRNAseq. Thanks very much! Next, it would be nice to see some standard data-integration methods used for cell naming. Feel free to correct me but I guess Harmony is the one used often.
Keep it up, Cheers.
Thank you for the kind words, I am glad you found this informative. Yes, Harmony is very commonly used for data integration and I shall create a video tutorial on that. Thanks for the suggestion :)
These tutorials are honestly invaluable! Thank you!
Thanks for the very informative video! But I sincerely hope that you could create a tutorial for annotation of different clusters, that would be very helpful! Appreciate your hard work!
Thank you for the suggestion, I have plans to make a video on cell annotation :)
Has anyone ever told you you're a hero!!!
Thank you for your valuable information. Please add more on how to analyze RNA-seq using R.
Thank you! It was a great tutorial, basic and simple to follow and great for beginners.
Very detailed and well-explained. Thank you!
You are doing a wonderful job. Please make a video on RNA-seq column-wise interpretation and what it actually means.
I am glad my videos have been helpful! When you say RNA seq column-wise interpretation, you mean to explain the structure of a Seurat object in more detail?
Thanks!
You are really a blessing for beginners.
It was useful for me
Lovely lady with beautiful presentation, thank you!
really appreciate the sharing of the knowledge
Thanks lots! you are creating great videos, you go to the point and the video is short.
Thank you so much for this tutorial. You are excellent.
Very informative and clarity is superb.
Great video thanks!
I have a question, what's the next step? what do we do next to complete the single cell analysis after we have the different clusters? (What do we conclude from the clusters...)
thank you
Hey, I really like the way you teach. Make more videos and all the best.
Thank you for this helpful tutorial!!
I love your videos! Thank you!
What made you choose a threshold of between 200 and 2500 features during the filtering step of QC? To me, the featureScatter plot shows the big plateau nearer to ~5000 features. Is this threshold (and % mt < 5) standard for scRNA seq? at 15:33
This is really fantastic for a beginner (after they learn how to install packages). I hope you've gotten lots of coffees. I would but I am not super keen on 3rd party sites. I wonder about just leaving paypal/venmo ID in description - lol - I'm not sure how safe a practice that is but I know I would be happier to just directly donate through sites I'm already tied to :/ maybe better haha
Thank you for your presentation. it was helpful!
This is AMAZING, thank you so much!
Really informative and totally worth watching!
This is amazing! I wonder which institution is the blogger in?
Just letting you know that the UMAP output you got in the console is via the R-native UWOT using the cosine metric. If I include the following umap.method = 'umap-learn', metric = 'correlation' in RunUMAP(), it gives me a very different output in the console. The R-native UWOT using the cosine metric gave me a DimPlot that is similar to yours but flipped horizontally while the Python UMAP via reticulate gave me a more dissimilar DimPlot and also flipped horizontally
Note that UMAP is not deterministic, so the precise layout of the output differs from run to run. What stays the same is some notion of topology (number of holes, clusters, etc.), a sort of overall shape. I don't know if this helps, maybe I didn't get the point, but I wanted to point this out for the community. If you change the metric, that can change the topology of the output considerably.
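A mirrored plot carries the same information: flipping an axis leaves every cell-to-cell distance in the embedding unchanged. A small sketch with made-up 2-D coordinates standing in for a UMAP embedding (no actual UMAP is run here).

```r
set.seed(3)

# Made-up 2-D coordinates standing in for a UMAP embedding of 100 cells
emb <- cbind(UMAP_1 = rnorm(100), UMAP_2 = rnorm(100))

# Mirror the layout horizontally by negating the first axis
emb_flipped <- emb
emb_flipped[, "UMAP_1"] <- -emb_flipped[, "UMAP_1"]

# Pairwise distances between cells are the same in both layouts
same_distances <- isTRUE(all.equal(as.vector(dist(emb)),
                                   as.vector(dist(emb_flipped))))
same_distances
```

So a DimPlot that looks like a mirror image of the one in the video shows the same neighbourhood structure.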
Great tutorials! I'm wondering why I keep encountering an error saying "Error in match.arg(arg = layer, choices = Layers(object = object, search = FALSE)) :
'arg' should be one of “counts”, “data”, “scale.data” " when running FindVariableFeatures after normalization. Please advise! Thank you!
Very helpful tutorial!
you are amazing! Thank you very much !! :)
Wow, great job! Could you send me each step-by-step process
Great video explaining the analysis step by step. I am wondering if you can make a video about single-nuclei RNA-seq analysis. Also, as a beginner, I am having a hard time understanding the various sequencing data formats in GEO datasets and how to convert single-cell data generated by other methods, such as Drop-seq, so that it can be used in Seurat.
Thank you for the suggestion. I will consider making a video using single nuclei data and various sequencing data formats on GEO.
Thanks a lot! It was very helpful.
That was amazing. Thanks so much!
Thank you for this video !!
Thanks a lot it's very useful for me....
Amazing and informative video helped a lot!! Thank you very much. Can you also make a video on how to analyze scRepertoire and scTranscriptome combined?
Thank yoouu
Very helpful, thanks!
Amazing! Thank you! :)
Thanks for your videos 😀
Wonderful tutorial, thanks a lot🙏🌹 Would it be possible to cover circRNA detection and circRNA-miR-mRNA network creation?
Thank you so much for this tutorial. If I want to analyze public data that is already SCTransform-normalized, do I need to run the whole procedure? I created a Seurat object, but when I try to run PCA I always get an error saying that I am missing the normalization step. How can I start from normalized data? Thank you so much
really helpful, thanks!!
you are the best!!
Thank you so much!!! so so so helpful!!!!😭
Amazing work. Can you share a tutorial for single RNA-seq+ATAC seq analysis (multiome) ?
Definitely in the pipeline. Please stay tuned :)
Great tutorials! Is Seurat the standard package for single-cell sequence analysis? My first taste of sc analysis was from an online course, and I think it just used basic R commands without downloading any packages or libraries. I wonder what the difference is between using these special packages and not using them.
Thank you so much for your tutorials!! They are just AMAZING! Quick question: how much memory does your computer have? I am working with 16 GB RAM (macOS) and I get the following error when I reach the scaling step: "Error: vector memory exhausted (limit reached?)". Any idea what I can do to make it run? I already tried to free up as much memory as I could from the RStudio session, but it is not enough... Thank you!!!
Thanks a lot, it is a very interesting topic and a helpful video. Could you please do a video on imputation of scRNA-seq data?
I will plan on making a video on it. Thanks for the suggestion :)
I have successfully analysed my very first scRNAseq dataset thanks to your video! I have a question. Now I'm tackling another huge scRNAseq dataset stored in HDF5. The count data is stored as data (non-zero elements), indices and indptr. I believe I have to reconstruct a sparse matrix from these parameters before I create seurat object. Could you orient me how to do it?
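Those three vectors are the standard compressed sparse layout, and `Matrix::sparseMatrix()` can rebuild the matrix from them directly. A minimal sketch on toy values, assuming the file uses the column-compressed form (`indices` are 0-based row positions, `indptr` marks where each column starts) with genes as rows. Reading the vectors out of the HDF5 file (for example with the rhdf5 or hdf5r packages) is not shown, and the gene and cell names are hypothetical.

```r
library(Matrix)

# Toy compressed-sparse-column vectors (hypothetical values), 0-based as usually stored
data    <- c(5, 2, 3, 7)      # the non-zero counts
indices <- c(0, 2, 1, 2)      # row index of each non-zero count
indptr  <- c(0, 2, 3, 4, 4)   # start of each column in `data`; length = n columns + 1
shape   <- c(3, 4)            # rows (genes) x columns (cells)

counts <- sparseMatrix(i = indices, p = indptr, x = data,
                       dims = shape, index1 = FALSE)
rownames(counts) <- c("GeneA", "GeneB", "GeneC")
colnames(counts) <- paste0("cell", 1:4)

as.matrix(counts)
```

If the file stores the row-compressed form instead (cells as rows), the same call with `j =` in place of `i =` should apply, followed by `t()` to get genes as rows. The result can then be passed to `CreateSeuratObject(counts = counts)`.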
There is no link to the tutorial in the description 😮
Great Video
What do positive and negative correlation PCA scores mean? How do I interpret the results from the DimPlots obtained? What do you mean by explaining heterogeneity?
Big appreciation for your contributions. I have a question about the metadata of a Seurat object. My Seurat object has the column names orig.ident, nCount_RNA, nFeature_RNA and so on. In CreateSeuratObject, I understood that project = "a" assigns a to all rows as the original identity. I want to add multiple identities to my Seurat object: currently I label it by cohort (disease or normal), but I also want to assign patient info to each object. How can I do that? Thanks in advance for your reply.
How can I take the batch effect corrected files for annotation? using the merged_dataset_filtered for annotation results in annotation and cluster identification of uncorrected data (not corrected for batch effects).
Very informative and helpful! Thank you.
I would love to inquire what personal computer/laptop is suitable for this type of computational work to analyse single cell data in R. I came across facts that suggest the processor and ram should be put into consideration when getting a laptop.
I look forward to a response. Thank you.
I recommend a MacBook, preferably a MacBook Pro with the Apple M1 Pro chip and 16GB RAM. If you are unable to get hold of these specs, I would recommend getting access to a cluster. Renting AWS or Google servers will be a blessing.
helped a lot!
That is so helpful
How should we represent replicates from control and treated groups? People don't really provide a separate UMAP/t-SNE plot for each replicate; at least I have not seen it in the literature. However, this question was asked by some of the old PIs.
Thank you so much for your effort and your amazing way of explanation! Could you add the link to the Seurat tutorial website? thank you again!
Here you go: satijalab.org/seurat/articles/pbmc3k_tutorial.html
Could you maybe provide the order of your videos ? I want to learn scRNA-seq from scratch. I see you have multiple videos for this but I don't understand the order. Thanks!
can you make a video about the downstream analysis of ATAC-Seq data and scATAC-seq data?
I definitely have plans on covering topics associated with processing other multi-omics data in the near future. Please stay tuned :)
@@Bioinformagician cannot wait for ATAC-Seq, CHIP-seq ,scATAC-Seq, scanpy+scanorama+MNN integration method , I suggest these topics , it looks interesting
@@escastorage7427 Noted! Thanks for the suggestions.
Hi there! I'm trying to find a guide to create the count matrix using Cell Ranger or STARsolo. Any help?
Hi, I have a question.
When running RStudio Server on CentOS 7, the Seurat and monocle3 packages will not install.
My guess is that the version is the problem. I've checked several sites for solutions, but haven't been able to fix it yet.
Do you happen to know a workaround for package install?
The same symptom occurs on my personal PC as well as on the server.
Thank you very much for the informative tutorial!
Is it possible to manually filter two cell subsets based on the expression of a specific gene, then do differential gene expression analysis?
For example, gene A did not come up as a marker of a cluster. Can we filter cells with high gene A expression vs cells with low gene A expression, then analyze differential gene expression between these two cell subsets?
Thank you!
When you said gene A did not come up in the top markers of a cluster, did you try playing around with the log.fc and min.pct thresholds?
My next question would be what would you consider as "high" gene expression and what would be considered as "low"?
Even if you are able to filter cells based on gene A's expression, how reliable will the differential expression results be, considering we are using one gene's expression level to filter cells, potentially losing many genes that may not be expressed at the same level?
@@Bioinformagician Thank you for your reply! The idea is to filter two groups of cells (for example based on a cell surface marker), and analyze DE between the two cell groups.
1- playing around with log.fc, etc will still give multiple clusters of cells.
2-"high", and "low" is hypothetical and predetermined value.
I figured out some code, and would like to ask how to include the new cell identity in the metadata so that I can visualize DE after FindMarkers?
#subsetting MIfibroblast.obj with "high" Postn gene exp
PostnHigh.obj 3)
# Change identity of cells in PostnHigh object
PostnHigh.obj
You could save your new cell Idents as a column in metadata, then use that metadata column to visualize DE markers.
Postn.obj$new_idents
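A minimal sketch of the idea on toy data: label cells as high or low for one gene, store the label in metadata, and make it the active identity. The cutoff, the group labels, and the column names `postn_group` and `new_idents` are illustrative assumptions, the grouping is done on raw toy counts for simplicity, and the Seurat part is skipped if Seurat is not installed.

```r
library(Matrix)
set.seed(11)

# Toy counts: 20 genes x 30 cells, with Postn set high in the first 10 cells (hypothetical data)
m <- matrix(rpois(600, 2), nrow = 20,
            dimnames = list(c("Postn", paste0("Gene", 1:19)), paste0("cell", 1:30)))
m["Postn", 1:10]  <- 8
m["Postn", 11:30] <- 0
counts <- Matrix(m, sparse = TRUE)

threshold <- 3   # illustrative cutoff for "high" expression
postn_group <- ifelse(counts["Postn", ] > threshold, "PostnHigh", "PostnLow")

if (requireNamespace("Seurat", quietly = TRUE)) {
  library(Seurat)
  obj <- CreateSeuratObject(counts = counts)
  obj$postn_group <- postn_group   # new metadata column, matched by cell name
  Idents(obj) <- "postn_group"     # make it the active identity
  obj$new_idents <- Idents(obj)    # save the active identities back into metadata
  print(table(obj$new_idents))
  # After NormalizeData(), the two groups could be compared with:
  # markers <- FindMarkers(obj, ident.1 = "PostnHigh", ident.2 = "PostnLow")
}
```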
@@Bioinformagician Thank you so much! I applied your integration code and considered the two subsets as 2 samples for integration.
I REALLY LOVE WATCHING YOUR VIDEOS. I am really having a challenge with this particular video. I have downloaded the file needed, but I am not getting the same output as you when executing the code. What could be the issue?
Thank you very much
Hi - I only see the plot for the top10 variable genes when REPEL = FALSE instead of TRUE. Is this an issue? Thank you!
I have the same issue, not sure why
I was scaling the data, but it shows a "no layers found" error in PrepDR5 and says scale data was not found.
What does the 'pattern =' argument do in quality control?
Thank you for your videos. It helps us a lot. I have a quick question. In quality control chapter, you used the term no. of molecules. what does that mean?
Hello! I was curious, for anyone following along with the dataset she chose, whether you were running into issues with your final cluster map being a closely mirrored image of her map?
Me too
Hello! thanks so much for the video, it is so so helpful. Quick question! I was provided with 2 h5 files.. one with the feature matrix and a separate one with molecule info that has the mitochondrial data. How can I combine these both into a Seurat object / metadata table?
Please make a video regarding wgcna analysis
Thanks for the suggestion. I have plans on making a video on wgcna.
@@Bioinformagician 👍
I am getting error while loading the dataset:
Error in Read10X_h5(filename = "C:/Users/skp22/Desktop/RNAseq/20k_NSCLC_DTC_3p_nextgem_Multiplex_count_raw_feature_bc_matrix.h5") :
could not find function "Read10X_h5"
Can you please help me?
Thank you ma'am. I have just one query: how can I download the DEGs for every cluster?
I have spoken about that in one of my videos - th-cam.com/video/1i6T9hpvwg0/w-d-xo.html
Hi! Can you help me to name the dots on the UMAP? (instead numbers the name of the genes) Thank you! Thank you very much!!!
Thank you !!
I wish you had shown how the scatter plot and the violin plot looked after filtering... Plateauing did not start before around 6,000, but you filtered from 2,500. Why?
Thanks a lot for starting this channel, these videos are really helpful.
In future, if possible, could you please create tutorials that use more than one single-cell gene expression 10x dataset (not multimodal, just gene expression)? E.g. there are various atlases, like brain atlases, that look at various brain regions in different species cumulatively. Do they perform the same quality control on all the datasets? Do they start from FASTQ and then do the preprocessing, or do they take only the counts? Different scientists might have applied different preprocessing to get the count matrix.
How do they bring all the scRNA-seq gene expression datasets to the same level so that they can analyze them, i.e. compare not the samples but the datasets, like hippocampus of mouse GSExx and human GSEyy, generated by different scientists at different times?
So, in short:
How do I decide whether to start from the count matrix or the FASTQ files?
If I take various GSE studies performed by different scientists, should I preprocess them all in the same manner so that I can compare them?
Where do I start, how do I proceed, and what precautions should I take?
Sorry for the long questions. Looking forward to your answer and insight on these. And again, thanks a lot for starting this, and especially from the basics. Loved it.
Thank you, I am glad you found these videos helpful!
Coming to your questions...
When trying to compare different studies, it makes sense to start from fastq files rather than count matrices. However, the following are some questions you should ask when trying to compare scRNA-Seq data from different studies:
1. Are the single-cell datasets you are trying to compare from different sequencing platforms?
2. Do they sequence 3’ end, 5’ end, or full-length transcripts? Single-end or paired-end?
3. In case of 10X genomics, do the datasets have the same library type? What is the experimental design for these datasets?
Talking about 10X datasets: depending on the experimental design (samples from different tissue types or time points), the Cell Ranger pipeline can be used to aggregate such datasets.
I found a really nice paper that performed an analysis similar to your question. They analyzed 20 scRNA-Seq datasets generated in multiple centers, across different platforms, from two biologically distinct cell lines. Here’s the link: www.nature.com/articles/s41597-021-00809-x
I hope this helps and gives you some direction for your next steps. Good luck! :)
@@Bioinformagician Thanks a lot for answering and putting in the effort to also link a paper. Very helpful!
Looking forward to more amazing videos and tutorials. All the best!
Hi Khushbu! I tried running the command that loads the NSCLC data in R. I am sure that I gave the right path during installation. But for some reason it throws an error each time, stating "Error in Read10X_h5 : File not found.", and this is after I have installed the Read10X_h5. How do I resolve this issue?
Hi, thanks for the informative video! I have a question about QC filtering. How did you decide an upper limit of 2500 genes here. Because there are many cells that express more than 2500 that still fall under straight line. Just curious! Thank you!
I just went with the thresholds given in Seurat's PBMC 3K tutorial. It is recommended to set the thresholds that make the most sense for the data you have, so please feel free to deviate from the thresholds I have been using.
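One way to act on that advice is to look at the QC distributions first and only then choose cut-offs. The sketch below does this on made-up toy counts. The 5th/95th percentile cut-offs and the names `min_features`, `max_features` and `max_mt` are assumptions for illustration, not a recommendation, and the toy data is far too small for the 200 and 2500 values to apply.

```r
library(Seurat)

# Toy counts (made-up data): 300 genes x 80 cells, 10 of them "mitochondrial"
set.seed(42)
gene_names <- c(paste0("MT-G", 1:10), paste0("Gene", 1:290))
counts <- matrix(rpois(300 * 80, lambda = 0.8), nrow = 300,
                 dimnames = list(gene_names, paste0("Cell", 1:80)))
obj <- CreateSeuratObject(counts = counts)
obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^MT-")

# Inspect the distributions before picking cut-offs
print(summary(obj$nFeature_RNA))
print(summary(obj$percent.mt))

# Hypothetical, data-driven cut-offs for this toy object
min_features <- quantile(obj$nFeature_RNA, 0.05)
max_features <- quantile(obj$nFeature_RNA, 0.95)
max_mt <- 5

# Select the cells that pass all three filters, then subset by cell name
keep <- colnames(obj)[obj$nFeature_RNA > min_features &
                      obj$nFeature_RNA < max_features &
                      obj$percent.mt < max_mt]
filtered <- subset(obj, cells = keep)

cat("Cells before:", ncol(obj), " after:", ncol(filtered), "\n")
```

Re-running `VlnPlot` or `FeatureScatter` on `filtered` shows what the chosen cut-offs actually removed.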
First of all, thank you so much for your content!
I have a question though - why didn't you use the DESeq2 normalization in the normalization part?
Several assumptions made when analyzing bulk RNA-seq data do not always apply in the context of scRNA-seq and hence methods like DESeq2 do not effectively account for the limitations specific to scRNA-seq data.
I encourage you to read these articles - www.frontiersin.org/articles/10.3389/fgene.2020.00041/full
www.ncbi.nlm.nih.gov/pmc/articles/PMC5549838/
Thanks for teaching us.
I want to download some Pancreatic cancer sc-RNA seq data... can you provide some database link? Since I am very new in this field I was unable to get any database.
Have you tried looking on GEO? There are a lot of single-cell datasets available there. Also, look up papers that study pancreatic cancers using single-cell RNA-Seq; you could get a lot of useful links from there as well.
Can someone please help me fix this?
"Centering and scaling data matrix
Error: cannot allocate vector of size 9.3 Gb" how can I fix this issue as I am using R 4.3.3 and this version doesn't support increasing memory allocation. I am using windows x86_ 64-w64-mingw32/x64 (64-bit)
Hi, I'm running your code on the same dataset as you and I bumped into an Error: vector memory exhausted (limit reached?). I'm working on a MacBook Pro 2017 with a 2.3GHz Dual-Core Intel Core i5 and 8GB of RAM. I'm assuming that either the processor or the RAM simply isn't enough, or could there be another issue? I'm aware that this dataset is quite heavy. I see you're also working on a Mac; which one would you recommend, or should I just move to a PC?
Just to complete the tutorial, use a small dataset. For your actual analyses, especially if you will integrate multiple samples/datasets, you will probably need access to an HPC.
Great tutorial, thanks a lot for this! I was wondering if you also have experience in analysing TCR repertoire data using Immunearch or other packages, and then its integration with gene expression data using scRepertoire/Platypus, then could you also please put tutorials on that ? Thanks again :)
how to do with broad institute single-cell data? how to download the dataset and read it through it in r???
Just wondering, what should I do if I got CSV data from the beginning (which is different from a matrix)? Should I convert the CSV data into a matrix?
Sometimes (not often), the counts matrix is provided as a .csv file (do not assume, make sure you confirm that with the authors or the ones who have generated that data). As long as you have the rows as genes, columns as cell barcodes, and values as counts, you can read it into a variable and use that to generate a Seurat object.
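A minimal sketch of that route, assuming the CSV has gene names in the first column and one column per cell barcode. The toy matrix, the temporary file and the project name `csv_demo` are made up so that the example is self-contained.

```r
library(Seurat)
library(Matrix)

# Write a toy genes-x-cells CSV to a temporary file (stand-in for a real file)
set.seed(1)
toy <- matrix(rpois(50 * 10, lambda = 3), nrow = 50,
              dimnames = list(paste0("Gene", 1:50), paste0("AAAC", 1:10, "-1")))
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path)

# row.names = 1 takes gene names from the first column;
# check.names = FALSE keeps barcodes such as "AAAC1-1" unchanged
df <- read.csv(csv_path, row.names = 1, check.names = FALSE)

# Sanity checks: every column should be numeric, rows should be genes
stopifnot(all(sapply(df, is.numeric)))
print(head(rownames(df)))

# Convert to a sparse matrix and build the Seurat object
counts <- Matrix(as.matrix(df), sparse = TRUE)
obj <- CreateSeuratObject(counts = counts, project = "csv_demo")
obj
```

If `head(rownames(df))` prints numbers or IDs rather than gene symbols, the file is probably not laid out the way this sketch assumes, and a pattern such as `"^MT-"` will not match anything.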
@@Bioinformagician I am having massive problems with analyzing a CSV file... Could you maybe do a similar video about how to get to analyze .csv in this way? It would be really great.
@@kubaksiazkiewicz Can you elaborate on what problems you are encountering so I can plan on covering those issues? Thanks!
@@Bioinformagician Yes. So I want to datamine those results (GSM4306928) and I have troubles right from the beginning. This matrix has genes as rows, barcodes as columns, and values as counts. But When I create a Seurat object I cannot proceed any further. When I try to do the QC using mt genes, there is 0% everywhere. Feature plots spit out not genes, but weird numbers. As far as I understand this should not happen.
Hi. I was just trying to do this. But I see that my RStudio is using 49.8 GiB, whereas at the same time your screen only shows 170 MiB or so. Would you happen to know what I’m doing wrong?
Check the option to “free unused memory”
@@santiagoalvarez7536 Thanks will try that!
Hi! Is it possible to label the cell type name for the UMAP at the end? Please let me know! Thanks!
Yes, you can label cell type names on the UMAP. If you have a column in your metadata annotating which cell belongs to which cell type, you can add those to the UMAP by running:
Idents(seurat.obj)
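A hedged sketch of the general pattern, using `pbmc_small` (the small example object that ships with SeuratObject) so that it runs without extra data. The metadata column name `cell_type` and the `type_` labels are hypothetical stand-ins for real annotations.

```r
library(Seurat)

data("pbmc_small")

# Hypothetical annotation column: in practice this would hold real cell type names
pbmc_small$cell_type <- paste0("type_", as.character(Idents(pbmc_small)))

# Option 1: make the annotation the active identity, then label the plot
Idents(pbmc_small) <- "cell_type"
p1 <- DimPlot(pbmc_small, label = TRUE, repel = TRUE)

# Option 2: leave the identities alone and group by the metadata column
p2 <- DimPlot(pbmc_small, group.by = "cell_type", label = TRUE)
```

With no `reduction` argument, `DimPlot` picks a default reduction from the object; on your own data you can pass `reduction = "umap"` explicitly.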
Hi, I would like to ask how I can create a Seurat object from a .txt file, and how I can create a Seurat object when I have the count table and cell information.
Read the .txt file into an object and read that object into a seurat object like this - CreateSeuratObject(counts = txt_obj)
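For the second half of the question (a count table plus cell information), a sketch along these lines may help. It assumes tab-separated files, with cells as columns in the count table and as rows in the cell information table. The file contents and the `condition` column are made up for illustration.

```r
library(Seurat)

# Toy tab-separated inputs written to temporary files (stand-ins for real files)
set.seed(1)
toy <- matrix(rpois(50 * 10, lambda = 3), nrow = 50,
              dimnames = list(paste0("Gene", 1:50), paste0("Cell", 1:10)))
counts_path <- tempfile(fileext = ".txt")
write.table(toy, counts_path, sep = "\t", quote = FALSE, col.names = NA)

cell_info <- data.frame(condition = rep(c("ctrl", "treated"), each = 5),
                        row.names = colnames(toy))
info_path <- tempfile(fileext = ".txt")
write.table(cell_info, info_path, sep = "\t", quote = FALSE, col.names = NA)

# Read both tables; the first column becomes the row names
txt_obj <- as.matrix(read.table(counts_path, header = TRUE, row.names = 1,
                                sep = "\t", check.names = FALSE))
meta <- read.table(info_path, header = TRUE, row.names = 1, sep = "\t")

# Metadata rows must match the cell names, so check and reorder first
stopifnot(all(colnames(txt_obj) %in% rownames(meta)))
meta <- meta[colnames(txt_obj), , drop = FALSE]

seurat_obj <- CreateSeuratObject(counts = txt_obj, meta.data = meta)
table(seurat_obj$condition)
```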
Can this workflow be used for snRNA-Seq analysis? Can you please suggest a few websites where I can obtain raw snRNA-Seq data (preferably open source)?
You can use the same pipeline for snRNA-Seq as well, the only difference being the obvious one - you should not expect to see mitochondrial counts since we have single nuclei and not single cells, theoretically. However, from my experience I have observed mitochondrial reads in single nuclei so do not skip this QC step while processing your data.
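To make the mitochondrial QC step concrete, here is a base R sketch of what a pattern such as `"^MT-"` selects and how a per-cell percentage follows from it. The gene list and counts are made up. As I understand it, `PercentageFeatureSet(obj, pattern = "^MT-")` computes the same ratio on a Seurat object, but treat that equivalence as an assumption and check the Seurat documentation.

```r
# Toy gene names: human mitochondrial genes start with "MT-", mouse with "mt-"
genes <- c("MT-CO1", "MT-ND1", "ACTB", "MTOR", "mt-Co1")

# "^MT-" is a regular expression: "^" anchors the match to the start of the name.
# It is case-sensitive, so "mt-Co1" is not matched, and "MTOR" has no hyphen.
is_mt <- grepl("^MT-", genes)
print(setNames(is_mt, genes))

# Toy counts: 5 genes x 3 cells, each cell has 100 counts in total
counts <- matrix(c( 5,  5, 80, 10, 0,
                    0,  0, 90, 10, 0,
                   10, 10, 70, 10, 0),
                 nrow = 5,
                 dimnames = list(genes, c("Cell1", "Cell2", "Cell3")))

# Percentage of each cell's counts that come from the matched genes
percent_mt <- 100 * colSums(counts[is_mt, , drop = FALSE]) / colSums(counts)
print(percent_mt)
```

If the pattern does not match the naming in your data (for example a mouse dataset or Ensembl IDs as row names), every cell gets 0%, so it is worth checking the matched genes before trusting the QC metric.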
You will find many single nuclei datasets here: www.10xgenomics.com/resources/datasets
Thanks a lot@@Bioinformagician
When I install.packages("Seurat") it downloads fine but when I say library I got this error:
> library(Seurat)
Error: package or namespace load failed for ‘Seurat’ in loadNamespace(j
Install SeuratObject first, install.packages("SeuratObject"). Once that is successfully installed, try install.packages("Seurat") again.
@@Bioinformagician okay thank you!