Your videos are enormously helpful and beneficial. Thanks a ton for doing them. Also, please do keep them coming; you have a keenly following and learning subscriber here. Thank you.
Thanks for subscribing! I hope to keep releasing at least one a week
This guy is my teacher. Thanks a lot for your great effort; I highly appreciate it.
Glad to! I appreciate the kind words!
Thanks a lot! Would you mind introducing how to do multi-sample scVelo analysis?
Thank you so much! I've just entered this field, and your clear walkthrough has been enormously helpful. You might be one of the best bioinformaticians I've ever met. I'm wondering if you could provide a walkthrough of the in silico perturbation function in Dynamo (I've been struggling for days). The CellOracle approach has worked perfectly in my research, as I've learned from your video. However, it can only be applied to transcription factors. Dynamo could be a complementary tool for knockout studies of other genes. It would be so much appreciated if you could help me with this 🙏
Thank you for your channel, which has been immensely helpful for my analysis tasks. I would like to consult with you about the Velocyto analysis. For instance, when using the command "velocyto run -b filtered_barcodes.tsv -o output_path -m repeat_msk_srt.gtf possorted_genome_bam.bam mm10_annotation.gtf", is it possible for me to use my own set of barcodes for analysis? Thank you very much!
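On the custom barcode question: as far as I know, the -b/--bcfile option of velocyto run just takes a plain text file with one valid barcode per line, written the way they appear in the BAM's CB tag (for 10x data that usually includes the -1 suffix), so your own list should work. A minimal sketch for writing such a file; write_barcode_file and the default suffix are made up for illustration, not part of velocyto:

```python
from pathlib import Path


def write_barcode_file(barcodes, path, suffix="-1"):
    """Write one barcode per line, adding the suffix if it is missing.

    Assumption: the BAM's CB tags look like 'AAACCTGAGAAACCAT-1'.
    Check a few reads of your own BAM and pass suffix="" if they do not.
    """
    cleaned = []
    seen = set()
    for bc in barcodes:
        bc = bc.strip()
        if not bc:
            continue  # skip blank entries
        if suffix and not bc.endswith(suffix):
            bc = bc + suffix
        if bc not in seen:  # drop duplicates, keep first-seen order
            seen.add(bc)
            cleaned.append(bc)
    Path(path).write_text("\n".join(cleaned) + "\n")
    return cleaned
```

The resulting file would then be passed to -b in place of filtered_barcodes.tsv in the command above.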
Hi, informative videos. Very helpful for beginners with little or no coding experience.
Thank you! Glad you find them helpful!
@@sanbomics Looking forward to seeing a few STAT packages in the future.
Hi,
While doing scVelo analysis on scanorama integrated data, is it correct to use scanorama embeddings (X_scanorama) instead of X_pca or X_umap to calculate the neighbours? Could this then be used for CellRank analysis?
Any thoughts?
Thank you!
I haven't tried this yet, but I think you would be right to use the scanorama embeddings.
Thank you for the tutorial! Can you point to a tutorial starting at fastq files?
I actually have a video that does that already! th-cam.com/video/6heXkouNZpk/w-d-xo.html
How would you do this for non 10x data? Such as using Parse
Thank you for this tutorial! It was extremely helpful with my analysis. I am having issues saving the adata file. Any ideas how I can save this so I don't have to keep running through the code..
You should be able to do adata.write_h5ad('file_name.h5ad')
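A small sketch of the save-and-reload round trip, assuming adata is a regular AnnData object. write_h5ad and anndata.read_h5ad are the real calls; h5ad_path, save_adata and load_adata are just hypothetical wrappers so the file always gets the .h5ad extension:

```python
from pathlib import Path


def h5ad_path(name, out_dir="."):
    """Build an output path that always ends in .h5ad."""
    p = Path(out_dir) / name
    return p if p.suffix == ".h5ad" else p.with_name(p.name + ".h5ad")


def save_adata(adata, name, out_dir="."):
    """Write the object to disk and return the path that was used."""
    path = h5ad_path(name, out_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    adata.write_h5ad(path)
    return path


def load_adata(path):
    """Read the object back in a later session."""
    import anndata as ad  # imported lazily; only needed when loading

    return ad.read_h5ad(path)
```

Saving once after the slow steps (merging, velocity) and starting later sessions from load_adata avoids rerunning the whole notebook.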
Great video, but I have to say that I am not sure I get the hype. It takes way longer to run and doesn't seem to give much more information than monocle3. Is it really worth it?
I haven't read the monocle paper and I don't use it myself, so I could be wrong. (I know a lot of people that like it though.) But one of the most important differences with scvelo is that it doesn't make the assumption that the lowest and highest spliced:nascent ratios are actually the highest/lowest in that population, theoretically giving you more biologically accurate predictions. Plus it's in python xD
Hi Sam, I am wondering if we can just use the loom file from the BAM for preprocessing, clustering and annotating cell types? When I loaded the loom file with the 'sc.read_loom' function of scanpy, it also had a count layer. Do we really need to merge with the processed adata file like in this video? Thank you a lot.
That's an interesting question. I haven't compared the loom counts to the actual counts before. I would say that to be safe just go ahead and use the "actual" counts from the 10x matrix. The loom counts may be fine, but they may not be, so you might as well go with what is known to work.
Hi Sam, I have been trying to run scvelo on my 10X genomics datasets, but keep getting this error "UnboundLocalError: local variable 'id_length' referenced before assignment" after the "all_data_merged = scv.utils.merge(Neutro3p,VelNeutro3p)" step. I tried this with the files that were given in the 10X tutorial as well and keep getting the same error about id_length after trying to merge. Any help on this will be much appreciated! I have been stuck on this for 3 days now and was not able to figure it out. Thank you so much!
Omg I got this error too! Did anyone figure this out?
Thanks for a great tutorial, very helpful. I have a question regarding the data input for RNA velocity. I have a scRNAseq dataset that I have preprocessed in Seurat using SCTransform. However, I do not use these transformed counts for RNA velocity as I read it requires the raw data. So when I export my count matrix in Seurat I use the following command:
counts_matrix
Hi, if you preprocessed in Seurat, don't worry about preprocessing it again in scanpy. Just merge them directly after importing and you should get a warning that the counts are already processed. Let me know how it goes!
@@sanbomics Thanks for your quick reply! I've managed to run the scVelo work and as you said I get a warning saying the counts have already been processed. As these are just the raw counts exported from seurat (non-normalised), do I need to worry that these are non-normalised counts that I'm using for scVelo? Or does this not really matter as for scVelo I am really only looking at spliced vs unspliced (from the normalised loom files) ratios? Thanks!
scVelo does a normalization on the counts, if they aren't already normalized. It seems to me that you are exporting the normalized counts from seurat. This is fine and you should be getting that warning. Likewise in my tutorial I normalize prior to scVelo. I think what you are doing is fine!
Hi Sam, thanks for your tutorials! They have been very helpful for novice sc-genomics enthusiasts like me.
I have a question about the velocyto 10x command:
What is the file format you need in the SAMPLEFOLDER? I define my file path like you've suggested in your tutorial with cellranger outs folder- but I get this error:
Invalid value for 'SAMPLEFOLDER': Directory '/...blah/outs/filtered_feature_bc_matrix' is not writable
Any idea what I'm doing wrong?
hmm, try running this on your "blah" folder: chmod -R 777 blah
other than that, just triple check you are using the right path to your sample folder. Let me know if you get it to work!
I didn't have permission- waiting for the person to chmod me in...
Thanks for getting back to me
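For anyone hitting the same SAMPLEFOLDER error: as I understand it, velocyto run10x wants the top-level cellranger sample folder (the one that contains outs/), not outs/ or outs/filtered_feature_bc_matrix, and it needs write permission there because it creates its output inside that folder. A quick stdlib sketch to sanity check a path before launching; check_sample_folder is a hypothetical helper, not part of velocyto:

```python
import os
from pathlib import Path


def check_sample_folder(path):
    """Return a list of problems found (an empty list means it looks OK)."""
    folder = Path(path)
    if not folder.is_dir():
        return [f"{folder} is not a directory"]
    problems = []
    if not (folder / "outs").is_dir():
        problems.append(
            f"{folder} has no 'outs' subfolder; pass the folder above 'outs'"
        )
    if not os.access(folder, os.W_OK):
        problems.append(f"{folder} is not writable by the current user")
    return problems
```

If the second message shows up, that matches the permission issue described above and somebody with access has to grant write rights.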
Thanks much for the tutorial! Can I ask if there is a good way to run velocyto after cellranger aggregation, as cellranger aggr does not give you bam files? Or should I run velocyto on each channel separately and edit cell barcodes with suffix in the loom files? Thanks again and all your videos are super helpful!
No problem! Great question. What you suggest with doing them separately should work fine.
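A rough sketch of the barcode editing step for the per-channel approach. It assumes velocyto's loom cell IDs look like 'sampleid:AAACCTGAGAAACCATx' and that cellranger aggr labels cells as 'AAACCTGAGAAACCAT-N', where N is the 1-based position of the sample in the aggregation CSV. Both are what I have typically seen, so verify against your own files; the function name is made up:

```python
def loom_id_to_aggr_barcode(cell_id, gem_group):
    """Convert 'sample:BARCODEx' (velocyto style) to 'BARCODE-<gem_group>'."""
    barcode = cell_id.split(":")[-1]   # drop the 'sample:' prefix if present
    if barcode.endswith("x"):
        barcode = barcode[:-1]         # drop velocyto's trailing 'x'
    barcode = barcode.split("-")[0]    # drop any existing '-1' style suffix
    return f"{barcode}-{gem_group}"


# Hypothetical usage, with ldata being the loom of the second sample read
# into an AnnData object:
# ldata.obs_names = [loom_id_to_aggr_barcode(c, 2) for c in ldata.obs_names]
```

After renaming, the loom cells should line up with the aggregated matrix, so a merge on cell names can find the overlap.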
I do have another question/request 😁
Do you have a tutorial on running scVelo after kallisto?
I had initially generated velocity matrices using kallisto. Then I tried to do the scanpy preprocessing using one of your tutorials, but didn't succeed.
The anndata file from the kallisto output has only Ensembl gene IDs, and those are set as the index. When I try to reset the index to gene names (which I imported as a separate df from pybiomart), it fails... it's been a nightmare. And because of this I can't apply the standard filtering and QC with mitochondrial genes, etc.
(Hope that makes sense?)
I'm really stuck- and it seems like a silly data wrangling issue😢
Any help/suggestions would be much appreciated!
Do you have a way to share your notebook? Github?
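Without seeing the notebook it is hard to say what fails, but renaming usually breaks on missing or duplicated gene symbols, or on version suffixes in the IDs. A minimal sketch of one way around that, assuming you can turn the pybiomart table into a plain dict of Ensembl ID to symbol (rows with a missing symbol dropped first). map_gene_names is a hypothetical helper:

```python
def map_gene_names(gene_ids, id_to_symbol):
    """Map Ensembl IDs to symbols; fall back to the ID and keep names unique."""
    names = []
    seen = set()
    for gene_id in gene_ids:
        base_id = gene_id.split(".")[0]                 # strip a version suffix
        symbol = id_to_symbol.get(base_id) or base_id   # no symbol: keep the ID
        if symbol in seen:
            symbol = f"{symbol}_{base_id}"              # disambiguate duplicates
        seen.add(symbol)
        names.append(symbol)
    return names


# Hypothetical usage on the kallisto AnnData object:
# adata.var["gene_ids"] = adata.var_names
# adata.var_names = map_gene_names(adata.var_names, id_to_symbol)
# adata.var["mt"] = adata.var_names.str.upper().str.startswith("MT-")
```

Keeping the original IDs in a column means nothing is lost, and the mitochondrial flag then works the same way as in the usual QC steps.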
Hi, Thank you so much for the tutorial. This is really helpful!
I am at the Jupyter Notebook stage and trying to run "import scanpy as sc". It is giving me this error although scanpy is installed. Can you please help? Thank you so much!
ModuleNotFoundError: No module named 'scanpy'
I can't say exactly. But here are the steps I recommend to install everything.
1) install miniconda
2) create a new miniconda env with python=3.8 or 3.9
3) activate the miniconda env
4) install stuff with pip (make sure the env is activated)
5) launch jupyter notebook (make sure the env is activated)
If you get an error after this, something weird is going on.
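A common cause of this error is the notebook running a different Python than the one scanpy was installed into. A small stdlib check you can paste into a notebook cell; diagnose is just a hypothetical helper name:

```python
import importlib.util
import sys


def diagnose(module_name="scanpy"):
    """Report which Python is running and whether it can see the module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "python": sys.executable,   # should point inside your conda env
        "found": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


print(diagnose("scanpy"))
```

If the printed path is not inside the env you installed into, the kernel is using another interpreter. Installing from the activated env with python -m pip install scanpy and launching jupyter from that same env usually lines the two up.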
If I only got access to the CSV file from the cellranger output, can I create loom files from that?
By that you mean you only have the gene x cell count matrix? You need to be able to run velocyto on the bam file to do velocity.
@@sanbomics Yes, so I need to download the fastq files and run the cellranger pipeline right?
Yeah, you will need to rerun it unless you have access to the bam
Errors: ImportError: libcrypto.so.3: cannot open shared object file: No such file or directory
Weird, try this: stackoverflow.com/questions/54124906/openssl-error-while-loading-shared-libraries-libssl-so-3
@@sanbomics thanks. probably my server has no sudo authorization and the path is the problem.
where is the initial data?
scvelo has the dataset built into its datasets module. You can download it using scv.datasets or something like that.
Mac or Linux?
I was running this on ubuntu. Might work on Mac too though, haven't tried
You have many good tutorials and we don't always watch them in order. So when you reference data downloaded in a "previous tutorial" without showing us that data, or at least linking us to the tutorial you used to get that data, it sucks.
If we have several loom files (e.g. we ran three lanes of sequencing of the same experiment, and generated a loom file for each), at which step would you merge the files? Would you combine the loom files, or generate adata files for each and then combine them?
Hi! Sorry for the late reply. You can merge the loom with each individual adata before integration. Spliced/unspliced counts are saved in their own layers and won't be touched by integration. For example, if using scanorama: merge loom+adata, calculate scanorama embeddings, merge all adata files. But then for sc.pp.neighbors (which sc.tl.umap builds on) and scv.pp.moments make sure to use: use_rep = "X_scanorama"
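The last step can be sketched roughly like this, assuming the merged object already carries spliced/unspliced layers and an X_scanorama entry in adata.obsm. pick_rep and velocity_on_integrated are hypothetical names; the scanpy and scvelo calls are the standard ones, but I have not run this exact combination on integrated data, so treat it as a starting point:

```python
def pick_rep(obsm_keys, preferred="X_scanorama", fallback="X_pca"):
    """Choose which embedding to build the neighbor graph on."""
    keys = list(obsm_keys)
    if preferred in keys:
        return preferred
    if fallback in keys:
        return fallback
    raise KeyError(f"neither {preferred!r} nor {fallback!r} found in {keys}")


def velocity_on_integrated(adata, rep=None):
    """Neighbors, UMAP, moments and velocity on an integrated embedding."""
    import scanpy as sc    # imported lazily so pick_rep works without them
    import scvelo as scv

    rep = rep or pick_rep(adata.obsm.keys())
    sc.pp.neighbors(adata, use_rep=rep)   # graph built on the integrated space
    sc.tl.umap(adata)                     # UMAP reuses that graph
    # n_neighbors=None tells scvelo to reuse the graph computed above
    scv.pp.moments(adata, n_pcs=None, n_neighbors=None)
    scv.tl.velocity(adata)
    scv.tl.velocity_graph(adata)
    return adata
```

Building the graph once on the integrated embedding keeps the UMAP and the velocity moments on the same neighbors, which is the point of passing the scanorama representation.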