How to manipulate gene expression data from NCBI GEO in R using dplyr | Bioinformatics for beginners

  • Published on 6 Jan 2025

Comments • 104

  • @danielajbq
    @danielajbq 2 years ago +25

    You're an ANGEL for making these. I am doing my MS in bioinformatics right now and this is genuinely better than some of my courses. Thank you!!

    • @MichealIdedia
      @MichealIdedia 7 months ago

      Hello, are you done with your MSc now?

  • @mayank9986
    @mayank9986 1 year ago +2

    I am new to programming. I was looking for help to analyse RNAseq data and your video just came as a blessing. Thank you a ton.

  • @sanjaisrao484
    @sanjaisrao484 2 years ago +6

    Excellent explanation. Thanks for teaching the basics of R; it was extremely helpful. Please continue to make more videos.

  • @amitrupani9898
    @amitrupani9898 3 years ago +10

    Thank you for this very helpful video! I have recently moved from a clinical genetics laboratory to a research laboratory where pipelines are written in R and extensively leverage the capabilities of the dplyr package. So I needed a tutorial to help me understand its basic functioning. This helped. Keep up the good work you are doing through this channel. Cheers!!

    • @Bioinformagician
      @Bioinformagician  3 years ago +1

      I am really glad this helped you get a basic understanding of the dplyr package. Thank you for your kind words; they encourage me to do more of this! ☺️

  • @eylulozerbil8548
    @eylulozerbil8548 1 year ago +1

    This tutorial encouraged me to continue my R learning process by showing me how I can manipulate this kind of data in the simplest way! Thank you, Bioinformagician :)

  • @mahshidpooladvand8502
    @mahshidpooladvand8502 5 months ago

    This was the best tutorial I could possibly find online!!! You are incredibly smart! Thanks!

  • @Radslom
    @Radslom 1 year ago +1

    This video was extremely helpful for me. I am currently learning how to use R and GEO2, and this video helped to clarify it. Thank you and keep up the great work!

  • @muyyy9000
    @muyyy9000 1 year ago +2

    Thank you so much for making content like this. It's extremely helpful for beginners like me trying to analyze gene expression data in RStudio.

  • @zlj8435
    @zlj8435 2 years ago +1

    Thank you for this wonderful course! I am a year 1 PhD student and it really helps me a lot!

  • @Grzegorz-f1b
    @Grzegorz-f1b 7 months ago

    Thank you, my new teacher! I actually work on biogenetics in IT with C++, and this video helps me very much ❤️🙏👌

  • @syedmansoorjan2671
    @syedmansoorjan2671 3 years ago +1

    Amazing, I don't have the words to thank you. Please share more; I found this very helpful.

  • @aishaa812
    @aishaa812 6 months ago

    Thank you. It's extremely helpful for me since I am a beginner in RStudio and I am trying to apply data analysis in RStudio.

  • @jessicus
    @jessicus 3 months ago

    THANK YOU SO MUCH!!! I'm doing undergrad cancer research right now and I've been looking for a way so that I can analyze an expression matrix/transcriptomic data in R.

  • @mocabeentrill
    @mocabeentrill 1 year ago

    Thank you. You're really good at what you do. I did this in base R and oh my word, it looks grotesque!

  • @Ojaswini-Pathak
    @Ojaswini-Pathak 2 years ago +1

    Very well made video and your understanding of the subject is tremendous!

  • @claudiocesarmontenegrojuni5141
    @claudiocesarmontenegrojuni5141 1 year ago +1

    You're an amazing teacher! Thank you so much for this outstanding content.

  • @Bunga-p5i
    @Bunga-p5i 1 year ago

    Thank you for the great tutorial! Just to let you know, I had to install these packages first to run your script:
    install.packages("dplyr")
    install.packages("tidyverse")
    if (!requireNamespace("BiocManager", quietly = TRUE))
      install.packages("BiocManager")
    BiocManager::install("GEOquery")

  • @mikewafula9470
    @mikewafula9470 2 years ago

    Thanks so much for this great video. You have made it easy for me to explore gene data analysis with R. Keep sharing such content. Cheers!!

  • @traveller_of_lyf
    @traveller_of_lyf 10 days ago

    At 3:34, can you please explain how you set "demo" as your working directory? How did those options appear?

  • @hemanthchenga5671
    @hemanthchenga5671 2 years ago

    Thanks for explaining the code in detail, and please make more videos.

  • @seungwonkim8359
    @seungwonkim8359 1 year ago

    Really helpful! Thank you very much. I hope you continue this marvelous work for a long time, since I am working on bulk/single-cell RNA-seq these days.

  • @余长
    @余长 1 year ago

    Very helpful and you are very patient. It seems that you know exactly what my questions are.

  • @jammerkd
    @jammerkd 2 years ago +1

    Excellent videos and you are a fantastic teacher

  • @rajanirao6011
    @rajanirao6011 3 years ago +1

    These videos are so good!!! Good practise to learn R. Thank you!

  • @mohammeddabbour2254
    @mohammeddabbour2254 2 years ago

    Wonderful explanation. Thank you so much for making this tutorial. Just a side note: when both the dplyr and plyr packages are loaded and you want to use a certain function, it is better to specify the package the function comes from when calling it (for example, dplyr::rename()). Otherwise, R may call the plyr version instead (whichever package was loaded last masks the other) and return an error. Happy coding!

    • @Bioinformagician
      @Bioinformagician  2 years ago

      Correct, thanks for pointing it out. I have taken care of that in the videos that follow this one :)

  • @karthibiotech426
    @karthibiotech426 2 years ago

    Wow, it's very helpful. I am practicing with another dataset using your same protocol. Thanks a lot!

  • @setarehsohail5422
    @setarehsohail5422 2 years ago

    Amazing!! You are a professional teacher!! Thanks!

  • @cerenuzun5989
    @cerenuzun5989 3 years ago

    It was very helpful and it would be great if you continue these tutorials. Thank you so much!!

    • @Bioinformagician
      @Bioinformagician  2 years ago

      I am glad you find my videos helpful! :)

  • @BISMILLAH7334
    @BISMILLAH7334 2 years ago

    Excellent! Thank you for the tutorial. Looking forward to many more such useful tutorials.

  • @ayobamiogunsola6139
    @ayobamiogunsola6139 1 year ago

    Thank you for making this video. It has been helpful.

  • @jithus89
    @jithus89 11 months ago +1

    > gse = GEOquery::getGEO(GEO = 'GSE183947', GSEMatrix = TRUE)
    Error in open.connection(x, "rb") :
    Problem with the SSL CA cert (path? access rights?)
    Why this error?

  • @Saed7630
    @Saed7630 1 year ago

    Clean, clear and informative!

  • @Scienceforall2020
    @Scienceforall2020 8 days ago

    Hi, I am a beginner. I was trying to follow the method, but I had issues reshaping the data file shown in the video at 23:32. I used the "gather" command and then selected 3 variables. After this step, it generates a 6x3 table: only 6 genes and 3 variables. I am not able to see the other genes and could not find the mistake. Could you help?

    • @fcbrvm25
      @fcbrvm25 2 days ago +1

      You probably left head() at the end of the pipeline; remove it and all 1214760 rows should appear.
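
A sketch of the reshaping step, assuming tidyr is installed. It uses `tidyr::pivot_longer()`, which the tidyr documentation recommends for new code instead of the superseded `gather()`; the toy table is hypothetical. The long table should have (number of genes) × (number of sample columns) rows, which is a quick check that nothing, such as a stray `head()`, truncated the output.

```r
library(tidyr)

# Hypothetical wide table: one row per gene, one column per sample
wide <- data.frame(
  gene = c("BRCA1", "TP53", "MSH2"),
  GSM001 = c(5.2, 10.1, 3.3),
  GSM002 = c(4.8, 9.7, 2.9)
)

# Reshape to long format: one row per gene-sample pair.
# There is no head() at the end, so every row is kept.
long <- pivot_longer(wide, cols = -gene,
                     names_to = "samples", values_to = "FPKM")

# Sanity check: genes x sample columns
stopifnot(nrow(long) == nrow(wide) * (ncol(wide) - 1))
print(long)
```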

  • @o1kun
    @o1kun 2 years ago

    Your video really helped me!! Really appreciate it😊

  • @MohammadNasirAbdullah
    @MohammadNasirAbdullah 10 months ago

    Thank you so much, it really helps me
    😊😊😊😊😊😊😊😊

  • @ritobratasengupta796
    @ritobratasengupta796 1 month ago

    Thank you for your tutorials. My project is on gene expression analysis using python. Can you please make a tutorial on it? It will be really helpful to learn from you.

  • @batoolalhajali3554
    @batoolalhajali3554 3 days ago

    Hello! Great video, thank you. I was wondering: when you set the path for read.csv, why did you write ../data/GSE12345.csv instead of the full path starting with Desktop? Is it because the script is saved in the same folder? I'm sorry if this is a naive question; I haven't touched bioinformatics before. Thanks

  • @xelaldaero9339
    @xelaldaero9339 1 year ago

    Thank you! Your videos are very useful!

  • @gaurangagarwal3817
    @gaurangagarwal3817 4 months ago

    Hey! Could you help me find differential gene expression levels from a Gene Expression Omnibus dataset using the R limma package?

  • @moulytasnuva1860
    @moulytasnuva1860 2 years ago

    @Bioinformagician Is there any way to find a threshold FPKM value to compare the early and late stages of cancer?

  • @melinaguillon2449
    @melinaguillon2449 6 months ago +1

    Hi! I can't install GEOquery, I get this error message: Warning in install.packages :
    package ‘GEOquery’ is not available for this version of R

  • @sayeman9577
    @sayeman9577 1 year ago +1

    Thanks! Very helpful

  • @mikewafula9470
    @mikewafula9470 1 year ago

    Thanks again for the video. I have managed to download the gene expression data (GSE216497). How do I get its corresponding metadata?

  • @Aishwarya-p4w
    @Aishwarya-p4w 3 months ago

    I have knowledge of basic R and Python packages, and I love these tutorials because I wanted to start with NGS analysis and never knew where or how to begin.
    I do have a question: if I use a different dataset, or the same data with the different pipelines used in this playlist, can I upload it to Git and have a mini project of my own? Is that okay to do, or do I need to modify it to consider it a project?

  • @yahyayozbatiran
    @yahyayozbatiran 2 years ago

    Hello, how can I plot the expression of a specific gene across cancer subtypes from TCGA? For example,
    I want to plot MSH2 gene expression in Colon Mucinous versus Colon Adenocarcinoma.

  • @faizu0076
    @faizu0076 1 year ago

    I didn't find getGEO (GEOquery); no package seems to be available with this name. Please help me solve the problem.

  • @chinspostdoc
    @chinspostdoc 2 years ago

    Hi, I have some questions. Please help me resolve or understand them. 1) What if the GEO study only gives us raw files, either text files or .CEL files? How do I read the data from those? 2) Suppose a GEO study contains many samples of different tissues: how do I make 2 groups comprising only the samples I am interested in? For example, I want to compare expression data from healthy and COVID patients, but the GEO study contains some samples of cell lines treated with a certain chemical along with tissues of healthy and COVID patients. How can I make two groups named healthy and covid and assign the samples to those groups accordingly? 3) If the GEO raw file contains count .txt files for each sample, how can we use them for differential expression analysis? Your kind reply would be much appreciated.
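
For question 2 in the comment above, a minimal sketch assuming dplyr is installed and that the sample characteristics have already been pulled into a data frame (for example with `pData()`). The column name `source` and its values are hypothetical; real GEO series store this information in differently named columns, so inspect your metadata first.

```r
library(dplyr)

# Hypothetical sample metadata; real column names differ between GEO series
metadata <- data.frame(
  sample = c("GSM1", "GSM2", "GSM3", "GSM4", "GSM5"),
  source = c("healthy lung", "COVID-19 lung", "cell line + drug",
             "healthy lung", "COVID-19 lung")
)

groups <- metadata %>%
  # Keep only the samples of interest (drops the treated cell lines)
  filter(source %in% c("healthy lung", "COVID-19 lung")) %>%
  # Give each remaining sample a short group label
  mutate(group = case_when(
    source == "healthy lung" ~ "healthy",
    source == "COVID-19 lung" ~ "covid"
  ))

print(table(groups$group))
```

The `sample` column of `groups` can then be used to subset the expression matrix columns to the same samples.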

  • @Ijazalijin
    @Ijazalijin 2 years ago +1

    How can I activate the GEOquery package?

    • @Bioinformagician
      @Bioinformagician  2 years ago +1

      Run library(GEOquery) at the beginning of the script.

  • @alaminafendy6071
    @alaminafendy6071 1 year ago

    Thank you so much. Nicely explained.

  • @sharadjaiswal1705
    @sharadjaiswal1705 1 year ago

    Ma'am, how do I write the R scripts that are used in this video?

  • @gustavoantoniobrugesmorale1881
    @gustavoantoniobrugesmorale1881 2 years ago

    You are excellent. Thank you!!!

  • @lisahuang850
    @lisahuang850 2 years ago +1

    Really nice video! I was wondering if you could demonstrate how to convert raw counts to TPM or FPKM values in R, as my GSE dataset provides raw counts. Thanks!

    • @Bioinformagician
      @Bioinformagician  2 years ago

      Thanks for the suggestion. Will plan a video covering this!
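
A minimal base-R sketch of the usual TPM calculation: divide each count by gene length in kilobases (reads per kilobase), then scale each sample so its values sum to one million. It assumes you already have a length for every gene, in the same order as the matrix rows (for example, summed exon lengths from an annotation); the function name `counts_to_tpm` and the numbers below are hypothetical.

```r
# Convert a raw count matrix (genes x samples) to TPM.
# gene_length_bp: gene lengths in base pairs, in the same order as the rows.
counts_to_tpm <- function(counts, gene_length_bp) {
  stopifnot(nrow(counts) == length(gene_length_bp))
  # Reads per kilobase: the length vector is recycled down each column
  rpk <- counts / (gene_length_bp / 1000)
  # Scale every sample (column) so that it sums to 1e6
  sweep(rpk, 2, colSums(rpk), "/") * 1e6
}

# Hypothetical example: 3 genes x 2 samples
counts <- matrix(c(100, 200, 300,
                    50, 150, 400),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("S1", "S2")))
gene_lengths <- c(1000, 2000, 3000)

tpm <- counts_to_tpm(counts, gene_lengths)
print(colSums(tpm))
```

Dividing by length before normalizing by library size is what distinguishes TPM from FPKM, where the order is reversed.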

  • @aytacoksuzoglu2975
    @aytacoksuzoglu2975 1 year ago

    Why did we put -> . ?

  • @markrenton6981
    @markrenton6981 1 year ago

    Can someone please explain what the two ".." are at the start of her file path when reading in the data file?

    • @Bioinformagician
      @Bioinformagician  1 year ago

      The "../" is Unix/Linux notation for moving up one directory level in the file system hierarchy, relative to the current working directory. For instance, if you're in the directory "/home/user/documents/" and you use "../", you'll move up to the "/home/user/" directory.
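
A small base-R sketch of how R resolves a path that begins with "../": it is interpreted relative to the current working directory (see `getwd()`), not relative to where the script file is saved. The folder and file names below are hypothetical and are created inside a temporary directory so the example is safe to run.

```r
# Hypothetical project layout inside a temporary directory:
#   project/data/GSE_example.csv
#   project/scripts/   <- used as the working directory below
project <- file.path(tempdir(), "project")
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project, "scripts"), showWarnings = FALSE)
write.csv(data.frame(gene = "TP53", value = 1.5),
          file.path(project, "data", "GSE_example.csv"), row.names = FALSE)

# setwd() returns the previous directory, so it can be restored afterwards
old_wd <- setwd(file.path(project, "scripts"))

# "../" moves up from project/scripts to project/, then into data/
dat <- read.csv("../data/GSE_example.csv")
print(normalizePath("../data/GSE_example.csv"))

setwd(old_wd)  # restore the original working directory
```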

  • @juliangrandvallet5359
    @juliangrandvallet5359 2 years ago

    Amazing!!!! now how can I plot a heatmap out of this data?

  • @Ojaswini-Pathak
    @Ojaswini-Pathak 2 years ago +1

    Hi, I tried installing GEOquery package and got error - package GEOquery is not available for this version of R, could you please help.

  • @tushardhyani3931
    @tushardhyani3931 2 years ago

    Thank you for this video !!

  • @andyderek3021
    @andyderek3021 2 years ago +1

    Thank you for this well-explained video. If I want to do survival analysis based on gene expression data with, let's say, GSE183947, how can I get the clinical data from GEO?

    • @Bioinformagician
      @Bioinformagician  2 years ago

      If it is not provided with the metadata, you might have to reach out to the authors.

  • @muneeramashkoor7919
    @muneeramashkoor7919 2 years ago

    Hello, your videos are very informative. I am trying to look at the gene expression of my gene of interest. The supplementary data in GEO is in the form of a .fpkm_tracking file. How can I go about solving/looking at the expression using these files? Thank you!

    • @Bioinformagician
      @Bioinformagician  2 years ago

      If no raw counts are provided, you can create them yourself. You can fetch the RNA-Seq reads associated with the GEO dataset from SRA. Once you have the reads, you can align and quantify them to get counts.

  • @harshjasani8637
    @harshjasani8637 2 years ago

    Hello, Thank you for amazing video and tutorials. I could not load the GEOquery library, any ideas what could be the reason?

    • @Bioinformagician
      @Bioinformagician  2 years ago

      Probably you need to install it before loading it?

  • @awa8061
    @awa8061 2 years ago +1

    Can you suggest any Python package for gene expression analysis?

    • @Bioinformagician
      @Bioinformagician  2 years ago +1

      Unfortunately, I do not have any recommendations for Python packages. I only use R for gene expression analysis.

  • @mohamedalfaki4268
    @mohamedalfaki4268 2 years ago +1

    Hi, and thanks for this very nice tutorial. I get this error when I am trying to reshape the data:
    Error in `stop_formula()`:
    ! Formula shorthand must be wrapped in `where()`.
    # Bad
    data %>% select(~gene)
    # Good
    data %>% select(where(~gene))

    • @Bioinformagician
      @Bioinformagician  2 years ago

      Can you give me a little context of what you are trying to do? I am having a hard time recreating this error. Thanks!
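
One likely cause, though this is an assumption since the exact code was not posted, is typing a tilde (`~gene`) where a minus sign (`-gene`) was intended. In a tidyselect context, `-gene` means "every column except gene", whereas `~` creates a formula, which `select()` only accepts inside `where()` as a column predicate. A minimal sketch assuming dplyr is installed, with a hypothetical table:

```r
library(dplyr)

# Hypothetical expression table
dat <- data.frame(gene = c("TP53", "MSH2"),
                  GSM001 = c(1.2, 3.4),
                  GSM002 = c(0.5, 2.2))

# Minus sign: drop the gene column, keep the sample columns
samples_only <- dat %>% select(-gene)

# where() takes a predicate function; here, keep only numeric columns
numeric_only <- dat %>% select(where(is.numeric))

print(names(samples_only))
```

The same `-gene` form is what excludes the gene column in reshaping calls such as `gather()` or `pivot_longer()`.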

  • @zeynepdurkaya883
    @zeynepdurkaya883 1 year ago

    I can't call the data with the command; chapter 6.14 isn't clear enough.

  • @aheedan9957
    @aheedan9957 2 years ago

    Hi, nice one, but I did not understand the part about the pData and phenoData functions.

  • @IslamSafwat--
    @IslamSafwat-- 8 months ago

    GREAT! Many thanks :)

  • @imvasco
    @imvasco 2 years ago +2

    What about GEO data that's not CSV but TXT?

    • @Bioinformagician
      @Bioinformagician  2 years ago +1

      Sometimes gene expression data is also available as a .txt file on GEO. You can read a .txt file much like you read a .csv file in R. Please make sure the .txt file actually contains gene expression data. Usually, the 'Data processing' section for each sample describes what the .txt file contains and how it was processed.
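
A minimal base-R sketch, assuming the supplementary file is tab-separated with a header row and gene identifiers in the first column; the file contents here are hypothetical and are written to a temporary file so the example runs on its own. `read.delim()` is `read.table()` with tab-separated, header-row defaults, the tab-delimited counterpart of `read.csv()`.

```r
# Hypothetical tab-separated expression table (genes x samples)
txt_file <- tempfile(fileext = ".txt")
writeLines(c("gene\tGSM001\tGSM002",
             "TP53\t10.1\t9.7",
             "MSH2\t3.3\t2.9"), txt_file)

# read.delim() = read.table() with sep = "\t" and header = TRUE
expr <- read.delim(txt_file)

# Optionally move gene IDs into row names to get a numeric matrix
expr_mat <- as.matrix(expr[, -1])
rownames(expr_mat) <- expr$gene
print(expr_mat)
```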

  • @irodasay3448
    @irodasay3448 2 years ago +1

    Thank you for the tutorial. I have a question about converting a GSE to an ExpressionSet. I used your vignette and tried to do the same for GSE181462. First, I got the GSE with: gse

  • @1980yadalam
    @1980yadalam 2 years ago

    Very good video, thanks.

  • @SamipSapkota-zg8hy
    @SamipSapkota-zg8hy 6 months ago

    The value of strain samples and cell.type becomes null.

  • @killa14108
    @killa14108 2 years ago

    Hi what happens when there are NAs in the gene expression data? The accession number is GSE70947 and it's a breast cancer data set with 296 total samples and 62976 features (genes). I followed what you did and queried the data directly using GEOquery from Bioconductor. I am just stuck now and figuring out how to deal with NAs and would appreciate your help.
    Thank you!

    • @Bioinformagician
      @Bioinformagician  2 years ago +1

      I would quantify the NAs for each gene across all samples and filter out genes that have NAs in more than half of the samples.
      I usually prefer to replace NAs with 0.
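
A base-R sketch of the approach described in the reply above, on a hypothetical matrix: count NAs per gene, drop genes with NAs in more than half of the samples, then replace the remaining NAs with 0. Replacing with 0 follows the reply's stated preference; it is one choice among several, and whether 0 is sensible depends on the scale of the data (for example, log-transformed values).

```r
# Hypothetical expression matrix: 4 genes x 4 samples with missing values
expr <- matrix(c(1.2, NA,  3.1, 0.8,
                 NA,  NA,  NA,  2.0,
                 0.5, 1.1, NA,  0.9,
                 2.2, 2.4, 2.1, 2.3),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("gene", 1:4), paste0("S", 1:4)))

# Number of NAs per gene (row)
na_per_gene <- rowSums(is.na(expr))

# Keep genes with NAs in at most half of the samples
expr_filtered <- expr[na_per_gene <= ncol(expr) / 2, , drop = FALSE]

# Replace the remaining NAs with 0
expr_filtered[is.na(expr_filtered)] <- 0
print(expr_filtered)
```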

    • @killa14108
      @killa14108 2 years ago

      @Bioinformagician Thank you very much! Do you also have any recommended methods for feature (gene) selection when creating a classification model to predict cancer vs. normal samples?

  • @sanjaisrao484
    @sanjaisrao484 2 years ago

    Ma'am, some datasets don't have sample names in the GEOquery metadata. Please help, I am stuck here.

    • @Bioinformagician
      @Bioinformagician  2 years ago

      Are you using the same dataset used in the video?

  • @sanjaisrao484
    @sanjaisrao484 2 years ago +1

    Thanks

  • @QAKS1264
    @QAKS1264 2 years ago +1

  • @kajalpanchal8239
    @kajalpanchal8239 2 years ago

    Thank you, Khushbu!

  • @terryadams2652
    @terryadams2652 2 years ago

    @Bioinformagician, I apologize for my question (please), but, as a Biologist, I am now learning Python. I really don't want to spend what little time I have learning another language (R). So, to get these results, is it possible to just use Python instead of R? Thank you very much, my dear.

    • @Bioinformagician
      @Bioinformagician  2 years ago

      You can perform equivalent operations in Python. I believe the pandas package in Python will allow you to do all your data wrangling.

  • @arcturusdig1673
    @arcturusdig1673 1 year ago +6

    I can't understand most of the things you do. I need to go to other tutorial videos for understanding every single step. If you want your viewers to understand especially beginners, then please make your explanation more lucid and easy.

  • @gargiagravanshi355
    @gargiagravanshi355 7 months ago

    Hello ma'am! I really need your help. I'm stuck with a project and my mentor is very toxic; please let me know how I can contact you.

    • @Bioinformagician
      @Bioinformagician  7 months ago

      My contact details can be found in the video description :)

  • @vahidgorganli8895
    @vahidgorganli8895 1 year ago

    🙂👍

  • @muhammadrafiq7645
    @muhammadrafiq7645 3 years ago +1

    Great video! Can you please share your email? I need some help.

  • @hamadalbasri9058
    @hamadalbasri9058 1 year ago

    Great video, but why not translate it?!