STRING enrichment analysis: Brief introduction to the gene set enrichment functionality of STRING

  • Published on 25 Jan 2025

Comments • 35

  • @CancerResearchDemystified 3 years ago +3

    Thank you for making this, it is really useful! I've always loved toying around with STRING, but sometimes I struggle to describe the utility of it to students! I'll be linking them to this video from now on, and we'll tweet it out now too :)

    • @larsjuhljensen 3 years ago

      Thank you! Always wonderful to hear that people find it useful :-)

  • @GeorgeTzotzos-co7bg 2 years ago +1

    Thank you very much for the comprehensive and succinct tutorials. I have two questions related to enrichment analysis. (i) How reliable is enrichment analysis following network expansion? (ii) There are two methods for network expansion: the first is using Apps -> STRING -> expand network, and the second is specifying a maximum number of additional interactors when creating the initial network. Which one is likely to result in a more reliable enrichment analysis? Thank you for your attention.

    • @larsjuhljensen 2 years ago

      Great questions! When it comes to doing enrichment analysis on expanded networks, enrichment should be expected, since some of the proteins in the network were added specifically because they are associated with the others. This is why STRING shows the warning "Note: some enrichments may be expected here" in that situation.
      The second question, which relates to Cytoscape stringApp, is really two separate issues. Firstly, specifying "maximum number of interactors" when querying will execute the exact same code as "expand network" does. The only difference is that the latter allows you to specify the selectivity parameter, whereas the former has it set to 0 to mimic what STRING does. Regardless of which you do, the logic above still applies: if you are pulling in proteins based on their functional associations, enrichment should be expected.

  • @ayahasan7330 1 year ago +1

    Thank you for making this. I would like to ask you a question: can I use the STRING database for gene-gene interaction maps based on fold change?
    From: proteins with value --- insert the genes with a fold change.

    • @larsjuhljensen 1 year ago +1

      I'm not completely sure what you want to do. If you want to show a network from STRING with your own fold-change values on it, I suggest you start by having a look at this tutorial: th-cam.com/video/kRQyPDMF_8k/w-d-xo.html

    • @ayahasan7330 1 year ago

      @larsjuhljensen I want to draw a gene network based on fold change.

  • @washingtonwuyi9534 11 months ago +1

    thank you, you're a good man.😘

    • @larsjuhljensen 6 months ago

      You're welcome, I hope it's useful for people.

  • @adlay2435 3 years ago +2

    Hello! Very informative video. I would like to ask: does the Gene Ontology enrichment here work similarly to the Gene Ontology analysis in the DAVID database? Can I use STRING instead of DAVID for GO and KEGG enrichment analysis? Thank you.

    • @larsjuhljensen 3 years ago

      Yes, it is much the same as what DAVID and many other methods do. So you can certainly use it as an alternative, but you should expect the results to differ somewhat between tools, even though they should be mostly similar.

  • @alriho09 1 year ago +1

    Hi, thanks for the video. I have a question, does it make sense to use both GSEA and STRING to analyse different enriched pathways, given that you are using the same gene set?

    • @larsjuhljensen 1 year ago +1

      It is a good question - there can certainly be reasons for using multiple tools, e.g. using STRING for network analysis and something other than STRING for enrichment analysis. For example, the other tools may have some functionality or a different visualization that you need or want. That being said, I think it is important to know why you choose to use what you use. I have the impression that some people run multiple enrichment tools on their dataset and, on a case-by-case basis, choose the one that gives the results they like, which is very problematic from a statistical standpoint.

    • @alriho09 1 year ago

      Thank you for your answer @larsjuhljensen

  • @mamoketebokhale6283 3 years ago +1

    Hello, great video. Thank you. I would like to ask what it means when it says there is no enrichment data for this species in the STRING database.

    • @larsjuhljensen 3 years ago

      My guess is that you are looking at one of the new so-called "mapped" species in STRING 11.5. This is a new feature, which allows us to include many more species than in prior versions; however, these are created by simply mapping the networks from other species based on orthology. This means that for these species, we are not importing the full range of data that we do for other species, which is why a feature like enrichment analysis is not available for mapped species.

    • @mamoketebokhale6283 3 years ago +1

      @larsjuhljensen Thank you so much for the explanation. I was so confused because I saw the part about enrichment in your video, but it was not appearing for my species. By the way, I am working with Pectobacterium brasiliense.

    • @larsjuhljensen 3 years ago

      @mamoketebokhale6283 Indeed, I just checked: Pectobacterium brasiliense is one of the new "mapped" species.

  • @tamararojas3011 3 years ago +1

    Thank you for this video! I would like to ask what method of grouping is used. I saw that it uses k-means or MCL grouping, but I can't choose this on the page. Which one is used by default?

    • @larsjuhljensen 3 years ago

      If you're talking about the enrichment for STRING subnetworks, these come from a hierarchical clustering of the global STRING network. It is a somewhat different task than when clustering a network for visualization purposes. When visualizing, you want to use a clustering algorithm that divides your network at hand into a concrete set of subnetworks. For enrichment analysis, you want to take the full STRING network and create a hierarchy of both broad and fine-grained clusters (just like you have broad and fine-grained GO terms) and then let the enrichment analysis pick from the clusters at all granularities.

    • @tamararojas3011 3 years ago +1

      @larsjuhljensen Thanks for your answer! I mean the "avg. local clustering coefficient" given in the "analysis" section. Is it the same thing?

    • @larsjuhljensen 3 years ago

      @tamararojas3011 Oh, that is what you're talking about - no, this is something else. Despite the name, it is not something that comes out of running a clustering. Each node in a network has a clustering coefficient, which is a number that tells you to what extent its neighbors are linked to each other. The average clustering coefficient is the average of this across all the nodes in your network. For more details see Wikipedia: en.wikipedia.org/wiki/Clustering_coefficient
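
The definition above can be illustrated with a small sketch. This is only a toy example on a made-up four-node network, not STRING's own code. The function names are hypothetical, and treating nodes with fewer than two neighbors as having a coefficient of 0 is an assumption (a common convention, but tools may differ).

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are linked to each other."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        # Assumption: nodes with fewer than two neighbors count as 0.
        return 0.0
    linked_pairs = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    possible_pairs = k * (k - 1) / 2
    return linked_pairs / possible_pairs


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Toy undirected network: a triangle A-B-C plus a node D attached only to C.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

for node in sorted(adj):
    print(node, local_clustering(adj, node))
print("average:", average_clustering(adj))
```

Here A and B each have two neighbors that are linked to each other, so their coefficient is 1. Only one of the three possible pairs among C's neighbors is linked, so C gets 1/3. D has a single neighbor and gets 0 under the convention assumed above.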

    • @tamararojas3011 3 years ago +1

      @larsjuhljensen Thank you so much! :D

  • @manuelbezerra2425 3 years ago +1

    Which statistical test is used in functional analysis?

    • @larsjuhljensen 3 years ago

      Do you mean for the set-based approach or for the ranked-list approach?

  • @mamoketebokhale6283 3 years ago +1

    I also noticed that the database has no molecular interactions box for me to tick.

    • @larsjuhljensen 3 years ago

      I'm not sure what you are talking about here. Could you elaborate a bit on what you're looking for?

    • @mamoketebokhale6283 3 years ago +1

      @larsjuhljensen Hello, under the network type setting, I saw that the database doesn't have the molecular interactions box. I think that if it is ticked, molecular interactions are shown on the network.

    • @larsjuhljensen 3 years ago

      @mamoketebokhale6283 You might be talking about the old "actions" mode, which we have discontinued. It has been replaced by the physical network function, which you can find in the Settings tab. If you choose that, you'll only see physical protein interactions, not all functional associations.

    • @mamoketebokhale6283 3 years ago

      @larsjuhljensen Thank you so much for the clarification.

  • @nikadaskova2936 2 years ago +1

    Dear sir, thank you so much for a very informative video! I have a question though: when using basic list enrichment with a list of proteins, how is the "strength" of each pathway calculated, and how is the False Discovery Rate calculated in this case? Thank you! :)

    • @larsjuhljensen 2 years ago +1

      There is a small "explain columns" link just above the enrichment tables. It explains how both the strength and FDR are calculated :-)
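
The authoritative definitions are the ones behind the "explain columns" link. As a rough sketch only, assuming that strength is the log10 ratio of observed to expected counts and that the FDR is a Benjamini-Hochberg correction of the per-term p-values (both assumptions, not taken from STRING's code), the two quantities could be computed as below. The function names and the example numbers are made up for illustration.

```python
import math


def strength(observed, expected):
    """Assumed definition: log10 of the observed/expected count ratio."""
    return math.log10(observed / expected)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical term: 10 annotated proteins observed where 1 was expected.
print(strength(10, 1))

# Hypothetical raw p-values for four terms in one category.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Under these assumptions, a strength of 1 corresponds to ten times more annotated proteins than expected by chance, and the correction is applied across all terms tested together.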

  • @eason02 3 years ago +1

    Great video as usual! I have a question. In pathway enrichment analysis, for example, the top results are mostly common pathways such as cancer pathways. How can we select, from the analysis, the pathways that are relevant to our disease of interest, let's say Alzheimer's disease?

    • @larsjuhljensen 3 years ago +1

      Thanks, and good question, I know exactly what you mean. Unfortunately, there is not really a way to do that. Since the pathway databases are not annotated with information about when a pathway would and wouldn't be relevant (and I do not really see how one would do that), there is no way to automatically filter the results to only show pathways relevant to, e.g., Alzheimer's disease. What you can do is focus more on the enrichment score (i.e. how enriched a term is) than on the p-value (i.e. how significantly enriched a term is). Since higher counts inherently mean better statistical power, focusing on p-values tends to favor big, fairly unspecific pathways/terms over small, more specific ones, which may be much more strongly enriched yet have lower significance simply because they are smaller.
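
The point about counts and statistical power can be made concrete with a toy calculation. This sketch assumes a one-sided hypergeometric test, which is a common choice for set-based enrichment, and is not a claim about what any particular tool does. The background size, term sizes, and hit counts are invented for illustration.

```python
from math import comb, log10


def hypergeom_pvalue(background, term_size, query_size, hits):
    """P(X >= hits) when query_size genes are drawn from the background
    without replacement and term_size of the background genes carry the term."""
    upper = min(term_size, query_size)
    tail = sum(
        comb(term_size, k) * comb(background - term_size, query_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(background, query_size)


def enrichment(background, term_size, query_size, hits):
    """Return (log10 observed/expected, p-value) for one term."""
    expected = query_size * term_size / background
    return log10(hits / expected), hypergeom_pvalue(
        background, term_size, query_size, hits
    )


BACKGROUND = 20000  # hypothetical number of genes in the genome
QUERY = 200         # hypothetical size of the input gene list

# Big, unspecific term: 2000 genes, 40 hits where 20 were expected (2-fold).
strength_big, p_big = enrichment(BACKGROUND, 2000, QUERY, 40)
# Small, specific term: 20 genes, 3 hits where 0.2 were expected (15-fold).
strength_small, p_small = enrichment(BACKGROUND, 20, QUERY, 3)

print("big term:  ", strength_big, p_big)
print("small term:", strength_small, p_small)
```

In this made-up case the small term is far more strongly enriched than the big one, yet the big term gets the smaller p-value. Sorting by p-value alone would therefore rank the broad term first.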

    • @eason02 3 years ago

      @larsjuhljensen I have a much clearer picture now. Thanks for the explanation! I really appreciate all your videos, which help a lot!