From my understanding, if your nodes are too unique (i.e. k is so large that reads rarely share k-mers), you won't get a very connected graph. I think this blog post is helpful for choosing k-mer sizes: homolog.us/blogs/blog/2012/10/10/multi-kmer-de-bruijn-graphs/
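To illustrate the connectivity point, here is a toy sketch, not how any particular assembler works. It assumes two made-up error-free reads that overlap by 5 bases and ignores reverse complements. Nodes are (k-1)-mers and each k-mer adds an edge. Two reads only share a node when their overlap is at least k-1, so once k grows past that, the graph falls apart into separate pieces. The helper names (`kmers`, `de_bruijn_edges`, `count_components`) are hypothetical.

```python
from collections import defaultdict  # not strictly needed; kept minimal below

def kmers(read, k):
    # All overlapping substrings of length k (empty if k > len(read))
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def de_bruijn_edges(reads, k):
    # Nodes are (k-1)-mers; each k-mer contributes an edge prefix -> suffix
    edges = set()
    for read in reads:
        for km in kmers(read, k):
            edges.add((km[:-1], km[1:]))
    return edges

def count_components(edges):
    # Union-find over nodes, treating edges as undirected (weak connectivity)
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(n) for n in list(parent)})

# Hypothetical reads: the last 5 bases of the first equal the first 5 of the second
reads = ["GATTACAGCT", "CAGCTTGCAA"]
for k in (4, 6, 7):
    print(k, count_components(de_bruijn_edges(reads, k)))
# 4 1
# 6 1
# 7 2
```

With k = 4 or 6 the two reads join into one component; at k = 7 the (k-1)-mer nodes no longer fit inside the 5-base overlap, so each read becomes its own island.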
You cannot use a k-mer larger than the read length, because the idea of k-mers is to divide each read into smaller overlapping substrings of k nucleotides. This replaces the complexity of handling an undetermined number of reads with a fixed number of k-mers. That being said, choosing a k-mer size greater than the read size can make a contiguous sequence come out longer than its actual length in the genome.
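A minimal sketch of that decomposition, assuming plain uppercase DNA strings and made-up reads. A read of length L yields L - k + 1 overlapping k-mers, so when k > L the read contributes nothing at all. Real k-mer counters usually also merge each k-mer with its reverse complement; this sketch does not, and `kmer_counts` is a hypothetical helper name.

```python
from collections import Counter

def kmer_counts(reads, k):
    # Count every overlapping substring of length k across all reads
    counts = Counter()
    for read in reads:
        # A read of length L gives L - k + 1 k-mers; range() is empty if k > L
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

reads = ["GATTACAGCT", "CAGCTTGCAA"]  # hypothetical 10-base reads

counts = kmer_counts(reads, 5)
print(sum(counts.values()), len(counts), counts["CAGCT"])  # 12 11 2
print(len(kmer_counts(reads, 11)))  # 0: k exceeds the read length
```

Each 10-base read gives 6 five-mers (12 total), and the shared overlap "CAGCT" is counted twice, leaving 11 distinct k-mers.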
Great explanation. Thanks.
This was really useful, thank you!
Why would you choose a k-mer smaller than your read length? Wouldn't you want a longer k-mer to avoid the problems you were talking about?