ADS1: Practical: Working with sequencing reads

  • Published on 20 Oct 2024

Comments • 16

  • @richardwalker5629 9 years ago +5

    I'm liking and following this course, but could the genome links be made available? thx

    • @hirak123456 9 years ago +7

      Richard Walker, they are written in the Jupyter notebooks. I guess they are public: github.com/BenLangmead/ads1-notebooks

  • @elohor_okpako 9 months ago

    Thank you for such a nice video. I have been improving my programming following the videos. I also tried using a different method to read the fastq file to extract the sequences and quality score.
    def readfastq(filename):
        # Read every line, then pick lines 2 and 4 of each 4-line record
        with open(filename, 'r') as f:
            lines = f.readlines()
        seq = [lines[i].strip('\n') for i in range(1, len(lines), 4)]
        qual = [lines[i].strip('\n') for i in range(3, len(lines), 4)]
        return seq, qual
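
A possible variation on the reader above, offered as a sketch rather than the course's own method: `readlines()` loads the whole file into memory, so for larger read files it may be gentler to walk the file one four-line record at a time. The name `read_fastq_streaming` is made up for illustration, and the sketch assumes a well-formed FASTQ file with exactly four lines per record.

```python
def read_fastq_streaming(filename):
    """Return (sequences, qualities) from a 4-line-per-record FASTQ file."""
    seqs, quals = [], []
    with open(filename, 'r') as f:
        while True:
            name = f.readline()           # line 1: '@' followed by the read name
            if not name:                  # empty string means end of file
                break
            seq = f.readline().rstrip()   # line 2: the bases
            f.readline()                  # line 3: '+' separator, ignored
            qual = f.readline().rstrip()  # line 4: the quality string
            if not seq:                   # stop on a trailing blank record
                break
            seqs.append(seq)
            quals.append(qual)
    return seqs, quals
```

Both versions return the same two lists for a well-formed file; the difference is only in how much of the file is held in memory at once.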

  • @murkovsky22 10 months ago

    How powerful should my machine be to perform these tasks?

  • @BrandonSLockey 4 years ago +3

    10:10 I still don't understand why 2 is so dominant. Not even 1s or 3s, just a huge number of 2s. His explanation isn't doing it for me. Does anyone know? (Maybe there is a large probability gap between 1-2 and 2-3, so they all get classified as "2". If anyone knows better, it would be appreciated.)

    • @lo8885 2 years ago

      Hey Brandon, a quality score of 3 means there is about a 50% chance that the base is incorrect.
      Going from Q = 3 to Q = 2, the chance that the base is incorrect becomes greater (> 50%), and because these values are discrete, you can't end up with values in between.
      That's my guess; I hope it's clear, lol.
      Also, a quality score of 1 would mean about a 79% chance that the base is incorrect (100% corresponds to Q = 0).
      Let me know if you notice I'm wrong; this interests me.

    • @juanpablomorantorres1903 1 year ago +1

      Because they are plotting the quality frequencies per base and not per read. The chance of misincorporating non-terminating nucleotides increases over time, so bases with poor quality scores are present more often (maybe at the end of each good read).
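
For reference, the arithmetic behind this thread can be sketched in a few lines. Under the standard Phred definition, Q = -10 log10(p), so the error probability is p = 10^(-Q/10). The sketch assumes Phred+33 encoding (the quality character is `chr(Q + 33)`), and the helper names are made up for illustration; it does not try to explain why one particular score dominates a given dataset.

```python
from collections import Counter

def phred33_to_q(ch):
    """Convert one Phred+33 quality character to its integer score Q."""
    return ord(ch) - 33

def q_to_error_prob(q):
    """Probability that a base call with score Q is wrong: 10^(-Q/10)."""
    return 10 ** (-q / 10)

def quality_histogram(quals):
    """Count scores per base (not per read) across all quality strings."""
    counts = Counter()
    for qual in quals:
        for ch in qual:
            counts[phred33_to_q(ch)] += 1
    return counts

# Print the error probability for a few scores
for q in (0, 1, 2, 3, 10, 20, 30):
    print(q, round(q_to_error_prob(q), 3))
```

Under this definition Q = 3 is roughly a 50% error chance, Q = 2 roughly 63%, Q = 1 roughly 79%, and Q = 0 is 100%. Because `quality_histogram` counts every base, a read with a long low-quality tail contributes many low scores on its own.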

  • @annu1327 3 years ago +2

    Can someone provide me the URL?

  • @tamer4456 5 years ago

    Thank you.

  • @Fit_IITian_Madhav 2 years ago

    Which software are you using? There is no wget in Python, and I haven't seen a programming environment like this.

    • @jawswasnevermyscene4258 4 months ago +1

      Just load the dataset from the URL, or use conda to install wget.
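
As a sketch of the "load the dataset from the URL" option: in Jupyter notebooks a line starting with `!` runs a shell command, which is likely where the `wget` in the video comes from. Where `wget` is not installed, the standard library can fetch the file instead. The `download` helper and the URL in the comment are placeholders for illustration, not the course's actual links.

```python
import urllib.request

def download(url, dest):
    """Fetch url and save it to the local file dest; return the local path."""
    path, _headers = urllib.request.urlretrieve(url, dest)
    return path

# Hypothetical usage; substitute the real link from the course notebooks:
# download("https://example.org/reads.fastq", "reads.fastq")
```

After the download, the local file name can be passed to whichever FASTQ reader you are using.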