How to perform local blast on computer? Complete tutorial installation to run.

  • Published on 28 Oct 2024

Comments • 123

  • @lsnlst1725
    @lsnlst1725 2 years ago

    So far, the best video on standalone BLAST.

  • @TejasMaheshKaleBBB
    @TejasMaheshKaleBBB 3 years ago +2

    For making a nucleotide database, we need to copy the nucleotide file in FASTA format into the bin folder, right? I have 50 nucleotide FASTA files of different strains of one bacterium. I copied all of them to the bin folder and tried making a database of one strain using the mentioned command. But it shows a BLAST error saying that the file does not exist. How do I solve this issue? Thank you.

  • @rosepearlart
    @rosepearlart 4 years ago

    Hi, I followed your command line to create my database and it gives me an error that there is no space on the disk. Kindly help me.
    C:\Program Files\NCBI\blast-2.10.1+\bin>makeblastdb -in degaa-p.fasta -dbtype prot -out db
    Building a new DB, current time: 07/18/2020 02:14:32
    New DB name: C:\Program Files\NCBI\blast-2.10.1+\bin\db
    New DB title: degaa-p.fasta
    Sequence type: Protein
    Keep MBits: T
    Maximum file size: 1000000000B
    No volumes were created.
    Error: mdb_env_open: There is not enough space on the disk.

    • @XploreBio
      @XploreBio  4 years ago

      Check a few things:
      1. Run the command with administrator rights.
      2. There should be enough space for the database size. You may first try making a database with a smaller data set.
      3. Install the latest BLAST.

    • @rosepearlart
      @rosepearlart 4 years ago

      @XploreBio I installed the latest BLAST, and degaa.fasta has around 68000 sequences. With every type of data it gives me the same error. And I ran the command prompt with administrator rights.

    • @vasavigarisetti3064
      @vasavigarisetti3064 4 years ago

      @rosepearlart www.biostars.org/p/413294/ Check this out, it solves the problem.
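
For the `mdb_env_open` error reported in this thread, one workaround often suggested for BLAST+ on Windows is to set the `BLASTDB_LMDB_MAP_SIZE` environment variable before calling makeblastdb. The Python sketch below only illustrates that idea: the value 1000000 and the helper names are assumptions, not something shown in the video, so check the setting against the linked Biostars post and the BLAST+ documentation before relying on it.

```python
import os
import shutil
import subprocess


def makeblastdb_command(fasta, dbtype, out):
    """Build the same makeblastdb command used in the comments, as an argument list."""
    return ["makeblastdb", "-in", fasta, "-dbtype", dbtype, "-out", out]


def env_with_lmdb_map_size(map_size=1000000, base_env=None):
    """Copy the environment and add BLASTDB_LMDB_MAP_SIZE (the value is an assumption)."""
    env = dict(os.environ if base_env is None else base_env)
    env["BLASTDB_LMDB_MAP_SIZE"] = str(map_size)
    return env


if __name__ == "__main__":
    cmd = makeblastdb_command("degaa-p.fasta", "prot", "db")
    env = env_with_lmdb_map_size()
    # Only run when makeblastdb is on PATH and the input file is really here.
    if shutil.which("makeblastdb") and os.path.exists("degaa-p.fasta"):
        subprocess.run(cmd, env=env, check=True)
    else:
        print("Would run:", " ".join(cmd))
```

In a plain command prompt, the equivalent would be setting the variable in the same window before running makeblastdb.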

  • @Drpokok90
    @Drpokok90 5 months ago

    Hello. Could you explain what the headings for each column in the output file are?
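
A file written with `-outfmt 6` is tab-separated and has no header row. When no column list is given, the 12 default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore; when a custom list follows the 6 (as in the blastp script quoted elsewhere in these comments), the columns appear in the order listed. A minimal Python sketch that attaches names to the columns; the sample line is invented for illustration and `read_hits` is a hypothetical helper.

```python
# Default column order for "-outfmt 6" when no column list is given.
DEFAULT_COLUMNS = [
    "qseqid",    # query sequence ID
    "sseqid",    # subject (database) sequence ID
    "pident",    # percentage of identical positions
    "length",    # alignment length
    "mismatch",  # number of mismatches
    "gapopen",   # number of gap openings
    "qstart",    # alignment start in the query
    "qend",      # alignment end in the query
    "sstart",    # alignment start in the subject
    "send",      # alignment end in the subject
    "evalue",    # expect value
    "bitscore",  # bit score
]


def read_hits(lines, columns=DEFAULT_COLUMNS):
    """Yield one dict per hit line, keyed by column name."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(columns):
            raise ValueError(f"expected {len(columns)} columns, found {len(fields)}")
        yield dict(zip(columns, fields))


if __name__ == "__main__":
    # Invented example line, not real BLAST output.
    sample = ["q1\ts1\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t365\n"]
    for hit in read_hits(sample):
        for name, value in hit.items():
            print(f"{name:10s} {value}")
```

For output produced with a custom column list, pass that list as `columns`, in the same order as in the `-outfmt` string.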

  • @Asteretsa
    @Asteretsa 5 years ago

    Thank you, this was very useful. If you run makeblastdb for a nucleotide database, and you have a .fasta file, do you need to add taxonomy with a different command or is it included in the .fasta file?

    • @XploreBio
      @XploreBio  5 years ago

      This command just provides the best hits, IDs, sequences, similarity details and scores. For taxonomy you need to try some other things.

  • @mayraa9876
    @mayraa9876 2 years ago +1

    Thank you so much!!

  • @viniciusrasvailer636
    @viniciusrasvailer636 1 year ago

    Thank you for this video! Can I create my own database (using different databases from NCBI's website), query multiple sequences, and have the output be just the highest-scoring result?

    • @XploreBio
      @XploreBio  1 year ago

      Of course you can create your own database using sequences from any source or organism and search your query against it. The highest score would be the best hit.

    • @viniciusrasvailer636
      @viniciusrasvailer636 1 year ago

      @XploreBio Thank you!! Do you have an e-mail address where I can write some questions? It's really difficult for me to explain in one comment.

    • @XploreBio
      @XploreBio  1 year ago

      Write to me at xplorebio@yahoo.com

  • @oroojojo777
    @oroojojo777 5 years ago

    You deserve more followers ! Keep it up !

    • @XploreBio
      @XploreBio  5 years ago

      Thanks and with your wishes, my channel will grow much more..

  • @purplerain365
    @purplerain365 2 years ago

    Hi, I followed your method to run blastp on my sequence, but I keep getting the error "segmentation fault: core dumped", and the output file is empty.
    I have no idea what the problem is.

  • @bikidas5029
    @bikidas5029 1 year ago +1

    What should the dbtype be for a nucleotide sequence?

  • @praveshbhargav7946
    @praveshbhargav7946 2 years ago

    CFastaReader: Hyphens are invalid and will be ignored around line 12 ??
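
As the message itself says, the hyphens are ignored, so this reads as a warning about gap characters in the sequence lines rather than a fatal error. If you would rather clean the file yourself before running makeblastdb, here is a small Python sketch, assuming a plain-text FASTA file; `strip_hyphens`, `clean_fasta` and the demo records are hypothetical.

```python
def strip_hyphens(lines):
    """Yield FASTA lines with '-' removed from sequence lines only."""
    for line in lines:
        if line.startswith(">"):
            yield line  # header lines are left untouched
        else:
            yield line.replace("-", "")


def clean_fasta(src_path, dst_path):
    """Write a copy of src_path with hyphens stripped from the sequences."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(strip_hyphens(src))


if __name__ == "__main__":
    demo = [">seq-1 example\n", "ATG--CCA\n", "TT-GA\n"]
    print("".join(strip_hyphens(demo)), end="")
```

Hyphens usually mean the file came from an alignment, so it is worth checking that the sequences are the ones you intended to use.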

  • @anujsharma6220
    @anujsharma6220 2 years ago

    Hi,
    what should I do if the "application is unable to open" error comes up after the "blastx -h" command is run?
    This error has come up with every installation protocol I tried.

  • @dhananjaimp9036
    @dhananjaimp9036 2 years ago +1

    Great job

  • @fevXR-ut1ce
    @fevXR-ut1ce 8 months ago

    Great video! Very informative! Where do I find the db for human transcriptome compatible with the following script? I tried a few online resources from NCBI, but I kept getting the output below. Many thanks!
    C:\Program Files\NCBI\blast-2.15.0+\bin>makeblastdb -in GRCh38_latest_rna.fna -dbtype nucl -out DB
    Building a new DB, current time: 02/26/2024 16:05:11
    New DB name: C:\Program Files\NCBI\blast-2.15.0+\bin\DB
    New DB title: GRCh38_latest_rna.fna
    Sequence type: Nucleotide
    Keep MBits: T
    Maximum file size: 3000000000B
    Adding sequences from FASTA; added 185121 sequences in 7.22054 seconds.
    BLAST Database error: No alias or index file found for nucleotide database [C:\Program] in search path [C:\Program Files\NCBI\blast-2.15.0+\bin;;]

    • @XploreBio
      @XploreBio  8 months ago

      It must be because you do not have that database file in that folder or you have not typed its extension along with its name. Try to keep all files in the bin folder
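
One more thing worth checking for this error: the log shows the sequences were added, and the database name in the message is cut off at `[C:\Program]`, which suggests the path was split at the space in `Program Files`. A cautious thing to try is building the database in a folder whose path has no spaces. The Python sketch below only flags such paths; `C:\blast_work` is a hypothetical folder name, and the diagnosis itself is an assumption, not something confirmed in the video.

```python
def has_spaces(path):
    """Return True if the path contains a space character."""
    return " " in path


def check_paths(paths):
    """Return the paths that contain spaces, in their original order."""
    return [p for p in paths if has_spaces(p)]


if __name__ == "__main__":
    bin_dir = r"C:\Program Files\NCBI\blast-2.15.0+\bin"
    work_dir = r"C:\blast_work"  # hypothetical folder without spaces
    for p in check_paths([bin_dir, work_dir]):
        print("Contains a space, consider keeping the data elsewhere:", p)
```

If BLAST was installed with its bin folder on the PATH, makeblastdb and blastn can be called from any working folder, so the FASTA files do not have to sit in bin.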

  • @kirankirdat1338
    @kirankirdat1338 2 years ago

    thank you. it was very useful

  • @norulwfms8064
    @norulwfms8064 3 years ago

    Hello. Can I ask where you got the query.txt file? For the DB.fa, I already understand how to get it. Thank you.

    • @XploreBio
      @XploreBio  3 years ago

      You can make your own query.txt file that contains sequences in FASTA format. For example, it may contain sequences from a transcriptome assembly or a particular set you mined from NCBI.

  • @danielleg2179
    @danielleg2179 4 years ago +1

    Hello! I put in exactly the same command (except for the input file name, of course) when making the protein database (makeblastdb -in input_file.fa -dbtype prot -out db), but I keep getting "Error: mdb_env_open: Input/output error".
    Can you please tell me what the problem could be? Thank you.

    • @XploreBio
      @XploreBio  4 years ago

      Not sure about the reason. Try the following. First check the file extension: the input file name should be exactly what you type in the command.
      1. Check if the disk has enough space.
      2. Check that you ran the command prompt with administrator rights.
      3. Your installation drive should not be write-protected.
      4. Check if the file is in FASTA format.
      5. Finally, try reinstalling the latest version of BLAST.

    • @jenniferambrose4600
      @jenniferambrose4600 3 years ago

      I tried setting up a custom protein db to extract non-homologous (human) proteins from a bacterial proteome. I executed the same command that XploreBio has given but ended up getting the same error, "Error: mdb_env_open: Input/output error". The C: drive has more than 50% of its space free, so that shouldn't be a problem; I ran the command as admin; the files are saved in .FASTA format. The standalone BLAST version installed is 2.11.0+. XploreBio, please help me fix it. Thank you.

    • @TejasMaheshKaleBBB
      @TejasMaheshKaleBBB 3 years ago

      @XploreBio I have 50 nucleotide FASTA files. The database is created in another directory, but in the bin folder it gives the above error.

  • @jyothipvs828
    @jyothipvs828 3 years ago

    Hi,
    I don't understand how you got the .fa and query text files. Are they the same?
    In some other runs, they have downloaded the nr.* database. So what exactly is happening?
    I am not from a bio background, but I need this for a different experiment. Please help.

    • @XploreBio
      @XploreBio  3 years ago +1

      For a BLAST run you need two files. One is your query sequence file in FASTA format (check what FASTA format is). Second, you need a database file. It can be the nr database or any other custom sequences. The file extension can be .txt, .fa or .fasta, but the sequences within the file must be in FASTA format. If there is any problem, feel free to comment.
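
To make "FASTA format" concrete: each record is a header line starting with `>` followed by one or more lines of sequence. The Python sketch below reads such records from any iterable of lines; the two example records are invented, and `parse_fasta` is a hypothetical helper, not part of BLAST.

```python
def parse_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header line")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    example = [
        ">seq1 first example protein\n",
        "MKVLAAGIVG\n",
        "LLLAQ\n",
        ">seq2 second example protein\n",
        "MSTNPKPQRK\n",
    ]
    for name, seq in parse_fasta(example):
        print(name, len(seq))
```

A file that raises the `ValueError` here (text before the first `>`) is not valid FASTA, whatever its extension.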

  • @studyessentials6308
    @studyessentials6308 2 years ago

    It is showing "No volumes were created" and "mdb_env_open: Input/output error".

  • @anubhav2198
    @anubhav2198 4 years ago

    If my sequence type is DNA and not protein, what should the flag be changed to from prot? Thanks.
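
For DNA the value is `nucl`, that is `-dbtype nucl`, as in the nucleotide commands posted by others in these comments. If you are unsure what a file contains, a rough check is to count how many letters are nucleotide codes. The Python sketch below is only a heuristic: the 0.9 threshold and the `guess_dbtype` name are assumptions.

```python
NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_dbtype(lines, threshold=0.9):
    """Return 'nucl' or 'prot' based on the share of A/C/G/T/U/N letters."""
    total = 0
    nucleotide = 0
    for line in lines:
        if line.startswith(">"):
            continue  # header lines carry no sequence
        for ch in line.strip().upper():
            if ch.isalpha():
                total += 1
                if ch in NUCLEOTIDE_LETTERS:
                    nucleotide += 1
    if total == 0:
        raise ValueError("no sequence letters found")
    return "nucl" if nucleotide / total >= threshold else "prot"


if __name__ == "__main__":
    print(guess_dbtype([">dna\n", "ATGCCGTTAGGCTAAN\n"]))
    print(guess_dbtype([">protein\n", "MKVLAAGIVGLLLAQ\n"]))
```

The result is only a hint; short or unusual sequences can be misclassified, so look at the file when in doubt.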

  • @paulinaz4435
    @paulinaz4435 4 years ago

    Hello! I need to BLAST two proteomes to get homologous proteins. Both files are in FASTA format. I tried to make one of them the database by simply using the makeblastdb command and putting the name of my FASTA file in the place where you put DB.fa. I get the error 'too many positional arguments'. Do you know why this is?

    • @XploreBio
      @XploreBio  4 years ago

      Send the exact script and the names of the files.

    • @paulinaz4435
      @paulinaz4435 4 years ago

      @XploreBio So I have one proteome, let's say 'sampleone', and another one called 'uniprotproteomeUP000002446'. They're both FASTA files from UniProt. I tried:
      makeblastdb -in uniprotproteomeUP000002446.fasta -dbtype prot -out uniprotproteomeUP000002446
      but i get Error: mdb_env_open: Input/output error.
      I also tried just putting in the file name in blastp command without making it the database:
      blastp -query sampleone.fasta -db uniprotproteomeUP000002446 -out output.rtf -outfmt 6 -evalue 0.00001 -max_target_seqs 1
      and I get: BLAST Database error: No alias or index file found for protein database [uniprotproteomeUP000002446] in search path [C:\.....)
      I have both of those files in the bin folder.

    • @XploreBio
      @XploreBio  4 years ago +1

      Make sure both files are in the bin folder of NCBI. Make sure both files are FASTA-formatted protein sequences, not nucleotide. If this is OK, then check that the extension of both files is .fasta. Search Google for how to show file extensions if they are not visible. Next, you must create a protein database with one of the files.
      Script:
      makeblastdb -in db.fasta -dbtype prot -out db
      In the above script, "db" is your first protein file.
      Do the blastp using the following script:
      blastp -query query.fasta -db db -out output.txt -outfmt "6 qseqid qlen sseqid salltitles pident mismatch gapopen qstart qend qcovs sstart send evalue bitscore" -evalue 0.00001 -max_target_seqs 5 -num_threads 4
      Hope this solves your problem!

    • @paulinaz4435
      @paulinaz4435 4 years ago

      @XploreBio That has helped a lot, thank you. Do you know if there's a way to align multiple proteome files at the same time to find a core proteome?

    • @XploreBio
      @XploreBio  4 years ago

      @paulinaz4435 Make a combined database of multiple proteomes. Set the parameters to get more hits, then consider the best hits of each proteome.

  • @jahidulislamrakib5953
    @jahidulislamrakib5953 1 year ago

    How can you create or download db.fa and query.txt? This is one of the main parts of running this software. Please inform me.

    • @XploreBio
      @XploreBio  1 year ago +1

      The query file contains the FASTA-formatted sequences for which you want to perform BLAST. The database file is another such file, which may be a larger set of protein or DNA sequences, also in FASTA format. How to make a FASTA file is described in another video of mine.

  • @mahangift4015
    @mahangift4015 3 years ago

    Hello there, I made a BLAST database from a .fasta file and have some files with these extensions: .ndb .nhr .nin .not .nsq .ntf .nto. Now I want to run the nucleotide-nucleotide BLAST query (number 5).
    Question 1: where should I find the query.fa file?
    Question 2: would you please explain the parameters of script number 5?

    • @mahangift4015
      @mahangift4015 3 years ago

      I guessed that I should run a command like this:
      blastn -query db.fasta -db db -output.txt -outfmt "6 qseqid qlen sseqid salltitles pident mismatch gapopen qstart qend qcovs sstart send evalue bitscore" -evalue 0.00001 -max_target_seqs 5 -num_threads 4
      in which db.fasta is my sequence file. Then I ran the command and got these 2 errors:
      Error: Unknown argument: "output"
      Error: (CArgException::eInvalidArg) Unknown argument: "output"

    • @XploreBio
      @XploreBio  3 years ago

      @Mahangift
      query.fasta is a file you make yourself. It has one or more sequences in FASTA format.
      db.fa is the database against which you will BLAST your query.
      The script should be:
      blastn -query query.fasta -db db -out output.txt -outfmt "6 qseqid qlen sseqid salltitles pident mismatch gapopen qstart qend qcovs sstart send evalue bitscore" -evalue 0.00001 -max_target_seqs 5 -num_threads 4
      Take care that the file names and extensions match those in the command.

    • @mahangift4015
      @mahangift4015 3 years ago

      @XploreBio Thank you so much. Where can I find a tutorial about the blastn options with examples? The NCBI website tutorial is not applicable.

  • @trilisser
    @trilisser 3 years ago

    Hello! How can I download the nt database from the NCBI FTP? I mean, I see many nt files on the FTP (nt.00, 01, 02...), so should I download them all?

    • @XploreBio
      @XploreBio  3 years ago

      @Artem. Database files larger than 2GB are split into multiple files, and all of them need to be downloaded.

  • @ziyazorluer3195
    @ziyazorluer3195 3 years ago

    Hello!
    Thanks for your video and answers, they were very helpful to me. I ran many protein BLASTs like you did. But now I want to do parallel BLAST to save time, and I cannot understand the explanations on forum sites. How can I write the right script for parallel BLAST?

    • @XploreBio
      @XploreBio  3 years ago

      Do you have different queries or different databases? How big is your data in terms of number of sequences?

    • @ziyazorluer3195
      @ziyazorluer3195 3 years ago

      @XploreBio I have different queries; I am doing whole-genome BLAST for many organisms.

    • @XploreBio
      @XploreBio  3 years ago +1

      @ziyazorluer3195 You should not run parallel BLAST. It may even slow down the process. You can merge all queries into a single file (as they will have different IDs). Or you can do a batch run, combining all scripts into a single file, so you don't have to give the command again and again.

    • @ziyazorluer3195
      @ziyazorluer3195 3 years ago

      @XploreBio Thank you very much.

  • @mrlightningbolt887
    @mrlightningbolt887 1 year ago

    What is the advantage of doing this rather than using the servers? Is it faster?

    • @XploreBio
      @XploreBio  1 year ago

      Definitely it is. You can compare it too.

    • @mrlightningbolt887
      @mrlightningbolt887 1 year ago

      @XploreBio I'm new to BLAST and I started a blastn job with a query length of almost 500,000. How long would that be projected to take using their servers? If it's unreasonable, I should probably do it locally then.

    • @XploreBio
      @XploreBio  1 year ago

      Well, that would not be possible through the online servers. They allow only a few sequences to be searched at a time. I would highly recommend doing a local BLAST.

    • @mrlightningbolt887
      @mrlightningbolt887 1 year ago

      @XploreBio Understood. Thank you for your help, as well as your wonderful guide.

  • @mariaazhar1468
    @mariaazhar1468 5 years ago +1

    Can I use the same command to BLAST two proteomes?

    • @XploreBio
      @XploreBio  5 years ago +1

      You can use the command to search for homology between two protein datasets. One will be your query and the other will be the protein database you create using makeblastdb. Thanks.

  • @scivam6611
    @scivam6611 3 years ago

    I'm getting: Argument "query". File is not accessible. Please help.

    • @XploreBio
      @XploreBio  3 years ago

      Follow the video as it is. All files should be in the bin folder. When running, go to the bin folder and run as administrator. Give the file names with their extensions.

  • @praveshbhargav7946
    @praveshbhargav7946 2 years ago

    Hello, how do I solve this problem: BLAST Database error: No alias or index file found for protein database [C:\Program] in search path [C:\Program Files\NCBI\blast-BLAST_VERSION+\bin;C:\NCBI\blast-BLAST_VERSION+\db;]

    • @XploreBio
      @XploreBio  2 years ago

      All files need to be in the bin folder of NCBI. Make sure the names and their extensions are right.

    • @praveshbhargav7946
      @praveshbhargav7946 2 years ago

      @XploreBio OK sir, I will try. Thank you.

    • @kenosikebabonye9664
      @kenosikebabonye9664 1 year ago

      @praveshbhargav7946 Did you get the solution to that? I'm having the same problem.

  • @muhammadadnansabar7548
    @muhammadadnansabar7548 3 years ago

    Hi. Thanks for your videos. They are helping people around the globe. I want to run blastn with metagenomic data against the ACLAME MGE database. I already trimmed the adaptors and got a FASTA file, and I downloaded the ACLAME database FASTA file. I'm trying to make a db from the ACLAME FASTA file, but every time I run the command I get the error "that file doesn't exist". Can you please tell me what the issue is?

    • @XploreBio
      @XploreBio  3 years ago

      Follow everything as it is.
      Send me the exact command and the file names, with extensions, that you used. It must work.

    • @muhammadadnansabar7548
      @muhammadadnansabar7548 3 years ago

      @XploreBio The ACLAME FASTA file is named mydb.fasta and I have put the file into the bin folder of BLAST at the path C:\Program Files\NCBI\blast-2.12.0+\bin
      I've been typing this into the command line:
      cd C:\Program Files\NCBI\blast-2.12.0+\bin (to enter the directory)
      makeblastdb -in mydb.fasta -dbtype nucl -out DB
      Building a new DB, current time: 09/23/2021 13:48:15
      New DB name: C:\Program Files\NCBI\blast-2.12.0+\bin\DB
      New DB title: mydb.fasta
      Sequence type: Nucleotide
      Keep MBits: T
      Maximum file size: 1000000000B
      BLAST options error: File mydb.fasta does not exist
      I get this error every time.

    • @XploreBio
      @XploreBio  3 years ago

      Try copying a few nucleotide sequences in FASTA format into Notepad. Save the file as .fasta and then try to run this file in your command. Run the command prompt in admin mode.

    • @muhammadadnansabar7548
      @muhammadadnansabar7548 3 years ago

      @XploreBio I did, but the error is still the same: "file does not exist".

    • @XploreBio
      @XploreBio  3 years ago

      @muhammadadnansabar7548 Just check that the file has no double extension, for example mydb.fasta.txt. There are only two possibilities: either the file extension or the working directory.
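
Both possibilities mentioned here (a hidden double extension, or the wrong working directory) can be checked from a script. A minimal Python sketch, assuming it is run from the folder where makeblastdb is run; `lookalike_names` and `diagnose` are hypothetical helper names.

```python
import os


def lookalike_names(names, expected):
    """Return names that start with the expected name plus an extra extension."""
    wanted = expected.lower()
    return sorted(
        n for n in names
        if n.lower() != wanted and n.lower().startswith(wanted + ".")
    )


def diagnose(expected, directory="."):
    """Describe whether the expected file is present in the given folder."""
    names = os.listdir(directory)
    folder = os.path.abspath(directory)
    if any(n.lower() == expected.lower() for n in names):
        return f"{expected} is present in {folder}"
    hidden = lookalike_names(names, expected)
    if hidden:
        return f"{expected} not found in {folder}, but these look like it with an extra extension: {hidden}"
    return f"{expected} not found in {folder}"


if __name__ == "__main__":
    print(diagnose("mydb.fasta"))
```

If the script reports a name such as mydb.fasta.txt, either rename the file or use that full name after `-in`.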

  • @joshperry1239
    @joshperry1239 4 years ago

    What do I do if I have a folder of FASTA protein sequences that I want to use to create a database?

    • @XploreBio
      @XploreBio  4 years ago

      Copy all the sequences into a single FASTA-formatted file and then you can use it.
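
Copying by hand gets tedious with many files, so the merge can be scripted. The Python sketch below concatenates FASTA files and, as an optional extra that is my own assumption rather than part of the video, prefixes each sequence ID with the file name and `__` so the source file stays visible in the BLAST output. The `proteomes` folder and `combined.fasta` are hypothetical names.

```python
import glob
import os


def tag_headers(lines, tag):
    """Yield FASTA lines with '>' headers rewritten as '>tag__originalheader'."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + tag + "__" + line[1:]
        else:
            yield line


def merge_fasta_files(paths, out_path):
    """Concatenate FASTA files into out_path, tagging IDs with each file's base name."""
    with open(out_path, "w") as out:
        for path in paths:
            tag = os.path.splitext(os.path.basename(path))[0]
            with open(path) as handle:
                for line in tag_headers(handle, tag):
                    # Make sure a missing final newline does not glue two records together.
                    out.write(line if line.endswith("\n") else line + "\n")


if __name__ == "__main__":
    paths = sorted(glob.glob(os.path.join("proteomes", "*.fasta")))  # hypothetical folder
    if paths:
        merge_fasta_files(paths, "combined.fasta")
        print("Merged", len(paths), "files into combined.fasta")
    else:
        print("No FASTA files found in ./proteomes")
```

The merged file can then be passed to makeblastdb with `-in` in the usual way.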

    • @joshperry1239
      @joshperry1239 4 years ago

      @XploreBio Each FASTA file in the folder I have contains the proteome of a different species. If I want to find the best match for a query for each species, will that ability be affected if the FASTA files for all the species are combined?

    • @XploreBio
      @XploreBio  4 years ago

      Yes. If you copy all species' data into one database, only the best-hit species will be returned. If you have data for 4-5 different species, I suggest running BLAST separately on each database.

    • @joshperry1239
      @joshperry1239 4 years ago

      @XploreBio If I have 10 queries and about 30 species databases, is there an easier way to get the best match for each species based on each query instead of blasting each query against a species database one by one?

    • @XploreBio
      @XploreBio  4 years ago

      You can combine all species' sequences into one file to make a database, then set the maximum number of output hits to 50 or so. This way you may get multiple hits and can then choose the best one for each species.
      Else you can use the queries to make 10 databases (the other way round).
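
Choosing the best hit for each species from a combined run can be scripted once the hits are parsed. The Python sketch below assumes each hit is a dict with `qseqid`, `sseqid` and `bitscore` keys (as in tabular output), and that subject IDs were given a `species__` prefix when the combined FASTA was built; that prefix convention is hypothetical, not something BLAST adds.

```python
def best_hit_per_species(rows):
    """Return {(query ID, species): hit} keeping the highest bit score per pair."""
    best = {}
    for row in rows:
        species = row["sseqid"].split("__", 1)[0]  # assumed 'species__id' naming
        key = (row["qseqid"], species)
        score = float(row["bitscore"])
        if key not in best or score > float(best[key]["bitscore"]):
            best[key] = row
    return best


if __name__ == "__main__":
    # Invented hits for illustration only.
    hits = [
        {"qseqid": "q1", "sseqid": "ecoli__A", "bitscore": "50"},
        {"qseqid": "q1", "sseqid": "ecoli__B", "bitscore": "80.5"},
        {"qseqid": "q1", "sseqid": "bsub__C", "bitscore": "30"},
    ]
    for (query, species), hit in sorted(best_hit_per_species(hits).items()):
        print(query, species, hit["sseqid"], hit["bitscore"])
```

This only works if the hit limit in the BLAST run was high enough for every species to appear in the output at all.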

  • @mariaazhar1468
    @mariaazhar1468 5 years ago +1

    What do you mean by max_target_seqs?

    • @XploreBio
      @XploreBio  5 years ago

      Setting it to 1 restricts the output to a single hit, meaning you get the 1 best result for your query.

  • @mariaazhar1468
    @mariaazhar1468 5 years ago

    Actually I want to do BLAST to discard homologous proteins.
    Can you please elaborate on the results of this video? How can I eliminate homologous proteins from my query proteome on the basis of e-value and bit score?

    • @XploreBio
      @XploreBio  5 years ago

      Do you mean you want to do BLAST and keep only those queries which have no homology in the protein database, i.e., you need sequences with no function?

    • @mariaazhar1468
      @mariaazhar1468 5 years ago

      @XploreBio Yes, I want non-homologous proteins.

    • @mariaazhar1468
      @mariaazhar1468 5 years ago

      @XploreBio You are giving an e-value threshold of 0.001 in the command. What is its effect on the results?

    • @mariaazhar1468
      @mariaazhar1468 5 years ago

      @XploreBio How can I use bit score and identity score in my query?

    • @XploreBio
      @XploreBio  5 years ago

      @mariaazhar1468 A higher bit score and % identity reflect a better match. The e-value should be lower. There are no strict criteria for these. Usually what researchers rely on is the e-value. It should be at most 0.00001 (1e-5). For more confidence you may also opt for 10 to the power of -10 (1e-10).
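
To get the non-homologous queries discussed in this thread, one approach is to list the query IDs that have no hit at or below a chosen e-value. The Python sketch below assumes the hits are already parsed into dicts with `qseqid` and `evalue` keys and that the query ID is the first word of each FASTA header; the 1e-5 default mirrors the threshold mentioned above, and the function names are hypothetical.

```python
def query_ids(fasta_lines):
    """Return the first word of every '>' header line, in file order."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">") and len(line.strip()) > 1:
            ids.append(line[1:].split()[0])
    return ids


def queries_without_hits(fasta_lines, hit_rows, max_evalue=1e-5):
    """Return query IDs that have no hit with an e-value at or below max_evalue."""
    matched = {
        row["qseqid"] for row in hit_rows
        if float(row["evalue"]) <= max_evalue
    }
    return [q for q in query_ids(fasta_lines) if q not in matched]


if __name__ == "__main__":
    # Invented example data for illustration only.
    fasta = [">q1 protein one\n", "MKVL\n", ">q2 protein two\n", "MSTN\n"]
    hits = [
        {"qseqid": "q1", "evalue": "1e-30"},
        {"qseqid": "q2", "evalue": "0.5"},
    ]
    print(queries_without_hits(fasta, hits))
```

A missing hit at one threshold is not proof that a protein has no homolog, so treat the resulting list as candidates and review it.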

  • @priyankamowlali315
    @priyankamowlali315 4 years ago

    Can you please tell me how to translate a nucleotide FASTA file into a protein database using the standalone version of BLAST?

    • @XploreBio
      @XploreBio  4 years ago +1

      BLAST is a sequence similarity search tool. You can use blastx to search a nucleotide query against a protein database. The software has built-in commands for it; you do not need to translate the sequences yourself. But yes, there are other tools for translating nucleotide to protein.

    • @priyankamowlali315
      @priyankamowlali315 4 years ago

      @XploreBio Thank you so much!
      I tried this command line:
      tblastn -query smo.txt -db GGGTP -out output.txt -outfmt 6 -evalue 0.01 -max_target_seqs 1
      but it's showing an error: Command line argument error: Argument "out". File is not accessible: `output.txt'.
      I do not understand what is wrong with the command line.

    • @XploreBio
      @XploreBio  4 years ago +1

      Hi,
      First make sure you are running in administrator mode.
      Next, for tblastn you must have protein sequences in the query file.
      Also, the database you create using the makeblastdb command must be of nucleotide type.
      Try it and tell me whether it works.

    • @priyankamowlali315
      @priyankamowlali315 4 years ago

      @XploreBio Thank you! It worked :)

    • @priyankamowlali315
      @priyankamowlali315 4 years ago

      @XploreBio One more query.
      I have around 800 protein sequences in FASTA format. How can I run all the queries together against genome data?

  • @ranapankaj8604
    @ranapankaj8604 3 years ago

    Please make a video on the identification of cis-regulatory elements.

  • @steynop1893
    @steynop1893 1 year ago

    What is in the query file?

    • @XploreBio
      @XploreBio  1 year ago

      It has the sequences, in FASTA format, for which you need to find the function.

    • @steynop1893
      @steynop1893 1 year ago +1

      @XploreBio Thanks a lot, bro.

  • @prashanthjavali4199
    @prashanthjavali4199 4 years ago

    How to download that DB.fa file?

    • @XploreBio
      @XploreBio  4 years ago

      It can be any sequence file against which you search/BLAST your query. The database file can contain protein or nucleotide sequences of a species, which can be downloaded from NCBI or Ensembl.