BLAST

Description

The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.

The BLAST home page is at http://blast.ncbi.nlm.nih.gov.

Availability

BLAST is installed on Mahuika.

Licence

BLAST is a "United States Government Work" in the Public Domain and so is freely available for use.

Example script

This script starts by copying the BLAST database being searched into $TMPDIR, a per-job temporary directory inside /tmp. Since compute nodes do not have local disks, this directory is held in memory, so the memory requested by the job must allow for the size of the database. This approach is thought to be the best one when you have a large amount of query sequence to get through, as it avoids re-reading the database from disk over and over. If you are searching a small database (less than 10 GB), or searching with a small amount of query sequence (so that only one pass through the database is required), then this extra complexity is not needed.

#!/bin/bash -e

#SBATCH --job-name       BLAST
#SBATCH --time           02:30:00      # Allow 100 CPU hrs / GB of blastn query seq
#SBATCH --hint           nomultithread # Unless trying 72 threads
#SBATCH --ntasks         1
#SBATCH --cpus-per-task  36            # 1 whole node.
#SBATCH --mem            108G          # 1 whole node. Allow for whole database.
##SBATCH --mem           200G          # For large databases (nr, refseq_genomic)
##SBATCH --partition     bigmem        # For large databases (nr, refseq_genomic)

module load BLAST/2.6.0-gimkl-2017a
module load BLASTDB/2018-08

# This script takes one argument, the FASTA file of query sequences.
QUERIES=$1
FORMAT="6 qseqid qstart qend qseq sseqid sgi sacc sstart send staxids sscinames stitle length evalue bitscore"
BLASTOPTS="-evalue 0.05 -max_target_seqs 10"
BLASTAPP=blastn
DB=nt
#BLASTAPP=blastx
#DB=nr

# Keep the database in RAM if searching multiple
# query sequences against a database of over 10GB.
cp $BLASTDB/{$DB,taxdb}* $TMPDIR/
export BLASTDB=$TMPDIR

# Single node multithreaded BLAST.
srun $BLASTAPP $BLASTOPTS -db $DB -query $QUERIES -outfmt "$FORMAT" \
    -out $QUERIES.$DB.$BLASTAPP -num_threads $SLURM_CPUS_PER_TASK
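
Because the copy of the database in $TMPDIR counts against the job's memory, it can help to check how large the database files are before choosing a value for --mem. The following is a minimal sketch rather than part of the script itself. The function names db_size_bytes and bytes_to_gb_ceil are hypothetical helpers introduced here for illustration, and the example assumes a BLASTDB module is already loaded so that $BLASTDB points at a single directory.

```shell
#!/bin/bash

# Hypothetical helper: print the combined size in bytes of the regular files
# in directory $1 whose names start with any of the remaining arguments.
db_size_bytes() {
    local dir=$1
    shift
    local total=0 prefix file
    for prefix in "$@"; do
        for file in "$dir"/"$prefix"*; do
            [ -f "$file" ] || continue   # also skips an unmatched glob pattern
            total=$(( total + $(wc -c < "$file") ))
        done
    done
    echo "$total"
}

# Hypothetical helper: convert bytes to GB (2^30 bytes), rounded up.
bytes_to_gb_ceil() {
    echo $(( ($1 + 1073741823) / 1073741824 ))
}

# Same file selection as the cp line in the script: $DB* and taxdb*.
DB=${DB:-nt}
if [ -n "${BLASTDB:-}" ] && [ -d "$BLASTDB" ]; then
    bytes=$(db_size_bytes "$BLASTDB" "$DB" taxdb)
    echo "$DB plus taxdb: about $(bytes_to_gb_ceil "$bytes") GB to hold in \$TMPDIR"
fi
```

The figure covers only the database copy. The memory BLAST itself needs while searching comes on top of it, so treat the result as a lower bound for --mem.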

BLAST databases

We download the standard NCBI databases quarterly and create a corresponding environment module, named in the form BLASTDB/<yyyy-mm>, which sets the BLASTDB environment variable accordingly. To use one of these databases, find out which version is our most recent (module spider BLASTDB) and load it in your batch script.

Because we keep only a few recent versions of the databases, you may occasionally need to change the BLASTDB module version when you use old job submission scripts as templates for new ones.
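
When updating an old script, the newest version can also be picked out of a module listing mechanically. The sketch below is one possible way to do that, not an official tool. latest_blastdb is a hypothetical helper name, and the use of "module -t avail" (Lmod's terse listing, written to stderr) in place of "module spider" is an assumption about the module system's output format that should be checked on the cluster.

```shell
#!/bin/bash

# Hypothetical helper: read module names on stdin, one per line, and print
# the newest BLASTDB/<yyyy-mm> entry. Names in yyyy-mm form sort
# chronologically as plain text, so the last one after sorting is the newest.
latest_blastdb() {
    grep -E '^BLASTDB/[0-9]{4}-[0-9]{2}$' | LC_ALL=C sort | tail -n 1
}

# Only attempt the lookup where a module command exists.
if command -v module >/dev/null 2>&1; then
    # Assumption: the terse listing prints one module name per line on stderr.
    newest=$(module -t avail BLASTDB 2>&1 | latest_blastdb)
    if [ -n "$newest" ]; then
        echo "Newest BLASTDB module: $newest"
    fi
fi
```

Printing the name rather than loading it automatically keeps the version explicit in the batch script, so a given job can still be traced to the database release it searched.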
