
BLAST

Description

The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.

The BLAST home page is at http://blast.ncbi.nlm.nih.gov.

Available modules

Packages with modules

Module                        NeSI Cluster
BLAST/2.6.0-gimkl-2017a       pan
BLAST/2.2.22-Linux_x86_64     pan
BLAST/2.2.26-Linux_x86_64     pan
BLAST/2.2.29-ictce-5.4.0      pan
BLAST/2.2.30-goolf-1.5.14     pan
BLAST/2.2.31-intel-2015a      pan
BLAST/2.3.0-intel-2015a       pan
BLAST/2.5.0-foss-2015a        pan

Licence

BLAST is a "United States Government Work" in the Public Domain and so is freely available for use.

Example scripts

Example script for the Pan cluster

#!/bin/bash -e

#SBATCH --job-name       BLAST
#SBATCH --account        nesi99999
#SBATCH --time           02:30:00      # Allow 100 CPU hrs / GB of blastn query sequences.
#SBATCH --cpus-per-task  24            # 16 CPUs are enough for databases smaller than 64GB.
#SBATCH --mem-per-cpu    6G            # Include allowance for RAM disk containing database.
#SBATCH --constraint     avx           # Avoid the slow "bigmem" machines.
#SBATCH --output         BLAST.%j.out  # Include the job ID in the names of the
#SBATCH --error          BLAST.%j.err  # output and error files.

module load BLAST/2.3.0-intel-2015a
module load BLASTDB/2016-02

# This script takes one argument, the FASTA file of query sequences.
QUERIES=$1
FORMAT="6 qseqid qstart qend qseq sseqid sgi sacc sstart send staxids sscinames stitle length evalue bitscore"
BLASTOPTS="-evalue 0.05 -max_target_seqs 10"
BLASTAPP=blastn
DB=nt
#BLASTAPP=blastx
#DB=nr

# Keep the database in RAM, which is important for databases over 4GB in the
# shared GPFS filesystem.
cp "$BLASTDB"/{"$DB",taxdb}* "$SHM_DIR"/
export BLASTDB=$SHM_DIR

# Single node multithreaded BLAST.
# $BLASTOPTS is deliberately left unquoted so that it splits into separate options.
srun "$BLASTAPP" $BLASTOPTS -db "$DB" -query "$QUERIES" -outfmt "$FORMAT" \
    -out "$QUERIES.$DB.$BLASTAPP" -num_threads "$SLURM_CPUS_PER_TASK"
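
The --time comment in the script above gives a rule of thumb of 100 CPU hours per GB of blastn query sequences. As a rough sketch only, that rule can be turned into a wall-clock estimate for a given query size and CPU count. The function name estimate_walltime is hypothetical, and the sketch assumes 1 GB means 1024^3 bytes and that the work divides evenly across the CPUs, which real searches may not do.

```bash
#!/bin/bash
# Hypothetical helper, not part of BLAST or Slurm: estimate a --time value
# from the rule of thumb "100 CPU hours per GB of blastn query sequences".
# Assumptions: 1 GB = 1024^3 bytes, and perfect scaling across CPUs.
estimate_walltime() {
    local query_bytes=$1 cpus=$2
    local cpu_hours_per_gb=100
    local bytes_per_gb=$((1024 * 1024 * 1024))
    local divisor=$((bytes_per_gb * cpus))
    # Wall-clock seconds, rounded up to the next whole second.
    local seconds=$(( (query_bytes * cpu_hours_per_gb * 3600 + divisor - 1) / divisor ))
    printf '%02d:%02d:%02d\n' \
        $((seconds / 3600)) $(((seconds % 3600) / 60)) $((seconds % 60))
}

# Example: 1 GB of queries on 24 CPUs.
estimate_walltime $((1024 * 1024 * 1024)) 24   # prints 04:10:00
```

For a real query file, the size in bytes can be obtained with `wc -c < queries.fasta`. Treat the result as a starting point and adjust it after looking at how long earlier jobs actually took.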

Further notes

BLAST databases

We download the standard NCBI databases monthly and create a corresponding environment module for each download, named in the form BLASTDB/<yyyy-mm>, which sets the BLASTDB environment variable accordingly. If you are searching one of these databases, we recommend that you load the most recent of these modules in your job submission script, after the BLAST module.

We keep only a few recent versions of the databases, so if you use old job submission scripts as templates for new ones, you will occasionally need to update the BLASTDB module version.
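
Since a job script depends on BLASTDB pointing at a directory that still exists, it may help to check this before the search starts. The following is a minimal sketch, assuming bash; check_blastdb is a hypothetical helper, not something provided by BLAST or by the BLASTDB modules. It only tests that BLASTDB is set and that at least one file whose name starts with the database name is present there.

```bash
#!/bin/bash
# Hypothetical helper, not part of BLAST or the BLASTDB modules: fail early
# if BLASTDB is unset or holds no files for the requested database.
check_blastdb() {
    local db=$1
    if [ -z "${BLASTDB:-}" ]; then
        echo "BLASTDB is not set; load a BLASTDB module first." >&2
        return 1
    fi
    # compgen -G succeeds only if the glob matches at least one path.
    if ! compgen -G "$BLASTDB/$db*" > /dev/null; then
        echo "No files for database '$db' found in $BLASTDB." >&2
        return 1
    fi
    echo "Found database '$db' in $BLASTDB."
}
```

In a job submission script this could be called after the module load lines, for example as `check_blastdb "$DB" || exit 1`, so that a removed database version produces a clear message rather than a failure partway through the job.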

For backwards compatibility, the BLAST modules also set BLASTDB themselves, to the newest database version that was present at the time of the most recent planned outage. This default BLASTDB setting is deprecated, and we encourage users to load a BLASTDB module instead.
