For Wednesday:

For Friday:

Slides for today on types of selection and how to infer it.

PSI BLAST example

PSI-BLAST provides an enormous advantage over normal BLAST in the detection of distantly related sequences. It only works if some closely related sequences are already available, but when they are, it finds many additional, distantly related sequences.

The NCBI page describes PSI blast as follows:
Position-Specific Iterated BLAST (PSI-BLAST) provides an automated, easy-to-use version of a "profile" search, which is a sensitive way to look for sequence homologues. The program first performs a gapped BLAST database search. The PSI-BLAST program uses the information from any significant alignments returned to construct a position-specific score matrix, which replaces the query sequence for the next round of database searching. PSI-BLAST may be iterated until no new significant alignments are found. At this time PSI-BLAST may be used only for comparing protein queries with protein databases. 

A diagram giving an overview of the PSI-BLAST procedure is here.

The results of a normal BLAST search are aligned, and a pattern of conserved residues is extracted from the alignment. This pattern (the Position-Specific Scoring Matrix, or PSSM) is used as the query for the next iteration. An important parameter to adjust is the E-value threshold below which matches are included in the alignment and pattern extraction.
At higher iterations a PSI-BLAST profile can become corrupted, so that false positives are reported with significant E-values. For example, in a traditional BLAST search one can be quite certain that a match with an E-value of 10^-13 represents a homologue; with PSI-BLAST this is no longer guaranteed. Test studies indicate that profile corruption becomes likely after more than 5 iterations. On the positive side, PSI-BLAST produces far fewer false negatives than normal BLAST.
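
The pattern-extraction step described above can be illustrated with a toy calculation. The following is a minimal sketch, not the actual PSI-BLAST algorithm: it assumes a uniform background frequency for all 20 amino acids, a simple fixed pseudocount, and no sequence weighting, all of which are simplifications chosen here for illustration. The function names build_pssm and score_against_pssm are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column.

    Assumptions (illustrative only): uniform background frequencies,
    a flat pseudocount, and equal weight for every aligned sequence.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")

    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        # Count residues in this column; gap characters are ignored.
        counts = Counter(s[col] for s in aligned_seqs if s[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            # Positive score: residue is more common here than expected.
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score_against_pssm(pssm, seq):
    """Sum the position-specific scores of an ungapped, equal-length sequence."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


# Toy alignment (made-up variants of the first five query residues).
toy_alignment = ["KCVDG", "KCIDG", "KCVEG", "RCVDG"]
toy_pssm = build_pssm(toy_alignment)
```

Because each column has its own scores, a conserved position (the C in column 2 of the toy alignment) is rewarded strongly, while variable positions tolerate substitutions. This is what lets a profile pick up distant relatives that a single query sequence misses.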

  • False negatives: homologous sequences that are not detected; fewer in PSI-BLAST than in normal BLAST
  • False positives: non-homologous sequences that are listed as matches; very few in normal BLAST, but possible in PSI-BLAST after profile corruption

The "problem" is that the E-value reported in a PSI-BLAST search describes the match to the profile, not to the original query sequence!
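
The iteration logic, and the reason a single false positive can propagate, can be sketched as a small loop. This is only an illustration under stated assumptions: the search argument is a hypothetical callback standing in for one database search with the current profile, the inclusion threshold of 0.005 is an arbitrary illustrative default, and the cap of 5 rounds simply mirrors the rule of thumb mentioned above.

```python
def iterate_profile_search(search, query, inclusion_ethresh=0.005, max_iterations=5):
    """Repeat a profile search until no new sequences pass the inclusion threshold.

    search: hypothetical callback; takes the set of sequence IDs currently
            in the profile and returns {hit_id: e_value} for one round.
    Returns (included_ids, rounds_run, converged).
    """
    included = {query}
    for round_number in range(1, max_iterations + 1):
        hits = search(frozenset(included))
        # Everything under the threshold joins the profile, homologous or not.
        # The E-values of later rounds are relative to this growing profile.
        newly_included = {
            hit for hit, e_value in hits.items() if e_value <= inclusion_ethresh
        } - included
        if not newly_included:
            return included, round_number, True  # converged
        included |= newly_included
    return included, max_iterations, False  # stopped at the iteration cap
```

Note that nothing in the loop checks a new hit against the original query. Once a non-homologous sequence slips under the threshold, it helps shape the next profile, and sequences similar to it can then receive significant E-values.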

PSI BLAST Example

Query sequence:

>gi|18892111|gb|AAL80306.1|[18892111] Pfu VMA intein (including -1 and +1 extein residues)
KCVDGDTLILTKEFGLIKIKDLYEKLDGKGRKTVEGNEEWTELEEPITVYGYKNGKIVEIKATHVYKG
ASSGMIEIKTRTGRKIKVTPIHKLFTGRVTKDGLVLEEVMAMHIKPGDRIAVVKKIDGGEYVKLDTSS
VTKIKVPEVLNEELAEFLGYVIGDGTLKPRTVAIYNNDESLLKRANFlAMKLFGVSGKIVQERTVKAL
LIHSKYLVDFLKKLGIPGNKKARTWKVPKElLLSPPSVVKAFINAYIACDGYYNKEKGEIEIVTASEE
GAYGLTYLLAKLGIYATIRRKTINGREYYRVVISGKANLEKLGVKREARGYTSIDVVPVDVESIYEAL
GRPYSELKKEGIEIHNYLSGENMSYETFRKFAKVVGLEEIAENHLQHILFDEVVEVNYISEPQEVYDI
TTETHNFVGGNMPTLLHNT

Blast page

  • Use SWISSPROT
  • Check format for PSI-blast
  • Check inclusion limit
  • Turn on Filter for low complexity
  • Increase number of reported matches
  • Point out manual selection of the sequences to be included in the profile
  • Use the PSSM to search something else (e.g., individual genomes (here))
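
For those who prefer the command line, the checklist above maps roughly onto options of the standalone psiblast program from NCBI BLAST+. The sketch below only assembles the command and does not run it. It assumes a local BLAST+ installation and a locally formatted protein database; the file names (intein.fasta, intein.pssm), the database names, and the numeric defaults are placeholders chosen for illustration, not values from the class exercise.

```python
def build_psiblast_command(query_fasta, db, iterations=3,
                           inclusion_ethresh=0.005, max_hits=500,
                           pssm_out="intein.pssm"):
    """Assemble a psiblast call that mirrors the web-page checklist."""
    return [
        "psiblast",
        "-query", query_fasta,                         # FASTA file with the query
        "-db", db,                                     # protein database to search
        "-num_iterations", str(iterations),            # number of PSI-BLAST rounds
        "-inclusion_ethresh", str(inclusion_ethresh),  # inclusion limit
        "-seg", "yes",                                 # low-complexity filter
        "-num_descriptions", str(max_hits),            # more reported matches
        "-num_alignments", str(max_hits),
        "-out_pssm", pssm_out,                         # save the profile
    ]


def build_pssm_search_command(pssm_file, db):
    """Assemble a psiblast call that searches another database with a saved PSSM."""
    return ["psiblast", "-in_pssm", pssm_file, "-db", db]


# Placeholder names; pass the list to subprocess.run(cmd, check=True) to execute.
cmd = build_psiblast_command("intein.fasta", "swissprot")
reuse_cmd = build_pssm_search_command("intein.pssm", "some_genome_proteins")
print(" ".join(cmd))
```

Manual selection of the sequences that enter the profile is a feature of the web interface and has no direct equivalent in this sketch.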