Illustrations of homologs that do not show significant sequence similarity in pairwise comparisons:
Jim Knox (MCB-UConn) has studied many proteins involved in bacterial cell wall biosynthesis and in antibiotic binding, synthesis or destruction. Many of these proteins have the same 3-D structure and can therefore be assumed to be homologous; however, tests based on pairwise sequence comparisons fail to detect these homologies. (For example, enzymes with GRASP nucleotide binding sites are depicted here.)
DNA replication involves many different enzymes. Some of these proteins perform the same function in bacteria, archaea and eukaryotes and have similar 3-D structures (e.g., the sliding clamp: E. coli dnaN and eukaryotic PCNA; see Edgell and Doolittle, Cell 89, 995-998), but again, the above tests fail to detect homology.
Helicase and F1-ATPase. Both form hexamers with something rotating in the middle (the DNA in the helicase, the gamma subunit in the F1-ATPase; D. Crampton, pers. communication). The monomers have the same type of nucleotide binding fold (picture).
BLAST and command-line BLAST (the slides contain links that become accessible only after you switch to presentation mode)
E-values and multiple tests
False positives: The expected number of false positives is given by the E-value. The P-value, or significance value, gives the probability that a positive identification is made in error (the same as with drug tests).
False negatives: Homologous sequences in the databank that are not recognized as such. If there are only 12000 different protein families, a sequence should on average have (size of the databank)/12000 homologs in the databank. Since searches usually report far fewer significant matches than this, the number of false negatives is probably very large.
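The two estimates above can be made concrete with a small sketch. It assumes the Poisson relation commonly used for BLAST statistics, P = 1 - exp(-E), to convert an E-value into the probability of at least one chance hit. The function names and the databank size of 1,200,000 sequences are hypothetical and chosen only for illustration; the 12000 families figure is the one quoted above.

```python
import math


def p_value_from_e_value(e_value: float) -> float:
    """Probability of at least one chance hit with this score or better.

    Assumes chance hits follow a Poisson distribution with mean E,
    so P = 1 - exp(-E). For small E, P is approximately equal to E.
    """
    # expm1 keeps precision when e_value is very small
    return -math.expm1(-e_value)


def expected_homologs(databank_size: int, n_families: int = 12000) -> float:
    """Average number of homologs per query if all sequences in the
    databank were spread evenly over n_families protein families."""
    return databank_size / n_families


if __name__ == "__main__":
    for e in (1e-5, 0.01, 1.0, 10.0):
        print(f"E = {e:g}  ->  P = {p_value_from_e_value(e):.6g}")

    # Hypothetical databank size, for illustration only
    size = 1_200_000
    print(f"Expected homologs per query: {expected_homologs(size):.0f}")
```

With an E-value of 1, a chance hit of that quality is more likely than not (P is about 0.63), whereas for E-values well below 1 the E-value and the P-value are nearly the same number. Comparing the expected number of homologs with the number of significant hits actually reported gives a rough sense of how many false negatives a search leaves behind.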
Goals class 10