Types of Error in a Databank search
False positives: The number of false positives is estimated by the E-value, the number of hits with at least the observed score that are expected by chance. The P-value, or significance value, gives the probability that a positive identification is made in error (as with drug tests).
False negatives: Homologous sequences in the databank that are not recognized as such. If there are only 12000 different protein families, a sequence should on average have (size of the databank)/12000 homologs in the databank. A search usually reports far fewer significant matches than that, so the number of false negatives is probably very large.
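The two estimates above can be made concrete with a small sketch. It rests on assumptions that these notes do not state: chance hits are taken to follow a Poisson distribution (as in BLAST-style statistics), which gives P = 1 - exp(-E), and the databank size used in the example is a made-up number chosen only for illustration. The function names are hypothetical.

```python
import math


def p_value_from_e_value(e_value: float) -> float:
    """Probability of at least one chance hit, given the expected number E.

    Assumes chance hits are Poisson distributed, so P = 1 - exp(-E).
    expm1 keeps precision when E is very small.
    """
    return -math.expm1(-e_value)


def expected_homologs(databank_size: int, n_families: int = 12000) -> float:
    """Average number of homologs per sequence if all sequences fall into
    n_families equally likely families (the rough estimate from the notes)."""
    return databank_size / n_families


if __name__ == "__main__":
    # For small E the two values nearly coincide; for large E, P approaches 1.
    for e in (0.001, 0.1, 1.0, 10.0):
        print(f"E = {e:g}  ->  P = {p_value_from_e_value(e):.4f}")

    # Hypothetical databank of 1.2 million sequences (illustrative only).
    print(expected_homologs(1_200_000))
```

For small E-values the P-value and the E-value are nearly identical, which is why the two are often used interchangeably for highly significant hits. The family-size estimate is only an average: real protein families differ greatly in size.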
Decay of significance. Can this be corrected?
Meaning of phylogeny.
Another example of databank errors: even species names are often wrongly assigned (slides).
