Assignments for Today:
Assignments for Wednesday
Discussion of Take Home Exam 1 (with answers) (Excel file for extra question) -- Pan Genome and KS Plot slides here
Sequence and structure databanks can be divided into many different categories.
One of the most important is:
One problem in maintaining databanks (supervised and unsupervised) is "ownership" of sequences, which in many databanks prevents continuous updating of sequences. Even if errors are detected, they are not easily removed from the databank. E.g., the ATP synthase operons in E. coli; see http://mic.sgmjournals.org/cgi/content-nw/full/156/7/1909/F1
Even species names are often wrongly assigned (slides)
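False positives: The number of false positives is estimated by the E-value, i.e. the number of matches with an equal or better score that are expected by chance in a search of the databank. The P-value, or significance value, gives the probability that a positive identification is made in error (same as with drug tests).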
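The relation between the two values can be sketched in a few lines. This is a minimal sketch, assuming that chance hits follow a Poisson distribution (the model used by BLAST-style searches), so that P = 1 - exp(-E). The function name is made up for illustration.

```python
import math


def p_value_from_e_value(e_value: float) -> float:
    """Probability of at least one chance hit, given the expected number of chance hits.

    Assumption: chance hits are Poisson distributed, so P = 1 - exp(-E).
    """
    if e_value < 0:
        raise ValueError("E-value cannot be negative")
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-e_value)


if __name__ == "__main__":
    for e in (1e-10, 0.01, 1.0, 10.0):
        print(f"E = {e:g}  ->  P = {p_value_from_e_value(e):.6g}")
```

For small E-values the two numbers are nearly identical; they only diverge once E approaches 1, where P saturates because a probability cannot exceed 1.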
False negatives: Homologous sequences in the databank that are not recognized as such. If there are only 12000 different protein families, then on average a sequence should have (size of the databank)/12000 matches. In other words, the number of false negatives is probably very large.
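The back-of-the-envelope estimate above can be written out as follows. This is only a sketch: the databank size and the number of hits found are made-up numbers for illustration, and the calculation assumes that all families are equally large, which real databanks are not.

```python
def expected_matches(databank_size: int, n_families: int = 12000) -> float:
    """Average number of homologs per query if sequences were spread evenly over families."""
    if n_families <= 0:
        raise ValueError("number of families must be positive")
    return databank_size / n_families


def estimated_false_negatives(databank_size: int, hits_found: int,
                              n_families: int = 12000) -> float:
    """Expected homologs minus those actually recognized (never below zero)."""
    return max(0.0, expected_matches(databank_size, n_families) - hits_found)


if __name__ == "__main__":
    size = 12_000_000   # hypothetical databank size, not a real figure
    found = 150         # hypothetical number of significant hits for one query
    print("expected matches:", expected_matches(size))
    print("estimated false negatives:", estimated_false_negatives(size, found))
```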
Discussion: Decay of significance. Can this be corrected?
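As a starting point for the discussion, here is a minimal sketch of one way significance can decay. It assumes that the E-value of a fixed alignment score is proportional to the size of the search space, as in the Karlin-Altschul statistics used by BLAST. The same alignment then becomes less significant as the databank grows. The function name and the sizes are made up for illustration.

```python
def rescale_e_value(e_value: float, old_size: float, new_size: float) -> float:
    """E-value the same alignment score would receive in a databank of a different size.

    Assumption: E is proportional to databank size for a fixed score.
    """
    if old_size <= 0 or new_size <= 0:
        raise ValueError("databank sizes must be positive")
    return e_value * new_size / old_size


if __name__ == "__main__":
    e_then = 1e-3                    # hypothetical E-value in a small, older databank
    for growth in (1, 10, 100, 1000):
        e_now = rescale_e_value(e_then, 1e6, 1e6 * growth)
        print(f"databank x{growth:<5} E = {e_now:g}")
```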
Addendum to virus and life discussion:
Cellular automata: A'life; John Conway's Game of Life. [Rules: a living cell survives if it has two or three living neighbors; otherwise it dies. A new cell is created on a "dead" square if it has exactly three living neighbors.] The game was popularized by Martin Gardner in Scientific American in 1970.
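The rules in brackets are enough to write the whole game. The following is a minimal sketch, assuming an unbounded grid in which the living cells are stored as a set of (x, y) coordinates. The names `step` and `blinker` are chosen here for illustration.

```python
from collections import Counter


def step(live: set) -> set:
    """Advance a Game of Life pattern by one generation."""
    # Count, for every square next to a living cell, how many living neighbors it has.
    neighbor_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth: exactly three living neighbors.
    # Survival: a living cell with two or three living neighbors.
    return {
        cell
        for cell, n in neighbor_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


if __name__ == "__main__":
    blinker = {(0, 1), (1, 1), (2, 1)}   # three cells in a row
    print(sorted(step(blinker)))          # the row turns into a column
    print(step(step(blinker)) == blinker) # and back again after two steps
```

Storing only the living cells keeps the grid unbounded and makes each generation cost proportional to the number of living cells rather than to the size of a fixed board.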
More information on digital life is at the Digital Evolution homepage at MSU.
Karl Sims' virtual creatures are worth a look, movie here.
A similar approach to evolution in silico is here.
A'life not really alive
Go through coral of life ppt slides
What does the term phylogeny mean? Is it compatible with reticulation events?