J. Peter Gogarten
Dept. Molecular and Cell Biology
University of Connecticut
Storrs, CT 06269-3125

Phone: 486-4061

Office hours

MW noon-1pm, or by arrangement. For a fast response, send an email!
For questions of general interest, use the bulletin board on WebCT.

Basis for grading:

Participation (WebCT discussion board, in-class discussions),
Assignments from the computer labs,
Take-home quizzes and questions.


Final: 30%,
Midterm: 25%,
Take-home exams, essay, presentations: 35%
(You can drop the worst 4 grades (3 for grad students) from the take-home exams.)
Participation, bulletin board postings, and in-class assignments: 10%

Expectation: More than 3h reading/studying per week (most will do fine with about 3-6h/week).

For Honors student conversion - Students please send an email to

C-credit can be arranged


We will not use a printed book for reading assignments this year!

Reading assignments will use texts available on the web, including Wikipedia and materials available through WebCT.


Recommended books:


Introduction to Bioinformatics (Paperback)
by Arthur M Lesk, 3rd edition

at Amazon

Essential Bioinformatics (Paperback)
by Jin Xiong

Excellent book, it provides a very readable and concise overview of the most important tools and concepts in Bioinformatics



Bioinformatics for Dummies
by Jean-Michel Claverie

Excellent introductory bioinformatics book.


Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Third Edition

Edited by Andreas D. Baxevanis and B. F. Francis Ouellette

The book covers many aspects of bioinformatics that we do not cover in class, but it is an excellent reference. The section on phylogenetics is rather weak, but you have your instructors to provide you with much more detail.

Don't buy the 2nd edition by mistake!


Excellent book to look up things and to consult if faced with a real world problem.
Covers many more techniques and approaches than we will in this course.


Bioinformatics And Molecular Evolution (Paperback)
by Paul G. Higgs, Teresa K. Attwood

The authors discuss in detail many applications in molecular evolution and bioinformatics. This book should be very useful to those who want to study some aspects of things covered in this course in more detail.



Inferring Phylogenies
by Joe Felsenstein

ISBN: 0-87893-177-5;   $61.95 paper

Excellent book on phylogenetics and many aspects of population genetics (e.g., gene coalescence in populations, a topic that is rather relevant to species phylogenies in microorganisms :)).
For most MCB students this is not exactly bedtime reading, but if you need a well-founded, thorough explanation, this is a good book to consult.


Molecular Evolution : A Phylogenetic Approach
by Roderic D. M. Page, Edward C. Holmes
Price: $63.95, Paperback, 352 pages (October 1998)

Blackwell Science Inc; ISBN: 0865428891

This book gives an excellent introduction to terms, methods, and problems in molecular evolution. It does not contain too many details on individual algorithms, but it provides a very readable overview.
Rather expensive!


Other recommended books:

From Gaia to Selfish Genes, Selected Writings in the Life Sciences
Edited by Connie Barlow

MIT Press, ISBN 0-262-52178-4


Graur and Li: Fundamentals of Molecular Evolution, Second Edition

Topics (incomplete: open to student input)


Bioinformatics (general definition): 
   Area between Computer Sciences (Informatics) and Biology (genomics)
(or application of the tools of informatics to biology)

Bioinformatics took off only with the availability of large amounts of genome information, thus a more narrow delineation might be:
     Area between Informatics and Genomics

Related areas: Computational biology, Cybernetics

 Typically bioinformatics is considered to include

management of biological databanks,
access to biological data, and
extracting useful information from biological data.
For a more detailed discussion, see Mark Gerstein's introduction.


Bioinformatics at UConn:

Courses relevant for students of bioinformatics are offered through a variety of departments, colleges and schools at UConn. There is at present no major in bioinformatics; however, UConn offers a Minor in Bioinformatics that is suitable for students in MCB, EEB, PNB and CSE. For information click here. An updated audit sheet is here. You should be aware that bioinformatics is a field in its infancy. Many schools have rushed to attach the name Bioinformatics to a program, but closer inspection often reveals less than one would hope for in a bioinformatics program, e.g., a single databank course attached to a normal biochemistry curriculum. All things considered, the offerings at UConn could be more streamlined for CSE and biology students, respectively, but in terms of content they are not bad either.



Assignment for Friday:

Assignment for Wednesday:

Don't be too sure that what you read in textbooks is actually useful.
For example, an often stated criterion is "being made from cells". We can make this criterion true for most life on Earth (there are some problems with organisms that are syncytia, but one can redefine what one means by cell :-)); however, life on a surface might be a prebiotic alternative. Or what about self-replicating nanorobots directed by an intelligent computer?

Background Information:

Traditional criteria for Life: Supplemental Information



What does Bioinformatics have to do with Molecular Evolution? 

Problem: Application of first principles does not work (yet)

The following chain, although (believed to be) mainly determined by the DNA sequence (plus other components of the cell, which in turn are encoded by other parts of the genome), cannot at present be simulated in a computer.

DNA sequence ->
transcription ->
translation ->
protein folding ->
protein function (catalytic and other properties) ->
properties of the organism(s) ->
ecology (also taking the non-biological environment into account) ->

... .
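To make the contrast concrete, here is a minimal sketch (not part of the course materials) showing that the first two steps of the chain reduce to simple lookups, while nothing comparable exists for the later steps. The codon table is a deliberately partial subset of the standard genetic code, and the example DNA sequence and the function names are made up for illustration.

```python
# Toy subset of the standard genetic code (assumption: only the codons
# needed for the example; a real table has 64 entries).
CODON_TABLE = {
    "AUG": "M", "UUU": "F", "UUC": "F", "GGU": "G", "GGC": "G",
    "GCU": "A", "GCC": "A", "AAA": "K", "AAG": "K",
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}

def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """Translate codon by codon until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")  # X = codon missing from the toy table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

dna = "ATGTTTGGCGCCAAATAA"   # hypothetical example sequence
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)      # AUGUUUGGCGCCAAAUAA
print(protein)   # MFGAK
```

Everything after this point in the chain (folding, function, organism, ecology) has no such lookup table, which is the problem discussed here.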


Most scientists believe that the principle of reductionism (plus new laws and relations emerging on each level) is true for this chain; however, this is clearly "in principle" only.
Biology relies on this sequence to work more or less unambiguously (prions are an exception), but:

At several steps along the way from DNA to function our understanding of the chemical and physical processes involved is so incomplete that prediction of protein function based on only a single DNA sequence is at present impossible (at least for a protein of reasonable size).

Use evolutionary context:

"Nothing in biology makes sense except in the light of evolution"

Theodosius Dobzhansky

Present day proteins evolved through substitution and selection from ancestral proteins.
Related proteins have similar sequence AND similar structure AND similar function.

In the above mantra "similar function" can refer to:

  • identical function,

  • similar function, e.g.:
    • identical reactions catalyzed in different organisms; or
    • same catalytic mechanism but different substrate (malic and lactic acid dehydrogenases);
    • similar subunits and domains that are brought together through a (hypothetical) process called domain shuffling, e.g. nucleotide binding domains in hexokinase, myosin, HSP70, and ATP synthases.

The Size of Protein Sequence Space (back of the envelope calculation):

Consider a protein of 600 amino acids.
Assume that any of the twenty possible amino acids could occur at every position.
Then the total number of possibilities is
20 choices for the first position times 20 for the second position times 20 for the third ... = 20^600 = 4*10^780 different possible proteins with a length of 600 amino acids.

For comparison the universe contains only about 10^89 protons and has an age of about 5*10^17 seconds or 5*10^29 picoseconds.

If every proton in the universe were a computer that explored one possible protein sequence per picosecond, we would have explored only 5*10^118 sequences, i.e. a negligible fraction of the possible sequences with length 600 (one in about 10^662).
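The back-of-the-envelope numbers above can be checked with exact integer arithmetic. The sketch below uses only the figures given in the text (600 positions, 20 amino acids, 10^89 protons, 5*10^29 picoseconds); the variable names and the helper `sci` are made up for illustration.

```python
import math

length = 600             # protein length used in the text
alphabet = 20            # possible amino acids per position
protons = 10**89         # the text's estimate of protons in the universe
age_ps = 5 * 10**29      # the text's age of the universe in picoseconds

possible = alphabet ** length   # size of sequence space, an exact integer
explored = protons * age_ps     # one sequence per proton per picosecond

def sci(n: int) -> str:
    """Format a large positive integer as m*10^e, truncated to one digit."""
    exponent = len(str(n)) - 1
    mantissa = n // 10**exponent
    return f"{mantissa}*10^{exponent}"

print("possible sequences:", sci(possible))    # 4*10^780
print("explored sequences:", sci(explored))    # 5*10^118
print("explored fraction: one in about 10^%d"
      % round(math.log10(possible // explored)))  # 10^662
```

Changing the assumed number of protons by a few orders of magnitude barely affects the conclusion, because the exponent of the sequence space is so much larger.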

The following is based on observation and not on an a priori truth:

If two proteins (not necessarily true for nucleotide sequences) show significant similarity in their primary sequence, they have shared ancestry, and probably similar function.
(although some proteins acquired radically new functional assignments, e.g. lysozyme -> lens crystallin).

To date there is no example known where convergent evolution has led to significant similarity of the primary sequence (although there are examples where similar selection pressures have resulted in similar convergent substitutions in homologous proteins).


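The reverse does not hold: proteins with similar structure or function may show no significant sequence similarity, for one of two reasons: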

a)  they evolved independently
(e.g. different types of nucleotide binding sites);


b)   they underwent so many substitution events that there is no readily detectable similarity remaining.

(reason: see b above); many recent advances concern the improved detection of similarity.
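Reason b can be illustrated with a small simulation. This is a rough sketch under strong simplifying assumptions, not a realistic model of protein evolution: substitutions hit random sites, every replacement amino acid is equally likely, there is no selection, and "similarity" is measured as plain percent identity without any test of significance. The function names and the random seed are arbitrary.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def evolve(sequence: str, substitutions: int, rng: random.Random) -> str:
    """Apply random substitutions; each replaces one site with a different residue."""
    residues = list(sequence)
    for _ in range(substitutions):
        site = rng.randrange(len(residues))
        residues[site] = rng.choice([a for a in AMINO_ACIDS if a != residues[site]])
    return "".join(residues)

def identity(a: str, b: str) -> float:
    """Fraction of aligned positions carrying the same residue."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

rng = random.Random(1)   # fixed seed so that runs are repeatable
ancestor = "".join(rng.choice(AMINO_ACIDS) for _ in range(600))

for per_site in (0, 0.5, 1, 2, 5, 10):
    descendant = evolve(ancestor, int(per_site * len(ancestor)), rng)
    print(f"{per_site:>4} substitutions/site: identity {identity(ancestor, descendant):.2f}")
```

Under these assumptions the identity drifts toward roughly 1 in 20, the level expected for two unrelated random sequences, so the shared ancestry is no longer readily detectable from the sequence alone.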

Slides class01.ppt