Assignments for Friday

  • Familiarize yourself with a few basic UNIX commands (cd, ls, ps, more, cat, ...)
  • Read through take home exam #4

Assignments for Monday

  • Read through intron sections on Wikipedia:
    • read through white box on introns below
  • Bring at least one question for the review session (Midterm is next week Wednesday)
  • Complete take home exam #4


What is a phylogeny?

Discuss tree of life - three domains, main endosymbiosis events, archaeplastida, other algae.

Slides on cladistics.

Short intro to sequence space


Probably for next week Monday

Discussion of gene creation, selfish genes, ...

Slides on introns

What are introns? Remnants from an early world? Recent invaders?


Introns and Their Evolution


Three groups of introns based on their splicing mechanisms:

Group I and group II introns are self-splicing [they use different splicing mechanisms: see this figure for a comparison of splicing]:

Group III (spliceosomal) introns are present in the eukaryotic nucleus and need spliceosomes to be spliced out:


Where do the different groups of introns occur?

  • Group I: discovered in the ciliated protozoan Tetrahymena; also found in Physarum, fungal mitochondria, and phage T4; rare in Bacteria, though one is present in the Thermotoga 23S rRNA
  • Group II: common in Bacteria, and so far found only in one Archaeal genus, Methanosarcina
  • Spliceosomal Introns: present throughout eukaryotes, but more common in "crown-group" eukaryotes

Where do spliceosomal introns come from, and how did the splicing machinery evolve?


Spliceosomal introns evolved from group II introns; the functions of some of the internal loops of the group II introns have been taken over by the spliceosomal snRNAs (small nuclear RNAs).


Gratuitous complexity hypothesis for evolution of spliceosomal machinery: See reading assignment on WebCT [the portions for the reading are highlighted in the PDF file]


Group II introns are found in Bacteria, and in only one Archaeal genus, Methanosarcina; why is it that predominantly "crown-group" eukaryotes have introns?

Not much of a splice site consensus (exon1 GT-intron-AG exon2)

Group I introns often have homing endonucleases.
Homing endonucleases and intron mobility. Spread in populations, selective pressure on endonuclease. See the excellent paper by Goddard and Burt on the reinvasion cycle.

Also: reverse splicing

Possible benefits of having introns:

Exon shuffling, alternative splicing (1 gene -> different protein products) ....

Two rival hypotheses: Intron Early vs. Intron Late

Intron early:

Protein diversity arose through the shuffling of exons, in analogy to the recombination of gene segments that generates antibody diversity (see your biochemistry or genetics textbook on the maturation of the immune system).


Intron late:

Present day introns are late invaders of already functional genes. Exon shuffling might play some role in eukaryotes, but most of protein diversity arose before introns invaded protein coding genes.

  • The distribution of introns mapped on phylogenetic trees unambiguously points towards late invasion (and here).
  • The correlation between structure and intron position is not unambiguous.
  • The finding that mitochondrial (eubacterial) and nucleocytoplasmic genes have introns in the same location could reflect a preferred intron integration site. The phase pattern is also observed in vertebrate genes, in which the introns are of late origin.
  • Exon shuffling requires introns located in the same phase, but there might be other reasons for having a slight excess of introns in the same phase. For introns to frequently invade genes, there need to be mechanisms for introns to find new "homes" (see above).


Mixed model of intron evolution:
  • version 1 - while some introns are recent, most are old. E.g.: [Roy, 2003].
  • version 2 - while most introns are recent, some are older, but not necessarily very old. E.g.: [Rogozin et al., 2003]


It was suggested that group II introns were the reason for the separation between transcription and translation in Eukaryotes (accomplished through the nuclear envelope). Martin and Koonin's hypothesis suggests that group II introns were brought into the eukaryotic cell by the mitochondrial endosymbiont.




Items for Discussion

Given that two homologous sequences start off with 100% similarity and then diverge over time, what percent similarity will they share once saturation with substitutions has been reached? (Assume equal frequencies for the different letters.)
a. For nucleotide sequences?
There are four different nucleotides. If the sequence is saturated with substitutions, then each site is equally likely to hold any of the four nucleotides (assuming the nucleotides occur with the same frequency), regardless of the initial nucleotide. Only one of the four possibilities is a match, resulting in 25% identity.

b. For protein sequences?
The same reasoning applies, but there are 20 letters, resulting in a 5% match probability.

How would the result for a nucleotide sequence change if the frequencies of the four nucleotides are not equal? Use a composition of 40% G, 40% C, 10% A, and 10% T as an example.
The chance of a match for a T is the probability of having a T at the start times the probability of having a T at the end. After saturation with substitutions, both are equal to the frequency of T, i.e. the chance of having a T at the beginning and at the end is 0.1 times 0.1. The same holds for the other nucleotides. The total probability of a match is thus 0.1^2 + 0.1^2 + 0.4^2 + 0.4^2 = 2*0.1^2 + 2*0.4^2 = 0.02 + 0.32 = 0.34 = 34%, i.e. more similar than in the case of equal nucleotide frequencies.
General formula for the expected identity of random sequences with biased composition: (frequency of A)^2 + (frequency of T)^2 + (frequency of G)^2 + (frequency of C)^2. For a double-stranded genome the total number of G is equal to the total number of C, thus one could also write: expected identity (as a fraction) = 2*((%GC/200)^2) + 2*(((100-%GC)/200)^2); multiply by 100 to express it in percent.
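The arithmetic above can be checked with a short Python sketch. The function names `expected_identity` and `expected_identity_from_gc` are made up for this illustration and are not part of the course materials. The sketch assumes that sites are independent and that, at saturation, each site is drawn from the stated composition.

```python
def expected_identity(freqs):
    """Expected fraction of identical sites between two saturated sequences.

    freqs maps each letter to its frequency; the frequencies must sum to 1.
    The result is the sum of the squared frequencies.
    """
    total = sum(freqs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    return sum(f * f for f in freqs.values())


def expected_identity_from_gc(gc_percent):
    """Same quantity computed from %GC, assuming G = C and A = T."""
    g_or_c = gc_percent / 200.0            # frequency of G (and of C)
    a_or_t = (100.0 - gc_percent) / 200.0  # frequency of A (and of T)
    return 2 * g_or_c ** 2 + 2 * a_or_t ** 2


if __name__ == "__main__":
    equal_nucleotides = {n: 0.25 for n in "ACGT"}
    equal_amino_acids = {aa: 1 / 20 for aa in "ACDEFGHIKLMNPQRSTVWY"}
    biased = {"G": 0.4, "C": 0.4, "A": 0.1, "T": 0.1}

    print(f"equal nucleotides: {expected_identity(equal_nucleotides):.1%}")
    print(f"equal amino acids: {expected_identity(equal_amino_acids):.1%}")
    print(f"40%G 40%C 10%A 10%T: {expected_identity(biased):.1%}")
    print(f"from 80% GC: {expected_identity_from_gc(80):.1%}")
```

Both functions return a fraction, so the biased example and the %GC shortcut with 80% GC should agree with each other and with the 34% worked out by hand.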

When did the Bacteria diverge from the Archaea and Eukaryotes, i.e. how old is LUCA (approximately)? Current estimates vary between 4.2 and about 3 billion years BP (compare here and here)

What is the late heavy bombardment? See
Did this sterilize Earth?
Did it happen, or was this just the tail of the early heavy bombardment?

The finding that the ribosomal RNA alone cannot perform translation is an argument against the RNA world hypothesis
It turns out that the ribosome at its catalytic core is made from RNA. Ribosomal proteins are not at the center of peptide bond formation, which suggests that ribosomal peptide synthesis started out as an RNA-based machinery.