Assignments for Friday

  • Read through the first three and the last items on the Dotlet help pages.
    Dotlet is a program that "aligns" DNA with DNA, amino acid sequences with amino acid sequences, or three-frame translations of DNA with amino acid sequences.
    It calculates and analyzes the alignment scores (without introducing gaps) for all possible windows in the two sequences.
  • Read through the Dot plot excerpt on HuskyCT.
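
The window scoring described for Dotlet can be sketched roughly as follows. This is only an illustration, not Dotlet's actual implementation: the function name window_scores, the default window length, and the simple identity scoring (1 per matching letter, 0 otherwise) are assumptions made here, and Dotlet itself offers other scoring schemes.

```python
def window_scores(seq_a, seq_b, window=3):
    """Score every ungapped window pair between two sequences.

    Assumption: identity scoring (1 for a match, 0 for a mismatch).
    Entry [i][j] is the score of the window starting at position i in
    seq_a compared with the window starting at position j in seq_b.
    """
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    scores = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Compare the two windows position by position, without gaps.
            score = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            row.append(score)
        scores.append(row)
    return scores


if __name__ == "__main__":
    # Comparing a sequence with itself: high scores line up on the diagonal.
    for row in window_scores("ACGTACGT", "ACGTACGT", window=3):
        print(" ".join(str(s) for s in row))
```

In a dot plot, each entry of this matrix becomes one pixel, with higher scores drawn darker, so similar regions show up as diagonal lines.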

Assignments for Monday

  • Optional: read through "The origin of new genes: glimpses from the young and old" available here and on HuskyCT.

Gene plot for two strains of Aeromonas hydrophila here

Gene plot for Aeromonas hydrophila vs A. veronii here



Items for Discussion

Given that two homologous sequences start off with 100% similarity and then diverge over time, what percent similarity will they share once saturation with substitutions has been reached? (Assume equal frequencies for the different letters.)
a. For nucleotide sequences?
There are four different nucleotides. Once the sequence is saturated with substitutions, each site is equally likely to hold any of the 4 nucleotides (given that the nucleotides occur with the same frequency). Only one of the 4 matches the initial nucleotide, resulting in 25% identity.

b. For protein sequences?
Same reasoning, but there are 20 letters, resulting in a 5% match probability.
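
To see how identity approaches these saturation values, one can use a simple substitution model. The sketch below assumes a Jukes-Cantor-type model generalized to k equally frequent letters, in which every substitution is equally likely; this model and the function name expected_identity are assumptions for illustration and are not part of the question. Here d is the expected number of substitutions per site separating the two sequences.

```python
import math


def expected_identity(d, k):
    """Expected fraction of identical sites between two sequences.

    Assumption: all k letters are equally frequent and all substitutions
    are equally likely (Jukes-Cantor-type model).
    d is the expected number of substitutions per site between the sequences.
    """
    return 1.0 / k + (1.0 - 1.0 / k) * math.exp(-k * d / (k - 1))


if __name__ == "__main__":
    for d in (0.0, 0.5, 1.0, 2.0, 5.0, 20.0):
        dna = expected_identity(d, 4)       # nucleotides: 4 letters
        protein = expected_identity(d, 20)  # amino acids: 20 letters
        print(f"d={d:5.1f}  DNA: {dna:.3f}  protein: {protein:.3f}")
```

At d = 0 the function returns 1 (100% identity); for large d it levels off at 1/k, i.e. 25% for nucleotides and 5% for proteins.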

How would the result for a nucleotide sequence change if the frequencies of the four nucleotides are not equal? Use a composition of 40% G, 40% C, 10% A, and 10% T as an example.
The chance of a match on T is the probability of having a T at the start and a T at the end. After saturation with substitutions, both probabilities are equal to the frequency of T, so the chance of having T at the beginning and at the end is 0.1 times 0.1. The same holds for the other nucleotides. The total probability of a match is thus 0.1^2 + 0.1^2 + 0.4^2 + 0.4^2 = 2*0.1^2 + 2*0.4^2 = 0.02 + 0.32 = 0.34 = 34%, i.e. more similar than in the case of equal nucleotide frequencies.
General formula: expected identity for random sequences with biased composition = (frequency of A)^2 + (frequency of T)^2 + (frequency of G)^2 + (frequency of C)^2. For a double-stranded genome the total number of G is equal to the total number of C, thus one could also write: expected identity = 2*((%GC/200)^2) + 2*(((100-%GC)/200)^2). Both expressions give a fraction; multiply by 100 to obtain %identity.
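
The general formula can be checked with a few lines of Python. This is a minimal sketch; the function names identity_from_frequencies and identity_from_gc are made up for this example.

```python
def identity_from_frequencies(freqs):
    """Expected identity (as a fraction) of two random sequences
    that share the given letter frequencies."""
    total = sum(freqs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    # A match on a letter requires that letter in both sequences: f * f.
    return sum(f * f for f in freqs.values())


def identity_from_gc(gc_percent):
    """Same quantity from the %GC content, assuming G = C and A = T."""
    gc_each = gc_percent / 200.0            # frequency of G (and of C)
    at_each = (100.0 - gc_percent) / 200.0  # frequency of A (and of T)
    return 2 * gc_each ** 2 + 2 * at_each ** 2


if __name__ == "__main__":
    example = {"G": 0.4, "C": 0.4, "A": 0.1, "T": 0.1}
    print(f"{identity_from_frequencies(example):.2f}")  # 0.34
    print(f"{identity_from_gc(80):.2f}")                # 0.34
    print(f"{identity_from_gc(50):.2f}")                # 0.25
```

With 50% GC the result falls back to the 25% obtained for equal nucleotide frequencies; any deviation from 50% GC raises the expected identity.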

When did the Bacteria diverge from the Archaea and Eukaryotes, i.e. how old is LUCA (approximately)? Current estimates vary between 4.2 and about 3 billion years BP (compare here and here)

What is the late heavy bombardment? See
Did this sterilize Earth?
Did it happen, or was this just the tail of the early heavy bombardment?

The finding that the ribosomal RNA alone cannot perform translation is an argument against the RNA world hypothesis.
It turns out that the catalytic core of the ribosome is made of RNA. Ribosomal proteins are not at the center of peptide bond formation, which suggests that ribosomal peptide synthesis started out as an RNA-based machinery.