Assignments for Friday

  • Familiarize yourself with a few basic Unix commands (cd, ls, ps, more, cat, ...)

Assignments for Wednesday

Assignments for Monday

  • Read through intron sections on Wikipedia:
    • http://en.wikipedia.org/wiki/Intron
    • http://en.wikipedia.org/wiki/RNA_splicing
    • http://en.wikipedia.org/wiki/Group_I_catalytic_intron
    • http://en.wikipedia.org/wiki/Group_II_intron
Bring at least one question for the review session (the midterm is on Wednesday next week)


 

Discuss Perl scripts and gene plots PowerPoint slides here

 

Discuss gene and genome rearrangements.

sequence space

 


Items for Discussion

Given that two homologous sequences start off with 100% similarity and then diverge over time, what percent similarity will they share once saturation with substitutions has been reached? (Assume equal frequencies for the different letters.)
a. For nucleotide sequences?
There are four different nucleotides. If the sequence is saturated with substitutions, then each site is equally likely to hold any of the four nucleotides (given that the nucleotides occur with the same frequency). Only one of the four possibilities matches the initial nucleotide, resulting in 25% identity.

b. For protein sequences?
The same reasoning applies, but there are 20 letters (amino acids), resulting in a match probability of 1/20 = 5%.
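
The 25% and 5% figures can be checked with a small simulation. This is only a sketch under a stated assumption: two sequences saturated with substitutions are treated as two independent random sequences drawn with equal letter frequencies. The function name `random_identity`, the sequence length, and the seed are chosen here for illustration.

```python
import random

def random_identity(alphabet, length=100_000, seed=0):
    """Fraction of matching sites between two independent random sequences.

    Assumption: saturation is modeled as two unrelated sequences whose
    letters are drawn uniformly from `alphabet`.
    """
    rng = random.Random(seed)
    seq_a = rng.choices(alphabet, k=length)
    seq_b = rng.choices(alphabet, k=length)
    matches = sum(x == y for x, y in zip(seq_a, seq_b))
    return matches / length

# Four nucleotides: expected identity close to 1/4.
nucleotide_identity = random_identity("ACGT")
# Twenty amino acids: expected identity close to 1/20.
protein_identity = random_identity("ACDEFGHIKLMNPQRSTVWY")

print(f"nucleotides: {nucleotide_identity:.3f}")
print(f"amino acids: {protein_identity:.3f}")
```

The observed values fluctuate slightly around 0.25 and 0.05 because the sequences have finite length.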

How would the result for a nucleotide sequence change if the frequencies of the four nucleotides are not equal? Use a composition of 40% G, 40% C, 10% A, and 10% T as an example.
The chance of a match on T is the probability of having a T at the start times the probability of having a T at the end. After saturation with substitutions, both are equal to the frequency of T, so the chance of having T at the beginning and at the end is 0.1 times 0.1. The same holds for the other nucleotides. The total probability of a match is thus 0.1^2 + 0.1^2 + 0.4^2 + 0.4^2 = 2*0.1^2 + 2*0.4^2 = 0.02 + 0.32 = 0.34 = 34%, i.e., more similar than in the case of equal nucleotide frequencies.
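General formula for the expected identity of random sequences with biased composition: (frequency of A)^2 + (frequency of T)^2 + (frequency of G)^2 + (frequency of C)^2. For a double-stranded genome the total number of G equals the total number of C (and likewise A equals T), so one could also write: expected identity = 2*((%GC/200)^2) + 2*(((100-%GC)/200)^2). This gives a fraction; multiply by 100 to get percent. For example, %GC = 80 gives 2*0.4^2 + 2*0.1^2 = 0.34, i.e. 34%.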
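
The general formula is easy to evaluate in a few lines of Python. The sketch below is illustrative only; the function names `expected_identity` and `expected_identity_from_gc` are made up for this example, and the second function assumes G = C and A = T, as for a double-stranded genome.

```python
def expected_identity(freqs):
    """Expected identity (as a fraction) of two random sequences.

    `freqs` maps each letter to its frequency; the frequencies must sum to 1.
    The result is the sum of the squared frequencies.
    """
    total = sum(freqs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"frequencies sum to {total}, expected 1")
    return sum(f * f for f in freqs.values())

def expected_identity_from_gc(gc_percent):
    """Same quantity computed from %GC, assuming G = C and A = T."""
    gc_each = gc_percent / 200.0            # frequency of G (and of C)
    at_each = (100.0 - gc_percent) / 200.0  # frequency of A (and of T)
    return 2 * gc_each ** 2 + 2 * at_each ** 2

# Composition from the example: 40% G, 40% C, 10% A, 10% T.
biased = expected_identity({"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4})
print(f"biased composition: {100 * biased:.0f}% identity")
print(f"from %GC = 80:      {100 * expected_identity_from_gc(80):.0f}% identity")
print(f"from %GC = 50:      {100 * expected_identity_from_gc(50):.0f}% identity")
```

Any deviation from equal frequencies raises the expected identity above 25%, so the uniform case is the minimum.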

When did the Bacteria diverge from the Archaea and Eukaryotes, i.e. how old is LUCA (approximately)? Current estimates vary between 4.2 and about 3 billion years BP (compare here and here)

What is the late heavy bombardment? See http://en.wikipedia.org/wiki/Late_Heavy_Bombardment.
Did this sterilize Earth?
Did it happen, or was this just the tail of the early heavy bombardment?

The finding that the ribosomal RNA alone cannot perform translation is an argument against the RNA world hypothesis
FALSE
It turns out that the catalytic core of the ribosome is made of RNA. Ribosomal proteins are not located at the site of peptide bond formation, which suggests that ribosomal peptide synthesis started out as an RNA-based (ribozyme) machinery.