HOMEWORK ASSIGNMENT #6:
- Read chapter 5
- Write a script that reads a nucleotide sequence from a file in GenBank
format and writes it to a file in FASTA format. Include an informative annotation
line in the FASTA-formatted file.
- Improve your program that counts the bases in a genome
- Add a counter of nucleotide excesses (A over T, or G over C, or keto over
amino base excess, (G+T)-(A+C)). Write the cumulative excess to a table
and plot the result with gnuplot.
- What does the result mean? Which of the above measures (or any others you
could try) shows the most bias?
Does the same work for dinucleotide bias? How about larger oligonucleotides?
Try to implement the former, and, if you have energy to spare, write some "pseudocode"
for the latter (oligonucleotides).
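
As a starting point for the cumulative excess table, here is a minimal sketch in Python. The assignment does not prescribe a language, so adapt it to whatever you use in class. The names STEP, cumulative_excess and write_table, and the column layout of the table, are assumptions made for illustration only. Each base adds +1 or -1 to a running total depending on the measure; any other character (such as N) adds 0.

```python
from itertools import accumulate

# Hypothetical names; the assignment does not prescribe any of them.
# Each measure maps a base to the step it contributes to the running total.
STEP = {
    "AT": {"A": 1, "T": -1},                      # A over T
    "GC": {"G": 1, "C": -1},                      # G over C
    "keto": {"G": 1, "T": 1, "A": -1, "C": -1},   # (G+T) - (A+C)
}

def cumulative_excess(seq, measure):
    """Return the running excess after each base of seq for one measure."""
    weights = STEP[measure]
    # Bases not listed for the measure (including N) contribute 0.
    return list(accumulate(weights.get(base, 0) for base in seq.upper()))

def write_table(seq, path):
    """Write one line per position: position, A-T, G-C and keto excess."""
    columns = [cumulative_excess(seq, m) for m in ("AT", "GC", "keto")]
    with open(path, "w") as out:
        out.write("# pos AT GC keto\n")  # gnuplot skips lines starting with '#'
        for pos, row in enumerate(zip(*columns), start=1):
            out.write(f"{pos} {row[0]} {row[1]} {row[2]}\n")

if __name__ == "__main__":
    # Toy sequence only; replace with the genome read by your own program.
    write_table("GATTACAGGCTT", "excess.dat")
```

If the table is written to excess.dat as above, a gnuplot command such as plot "excess.dat" using 1:3 with lines should draw the G over C curve. For a whole genome you may want to write only every n-th position to keep the file small.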