Assignments for Friday
Assignments for Monday
NCBI (National Center for Biotechnology Information) is home to many public biological databases (see diagram below). All of these databases are interlinked, and they share a common search and retrieval system: Entrez.
A list of the different databases in Entrez is here.
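The database list can also be retrieved programmatically. The sketch below assumes NCBI's E-utilities web interface, where an EInfo call without a `db` parameter returns the names of all Entrez databases as XML. The helper names are hypothetical, and the sample response is a hand-trimmed illustration of the EInfo format (not real output), so the example runs offline.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Base URL of NCBI's E-utilities web interface to Entrez.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def einfo_url() -> str:
    """EInfo URL; with no db parameter it lists all Entrez databases."""
    return EUTILS + "einfo.fcgi"

def parse_db_list(xml_text: str) -> list[str]:
    """Extract the <DbName> entries from an EInfo XML response."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.iter("DbName")]

def fetch(url: str) -> str:
    """Download a URL (requires network access; not called in this example)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")

# Hand-trimmed response in the EInfo format, so the parser can run offline.
sample = """<eInfoResult><DbList>
<DbName>pubmed</DbName><DbName>protein</DbName><DbName>nuccore</DbName>
</DbList></eInfoResult>"""

print(parse_db_list(sample))
```

Online, `parse_db_list(fetch(einfo_url()))` would return the full, current list.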
A PubMed tutorial is here (it goes well beyond what you need to know for Friday).
Search field tags are listed here.
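Field tags restrict each term to one field of a record, e.g. `[ti]` (title) or `[dp]` (publication date) in PubMed. As a minimal sketch, assuming NCBI's E-utilities ESearch endpoint, the snippet below joins tagged terms with AND and builds the corresponding search URL; the helper names and the example query are illustrative only.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities web interface to Entrez.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def tagged(value: str, field: str) -> str:
    """Attach a search field tag, e.g. tagged("ATP synthase", "ti") gives 'ATP synthase[ti]'."""
    return f"{value}[{field}]"

def esearch_url(db: str, terms: list[tuple[str, str]], retmax: int = 20) -> str:
    """Build an ESearch URL that ANDs together (value, field tag) pairs."""
    term = " AND ".join(tagged(value, field) for value, field in terms)
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return EUTILS + "esearch.fcgi?" + query

# Illustrative query: "ATP synthase" in the title, published in 2010.
print(esearch_url("pubmed", [("ATP synthase", "ti"), ("2010", "dp")]))
```

The query string is the same one you would type into the PubMed search box; `urlencode` only escapes the spaces and brackets for use in a URL.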
Explore the features of the NCBI search interface: Advanced Search, Index, Clipboard and MyNCBI.
Other Useful Databases and Services:
While Medline is incorporating more and more non-medical literature, there may still be gaps in its coverage. Alternatives are the other databanks available through the National Library of Medicine (here) and the services offered locally by the UConn libraries. Current Contents and Agricola in particular complement PubMed well. The best way to access them is through the UConn library's website. For example, the "Web of Science" database gives access to the Science Citation Index, a database that tracks cited references in journals. Scopus provides similar services. (But Google Scholar has become nearly as useful; see e.g. here.)
Note that many resources are restricted to the UConn domain, so you need to access them either from a campus computer or through the proxy account. The university now provides easy VPN access through the Junos Pulse application (see http://remoteaccess.uconn.edu/vpn-overview/connect-via-vpn-client/).
When searching PubMed, you can add links to online journals to which UConn subscribes. (Off campus, you need the VPN for these links to work.) The link to use for PubMed is http://www.ncbi.nlm.nih.gov/sites/entrez?otool=uconnlib
Do you want to be notified about new sequences or articles in your research area? Check out these services (you can also use MyNCBI for this, but I have used PubCrawler for several years and it works reliably):
Use MyNCBI at Entrez or PubCrawler to repeat searches at regular intervals.
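The idea behind such alerting services can be sketched in a few lines: rerun the same query regularly, restrict it to recently added records, and report only IDs not seen before. This is not how MyNCBI or PubCrawler are implemented; it is a minimal illustration assuming NCBI's E-utilities ESearch parameters `reldate` (last N days) and `datetype=edat` (Entrez date). The helper names and the IDs in the usage line are made up.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities web interface to Entrez.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def alert_url(term: str, db: str = "pubmed", days: int = 7) -> str:
    """ESearch URL for records added to Entrez within the last `days` days."""
    params = {"db": db, "term": term, "reldate": days,
              "datetype": "edat", "retmax": 100}
    return EUTILS + "esearch.fcgi?" + urlencode(params)

def new_hits(current_ids: list[str], seen_ids: list[str]) -> list[str]:
    """IDs returned by the latest run that were not reported in earlier runs."""
    seen = set(seen_ids)
    return [uid for uid in current_ids if uid not in seen]

print(alert_url("ATP synthase[ti]"))
# Hypothetical IDs: the latest run returned 103, 102, 101; 101 and 102 were already reported.
print(new_hits(["103", "102", "101"], ["101", "102"]))
```

Keeping the set of already-reported IDs is what lets a weekly search stay quiet when nothing new has appeared.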
Work through an example with the clipboard and index, using GI 2266989 (nucleotide) and GI 3334404 (protein).
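The same two records can also be retrieved directly. The sketch below builds E-utilities EFetch URLs that request each record in FASTA format. It assumes that EFetch still accepts these legacy GI numbers as IDs and that the nucleotide record lives in the `nuccore` database; the helper name is hypothetical.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities web interface to Entrez.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def efetch_fasta_url(db: str, uid: str) -> str:
    """EFetch URL returning one record as plain-text FASTA."""
    params = {"db": db, "id": uid, "rettype": "fasta", "retmode": "text"}
    return EUTILS + "efetch.fcgi?" + urlencode(params)

# The two GI numbers from the exercise, paired with their (assumed) databases.
examples = {"nuccore": "2266989", "protein": "3334404"}
for db, gi in examples.items():
    print(efetch_fasta_url(db, gi))
```

Opening either URL in a browser should show the sequence that the search interface displays in FASTA view.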
Other web pages:
Nucleic Acids Research Database Issue
Sequence and structure databanks can be divided into many different categories.
One of the most important is:
One problem in maintaining databanks (supervised and unsupervised) is the "ownership" of sequences, which in many databanks prevents continuous updating of entries. Even when errors are detected, they are not easily removed from the databank. For an example involving the ATP synthase operons in E. coli, see Fig. 1 in http://mic.microbiologyresearch.org/content/journal/micro/10.1099/mic.0.033811-0#tab2
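False positives: The E-value estimates the number of false positives, i.e. the number of matches with at least this score expected by chance in a search of a databank of this size. The P-value or significance value gives the probability of finding at least one such match by chance, i.e. the probability that a positive identification is made in error (as with drug tests).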
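For BLAST-style local alignments, the two numbers are linked by the Karlin-Altschul statistics: E = K·m·n·e^(−λS) for a score S, query length m and databank length n, and, treating chance hits as Poisson-distributed, P = 1 − e^(−E). The sketch below implements both formulas; K and λ depend on the scoring system, so the values in the test are arbitrary placeholders, not real BLAST parameters.

```python
import math

def evalue(score: float, m: int, n: int, K: float, lam: float) -> float:
    """Karlin-Altschul expected number of chance hits with score >= `score`."""
    return K * m * n * math.exp(-lam * score)

def pvalue(e: float) -> float:
    """Probability of at least one chance hit, given E chance hits on average (Poisson)."""
    # -expm1(-e) equals 1 - exp(-e) but stays accurate for very small e.
    return -math.expm1(-e)

for e in (10.0, 1.0, 0.01):
    print(f"E = {e:<5}  P = {pvalue(e):.4f}")
```

For small E-values, P and E are nearly identical (E = 0.01 gives P of about 0.01), which is why BLAST reports only E; for E of 1 or more, P approaches 1 and no longer discriminates between hits.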
False negatives: Homologous sequences in the databank that are not recognized as such. If there are only 12000 different protein families, then on average a sequence should have (size of the databank)/12000 matches. Because searches typically report far fewer homologs than this, the number of false negatives is probably very large.
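The back-of-the-envelope estimate above can be written out as follows. It assumes, as the argument does, that sequences are spread evenly over the 12000 families; the databank size and the number of hits found are hypothetical numbers chosen only to show the arithmetic.

```python
def expected_homologs(databank_size: int, n_families: int = 12000) -> float:
    """Average number of family members per sequence, assuming an even spread over families."""
    return databank_size / n_families

def estimated_false_negatives(databank_size: int, hits_found: int,
                              n_families: int = 12000) -> float:
    """Expected homologs minus homologs actually found (never below zero)."""
    return max(0.0, expected_homologs(databank_size, n_families) - hits_found)

# Hypothetical: a databank of 1,200,000 sequences and a search that found 30 homologs.
print(expected_homologs(1_200_000))               # 100.0
print(estimated_false_negatives(1_200_000, 30))   # 70.0
```

Real families vary enormously in size, so this is only an order-of-magnitude argument, but it shows how quickly the expected number of homologs outgrows what a single search returns.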
Goals Class 7