Applied Bioinformatics

Solutions

Exercises II

Select the nucleotide database at the NCBI homepage (pulldown menu »All Databases«, upper left) and retrieve the GenBank entry with the accession number X76930.
- Read the annotation to get just the coding sequence. What happens when you use the CDS HTML reference?
- Cut out the coding sequence and convert it to the corresponding amino acid sequence. Use a text editor to assemble the coding sequence for that gene. To translate the sequence, you can try one of the following internet resources:
  - BCM Search Launcher
  - Expasy
- Compare your translated amino acid sequence with the annotated sequence in the GenBank entry. Briefly describe the concept of reading frames.
Use the Ensembl web interface to find organisms that have a gene entry for the »glucokinase gene«. How many of the species stored in the system contain an entry for this gene? Select the human Ensembl gene entry for glucokinase.
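The translation step and the idea of reading frames can also be sketched in a few lines of Python. This is only an illustration, not the method used by the web tools listed above: the function name `translate` and the toy sequence are made up for this example, the standard genetic code is assumed, and only the three forward frames are shown.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (third position varies fastest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a DNA string starting at offset `frame` (0, 1 or 2).

    Unknown codons (e.g. containing N) are reported as 'X'.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# Toy sequence (made up, NOT taken from X76930).
toy = "ATGGCCTGA"
for frame in range(3):
    print(frame, translate(toy, frame))
```

Shifting the start position by one or two bases changes every codon, which is why the same DNA yields three different forward translations. Only the frame that matches the annotated CDS reproduces the protein given in the GenBank entry.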
- Which chromosomal location is linked to the human glucokinase?
- How many exons can be observed for the proteins?
- How many exons are annotated as known?
- What is the biochemical function of the coded protein?
- What is the meaning of red filled rectangles and non-filled rectangles?
- Use the genomic location link to get an overview of the retrieved gene and its genomic organisation. Try to identify the coding strand of that gene.
- Is this gene located on the plus or minus strand?
- How many ESTs (expressed sequence tags) can be observed for that gene? Use the feature settings in the genomic view to display all known EST information stored in Ensembl.
- How many orthologous genes are listed for mouse and rat?
Sequence download using the export function of the Ensembl web interface:
- complete genomic region of the glucokinase gene
- coding region
- peptide sequence
- first exon of the gene
Browse to the Entrez site and describe the content of the following databases: Gene, UniGene, and GeoProfiles. Use the icon to get more information.
Enter the search term Huntington disease in the Entrez web interface and submit your query. Try to identify the gene which is responsible for that disease. Download the gene in FASTA and GenBank format. Paste your downloads into a text file.
Browse to SRS@EBI. Choose the Library Page panel and get an overview of which databases are available for searching.
- Select the EMBL database in the Library Page and go to the query form panel. How many sequences are available in the current EMBL version for the chimp organism?
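As an alternative to the download links of the web interface, a record can be requested through the NCBI E-utilities `efetch` service. The sketch below only builds the request URLs. The helper name `efetch_url` is made up for this example, and the accession X76930 from the first exercise is used as a placeholder, since identifying the Huntington disease gene is part of the task.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, rettype):
    """Build an efetch URL for a nucleotide record.

    rettype: 'fasta' for FASTA format, 'gb' for a GenBank flat file.
    """
    params = {
        "db": "nuccore",      # nucleotide database
        "id": accession,
        "rettype": rettype,
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


# Placeholder accession from the first exercise; replace it with the
# accession you identified for the disease gene.
for fmt in ("fasta", "gb"):
    print(efetch_url("X76930", fmt))
```

Opening the printed URLs in a browser, or passing them to `urllib.request.urlopen`, should return the record as plain text that can be pasted into a file.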
- Use the same search interface to retrieve all entries in EMBL which are connected to the chimp organism and linked to the LDHA gene (search terms: LDHA, chimp). Please use the pulldown menus to restrict your search.
Use the SRS system to retrieve the protein sequence for the L-lactate dehydrogenase A chain (LDHA) for human and chimp.
- Which OMIM entry (see MIM entry) or which disease is connected to the human LDHA entry in Swiss-Prot?
- Store the human and chimp entries in the local file system.
- Cut out the first 30 amino acids for each sequence. How similar are the two subsequences?
- How many amino acids are different?
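The comparison of the two subsequences can be checked with a short script. This is a minimal sketch that assumes both sequences are compared position by position without gaps (no alignment is computed). The function name `compare_prefixes` and the toy sequences are made up for illustration and are not the real LDHA sequences.

```python
def compare_prefixes(seq_a, seq_b, length=30):
    """Compare the first `length` residues of two protein sequences.

    Returns a list of (position, residue_a, residue_b) for every
    mismatch (positions are 1-based) and the percent identity.
    """
    a, b = seq_a[:length], seq_b[:length]
    n = min(len(a), len(b))
    mismatches = [(i + 1, a[i], b[i]) for i in range(n) if a[i] != b[i]]
    identity = 100.0 * (n - len(mismatches)) / n if n else 0.0
    return mismatches, identity


# Toy sequences (made up); paste the sequences from your stored
# human and chimp entries here instead.
human = "MKVLAGDTE"
chimp = "MKILAGDTE"
diffs, identity = compare_prefixes(human, chimp)
print(len(diffs), "differences,", round(identity, 1), "% identity")
for pos, res_a, res_b in diffs:
    print(pos, res_a, "->", res_b)
```

For two closely related sequences of equal length this direct comparison is usually sufficient. If the sequences differ by insertions or deletions, a proper pairwise alignment is needed first.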

Please direct questions and comments to Martin Haubrock.
