Applied Bioinformatics [Databases]

Exercises

EMBL format

General information

The European Molecular Biology Laboratory (EMBL) maintains DNA and protein sequence databases. The format of their database entries is shown in the example below. As in GenBank entries, a large amount of information is given for each sequence. The EMBL format uses a two-letter code at the start of each line to identify the data field that the line belongs to (for example, ID for the identification line, AC for the accession number and DE for the description). Every EMBL entry ends with the sequence block, which starts with the SQ line code and is terminated by the // symbol that marks the end of the entry.
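The line structure described above can be read with a few lines of Python. The following is a minimal sketch rather than a complete EMBL parser: the function name parse_embl_entry and the shortened sample entry are assumptions made for illustration, and the code assumes that the field content starts at column 6, as it does in the example entry on this page.

```python
from collections import defaultdict


def parse_embl_entry(text):
    """Group the lines of one EMBL entry by their two-letter line code.

    Returns (fields, sequence): fields maps a line code such as "ID" or
    "DE" to the list of its line contents; sequence is the joined
    sequence block found between the SQ line and the // terminator.
    """
    fields = defaultdict(list)
    sequence_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):
            break  # end of the entry
        if in_sequence:
            # Sequence lines hold blocks of bases plus a running position
            # count at the end of the line; keep only the bases.
            sequence_parts.extend(t for t in line.split() if not t.isdigit())
            continue
        code, value = line[:2], line[5:].rstrip()
        if code == "XX" or not code.strip():
            continue  # XX lines are spacers; also skip blank lines
        fields[code].append(value)
        if code == "SQ":
            in_sequence = True  # everything up to // is sequence data
    return dict(fields), "".join(sequence_parts)


# Shortened, hypothetical entry in the same layout as the example on this page.
sample = """ID   MMFOSB     standard; RNA; MUS; 4145 BP.
XX
AC   X14897;
XX
DE   Mouse fosB mRNA
XX
SQ   Sequence 15 BP; 6 A; 1 C; 1 G; 7 T; 0 other;
     ataaattctt gaatt                                                        15
//
"""

fields, sequence = parse_embl_entry(sample)
print(fields["AC"], fields["DE"], sequence)
```

Multi-line fields such as OC, RT or FT end up as lists with one item per line, so the caller decides how to join them.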

SwissProt sequence format

The SwissProt sequence format is very similar to the EMBL sequence format, but SwissProt entries contain more information about the physical and biochemical properties of the protein.

Example

ID   MMFOSB     standard; RNA; MUS; 4145 BP.
XX
AC   X14897;
XX
SV   X14897.1
XX
DT   23-NOV-1989 (Rel. 21, Created)
DT   12-SEP-1993 (Rel. 36, Last updated, Version 2)
XX
DE   Mouse fosB mRNA
XX
KW   fos cellular oncogene; fosB oncogene; oncogene.
XX
OS   Mus musculus (house mouse)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
XX
RN   [1]
RP   1-4145
RX   MEDLINE; 89251612.
RA   Zerial M., Toschi L., Ryseck R.P., Schuermann M., Mueller R., Bravo R.;
RT   "The product of a novel growth factor activated gene, fos B, interacts with
RT   JUN proteins enhancing their DNA binding activity";
RL   EMBO J. 8:805-813(1989).
XX
DR   MGD; MGI:95575; Fosb.
DR   SWISS-PROT; P13346; FOSB_MOUSE.
DR   TRANSFAC; T00291; T00291.
XX
CC   clone=AC113-1; cell line=NIH3T3;
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..4145
FT                   /db_xref="taxon:10090"
FT                   /organism="Mus musculus"
FT   CDS             1202..2218
FT                   /db_xref="SWISS-PROT:P13346"
FT                   /note="fosB protein (AA 1-338)"
FT                   /protein_id="CAA33026.1"
FT                   /translation="MFQAFPGDYDSGSRCSSSPSAESQYLSSVDSFGSPPTAAASQECA
FT                   GLGEMPGSFVPTVTAITTSQDLQWLVQPTLISSMAQSQGQPLASQPPAVDPYDMPGTSY
FT                   STPGLSAYSTGGASGSGGPSTSTTTSGPVSARPARARPRRPREETLTPEEEEKRRVRRE
FT                   RNKLAAAKCRNRRRELTDRLQAETDQLEEEKAELESEIAELQKEKERLEFVLVAHKPGC
FT                   KIPYEEGPGPGPLAEVRDLPGSTSAKEDGFGWLLPPPPPPPLPFQSSRDAPPNLTASLF
FT                   THSEVQVLGDPFPVVSPSYTSSFVLTCPEVSAFAGAQRTSGSEQPSDPLNSPSLLAL"
XX
SQ   Sequence 4145 BP; 960 A; 1186 C; 1007 G; 991 T; 1 other;
     ataaattctt attttgacac tcaccaaaat agtcacctgg aaaacccgct ttttgtgaca        60
     aagtacagaa ggcttggtca catttaaatc actgagaact agagagaaat actatcgcaa       120
     actgtaatag acattacatc cataaaagtt tccccagtcc ttattgtaat attgcacagt       180
     gcaattgcta catggcaaac tagtgtagca tagaagtcaa agcaaaaaca aaccaaagaa       240
     aggagccaca agagtaaaac tgttcaacag ttaatagttc aaactaagcc attgaatcta       300
     tcattgggat cgttaaaatg aatcttccta caccttgcag tgtatgattt aacttttaca       360
     gaacacaagc caagtttaaa atcagcagta gagatattaa aatgaaaagg tttgctaata       420
     gagtaacatt aaataccctg aaggaaaaaa aacctaaata tcaaaataac tgattaaaat       480
     tcacttgcaa attagcacac gaatatgcaa cttggaaatc atgcagtgtt ttatttaaga       540
     aaacataaaa caaaactatt aaaatagttt tagagggggt aaaatccagg tcctctgcca       600
     ggatgctaaa attagacttc aggggaattt tgaagtcttc aattttgaaa cctattaaaa       660
     agcccatgat tacagttaat taagagcagt gcacgcaaca gtgacacgcc tttagagagc       720
     attactgtgt atgaacatgt tggctgctac cagccacagt caatttaaca aggctgctca       780
     gtcatgaact taatacagag agagcacgcc taggcagcaa gcacagcttg ctgggccact       840
     ttcctccctg tcgtgacaca atcaatccgt gtacttggtg tatctgaagc gcacgctgca       900
     ccgcggcact gcccggcggg tttctgggcg gggagcgatc cccgcgtcgc cccccgtgaa       960
     accgacagag cctggacttt caggaggtac agcggcggtc tgaaggggat ctgggatctt      1020
     gcagagggaa cttgcatcga aacttgggca gttctccgaa ccggagacta agcttccccg      1080
     agcagcgcac tttggagacg tgtccggtct actccggact cgcatctcat tccactcggc      1140
     catagccttg gcttcccggc gacctcagcg tggtcacagg ggcccccctg tgcccaggga      1200
     aatgtttcaa gcttttcccg gagactacga ctccggctcc cggtgtagct catcaccctc      1260
     cgccgagtct cagtacctgt cttcggtgga ctccttcggc agtccaccca ccgccgccgc      1320
     ctcccaggag tgcgccggtc tcggggaaat gcccggctcc ttcgtgccaa cggtcaccgc      1380
     aatcacaacc agccaggatc ttcagtggct cgtgcaaccc accctcatct cttccatggc      1440
     ccagtcccag gggcagccac tggcctccca gcctccagct gttgaccctt atgacatgcc      1500
     aggaaccagc tactcaaccc caggcctgag tgcctacagc actggcgggg caagcggaag      1560
     tggtgggcct tcaaccagca caaccaccag tggacctgtg tctgcccgtc cagccagagc      1620
     caggcctaga agaccccgag aagagacact taccccagaa gaagaagaaa agcgaagggt      1680
     tcgcagagag cggaacaagc tggctgcagc taagtgcagg aaccgtcgga gggagctgac      1740
     agatcgactt caggcggaaa ctgatcagct tgaagaggaa aaggcagagc tggagtcgga      1800
     gatcgccgag ctgcaaaaag agaaggaacg cctggagttt gtcctggtgg cccacaaacc      1860
     gggctgcaag atcccctacg aagaggggcc ggggccaggc ccgctggccg aggtgagaga      1920
     tttgccaggg tcaacatccg ctaaggaaga cggcttcggc tggctgctgc cgccccctcc      1980
     accacccccc ctgcccttcc agagcagccg agacgcaccc cccaacctga cggcttctct      2040
     ctttacacac agtgaagttc aagtcctcgg cgaccccttc cccgttgtta gcccttcgta      2100
     cacttcctcg tttgtcctca cctgcccgga ggtctccgcg ttcgccggcg cccaacgcac      2160
     cagcggcagc gagcagccgt ccgacccgct gaactcgccc tcccttcttg ctctgtaaac      2220
     tctttagaca aacaaaacaa acaaacccgc aaggaacaag gaggaggaag atgaggagga      2280
     gaggggagga agcagtccgg gggtgtgtgt gtggaccctt tgactcttct gtctgaccac      2340
     ctgccgcctc tgccatcgga catgacggaa ggacctcctt tgtgttttgt gctccgtctc      2400
     tggttttctg tgccccggcg agaccggaga gctggtgact ttggggacag ggggtggggc      2460
     ggggatggac acccctcctg catatctttg tcctgttact tcaacccaac ttctggggat      2520
     agatggctgg ctgggtgggt agggtggggt gcaacgccca cctttggcgt cttgcgtgag      2580
     gctggagggg aaagggtgct gagtgtgggg tgcagggtgg gttgaggtcg agctggcatg      2640
     cacctccaga gagacccaac gaggaaatga cagcaccgtc ctgtccttct tttcccccac      2700
     ccacccatcc accctcaagg gtgcagggtg accaagatag ctctgttttg ctccctcggg      2760
     ccttagctga ttaacttaac atttccaaga ggttacaacc tcctcctgga cgaattgagc      2820
     ccccgactga gggaagtcga tgcccccttt gggagtctgc taaccccact tcccgctgat      2880
     tccaaaatgt gaacccctat ctgactgctc agtctttccc tcctgggaaa actggctcag      2940
     gttggatttt tttcctcgtc tgctacagag ccccctccca actcaggccc gctcccaccc      3000
     ctgtgcagta ttatgctatg tccctctcac cctcaccccc accccaggcg cccttggccg      3060
     tcctcgttgg gccttactgg ttttgggcag cagggggcgc tgcgacgccc atcttgctgg      3120
     agcgctttat actgtgaatg agtggtcgga ttgctgggtg cgccggatgg gattgacccc      3180
     cagccctcca aaactttccc tgggcctccc cttcttccac ttgcttcctc cctccccttg      3240
     acagggagtt agactcgaaa ggatgaccac gacgcatccc ggtggccttc ttgctcaggc      3300
     cccagacttt ttctctttaa gtccttcgcc ttccccagcc taggacgcca acttctcccc      3360
     accctgggag ccccgcatcc tctcacagag gtcgaggcaa ttttcagaga agttttcagg      3420
     gctgaggctt tggctcccct atcctcgata tttgaatccc caaatatttt tggactagca      3480
     tacttaagag ggggctgagt tcccactatc ccactccatc caattccttc agtcccaaag      3540
     acgagttctg tcccttccct ccagctttca cctcgtgaga atcccacgag tcagatttct      3600
     attttttaat attggggaga tgggccctac cgcccgtccc ccgtgctgca tggaacattc      3660
     cataccctgt cctgggccct aggttccaaa cctaatccca aaccccaccc ccagctattt      3720
     atccctttcc tggttcccaa aaagcactta tatctattat gtataaataa atatattata      3780
     tatgagtgtg cgtgtgtgtg cgtgtgcgtg cgtgcgtgcg tgcgtgcgag cttccttgtt      3840
     ttcaagtgtg ctgtggagtt caaaatcgct tctggggatt tgagtcagac tttctggctg      3900
     tccctttttg tcaccttttt gttgttgtct cggctcctct ggctgttgga gacagtcccg      3960
     gcctctccct ttatcctttc tcaagtctgt ctcgctcaga ccacttccaa catgtctcca      4020
     ctctcaatga ctctgatctc cggtntgtct gttaattctg gatttgtcgg ggacatgcaa      4080
     ttttacttct gtaagtaagt gtgactgggt ggtagatttt ttacaatcta tatcgttgag      4140
     aattc                                                                  4145
//
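
The SQ line of the entry above declares the sequence length and the base composition (4145 BP; 960 A; 1186 C; 1007 G; 991 T; 1 other), which can be compared against the sequence block as a simple consistency check. The sketch below is one possible way to do this; the function name check_sq_counts and the short test sequences are assumptions for illustration, and every symbol other than a, c, g and t (such as n) is counted as "other".

```python
import re
from collections import Counter


def check_sq_counts(sq_header, sequence):
    """Compare the counts declared on an SQ line with the actual sequence.

    Returns a dict of mismatches mapping a label ("BP", "A", "C", "G",
    "T" or "other") to a (declared, observed) pair; an empty dict means
    the header and the sequence agree.
    """
    # Pick up pairs such as "4145 BP" or "960 A" from the header line.
    declared = {
        label: int(number)
        for number, label in re.findall(r"(\d+)\s+(BP|A|C|G|T|other)\b", sq_header)
    }
    counts = Counter(sequence.lower())
    observed = {
        "BP": len(sequence),
        "A": counts["a"],
        "C": counts["c"],
        "G": counts["g"],
        "T": counts["t"],
    }
    # Anything that is not a, c, g or t (for example n) counts as "other".
    observed["other"] = observed["BP"] - sum(observed[b] for b in "ACGT")
    return {
        label: (declared.get(label), observed[label])
        for label in observed
        if declared.get(label) != observed[label]
    }


# Hypothetical short sequences, not taken from the entry above.
ok = check_sq_counts("Sequence 5 BP; 1 A; 1 C; 1 G; 1 T; 1 other;", "acgtn")
bad = check_sq_counts("Sequence 5 BP; 2 A; 1 C; 1 G; 1 T; 1 other;", "acgtn")
print(ok, bad)
```

Returning the mismatches instead of a plain True or False makes it easier to see which count disagrees when an entry has been edited by hand.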


Please direct questions and comments to Martin Haubrock.