gobics.de: Molecular Biology Methods Course


Databases

Katharina Hoff & Fabian Schreiber


Links & Literature

Essential Data Sources:

These sites host some of the most frequently used data for bioinformatics studies, along with many useful tools.
  • SWISS-PROT/TrEMBL Protein sequence databank; SWISS-PROT is manually curated, extensively annotated and cross-linked, and TrEMBL is its computer-annotated supplement
  • GenBank All publicly available nucleotide sequences
  • EMBL All publicly available nucleotide sequences
  • RefSeq Curated versions of all known mRNA, gene and protein sequences. The only place to find "official" versions of nucleotide sequences.
  • UniProt Universal Protein Knowledgebase, successor of SwissProt and PIR
  • Kyoto Encyclopedia of Genes and Genomes (KEGG) Metabolic and regulatory pathways
  • PDB - Protein Data Bank Macromolecule structures determined by X-ray crystallography and NMR, both proteins and DNA. The principal source for molecular 3D coordinates.

    Genomes:

    Genome databases contain partial or full sequences for the chromosomes of organisms. Certain centers (KEGG, Sanger, EMGLib, TIGR, Celera) distribute several genomes whereas others concentrate on a single organism. The first two below (GOLD, KEGG) are good places for finding any genome project.
  • GOLD - Genomes OnLine Database Comprehensive list of complete and ongoing genome projects
  • KEGG Genomes, metabolic and regulatory pathways etc.
  • NCBI Human genome and lots of resources
  • Entrez Gene Official nomenclature, aliases, sequences, phenotypes, MIM numbers, UniGene clusters, homology, map locations, and more.
  • Sanger Human, Caenorhabditis elegans, and Schizosaccharomyces pombe sequences and genomic information
  • FlyBase Drosophila genome and much more.
  • Saccharomyces Genome Database (SGD) A fine selection of tools in addition to the genome
  • MGI Mouse genetics and genomics
  • UW E. coli Genome Project E. coli K-12 genes, their functional characterization and their regulation.
  • MIPS Protein and genome information
  • TIGR Microbial Database Microbial genomes and chromosomes
  • ZFIN Zebrafish genetics and development; mutant and wild-type lines
  • GOBASE Organelle genome database
  • MITOMAP Human mitochondrial genome
  • genomesize.com Database of genome sizes, covering even species which have not been sequenced

    Genetic Maps:

    Genomic sequencing is usually based on certain markers, which can be used to locate genes. These markers are also important for "gene hunting", i.e. localizing particular genes.
  • GDB Human genes and genomic maps
  • GENATLAS Human genes, markers and phenotypes
  • IXDB Physical maps of human chromosome X
  • RHdb Radiation hybrid map data

    Gene Identification and Structure:

    These registries supply more detailed information on, e.g., codon usage, exons and introns, and regulatory regions such as polymerase binding sequences.
  • Codon Usage Database Codon usage tables
  • EPD Eukaryotic POL II promoters
  • ExInt Database Exon-intron structure of eukaryotic genes
  • Transterm Codon usage, start and stop signals
  • TRRD Regulatory regions of eukaryotic genes
  • Ares Lab Intron Site Yeast spliceosomal introns
  • COMPEL (in Novosibirsk, sometimes inaccessible) Composite regulatory elements
  • YIDB Yeast nuclear and mitochondrial intron sequences
  • OoTFD Transcription factors and binding sites
  • IRESdb Internal Ribosome Entry Sites
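
As a rough illustration of what a codon usage table contains, the following minimal sketch counts codons in an in-frame coding sequence and reports relative frequencies. The function name `codon_usage` and the toy sequence are hypothetical and invented for this example; published tables are compiled from many genes per organism and are often reported per thousand codons rather than as fractions.

```python
from collections import Counter


def codon_usage(cds: str) -> dict[str, float]:
    """Return relative codon frequencies for an in-frame coding sequence.

    Assumption: the sequence starts in frame; trailing bases that do not
    fill a complete codon are ignored. RNA input (U) is converted to DNA (T).
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3          # drop an incomplete last codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


# Toy sequence (invented): ATG GCT GCT AAA TAA
usage = codon_usage("ATGGCTGCTAAATAA")
for codon, fraction in sorted(usage.items()):
    print(codon, round(fraction, 2))
```

Comparing such frequencies between a gene of interest and an organism-wide table is one common way these resources are used, for example to spot rare codons.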

    RNA Sequences:

    Databases specific to different types of RNA molecules such as mtRNA, tRNA, tmRNA and rRNA.
  • 5S rRNA Database 5S rRNA sequences
  • Activity Functional DNA/RNA site sequences
  • lsu rRNA database Alignment of large subunit ribosomal RNA sequences
  • ssu rRNA database Alignment of small subunit ribosomal RNA sequences
  • Non-Canonical Base Pair Database RNA structures containing rare base pairs
  • Ribosomal Database Project (RDP) rRNA sequences, alignments, and phylogenies
  • RNA Modification Database Naturally modified nucleosides in RNA
  • SRPDB Signal recognition particle RNA, protein, and receptor sequences
  • tRNA Sequences tRNA and tRNA gene sequences
  • UTRdb 5' and 3' UTRs of eukaryotic mRNAs
  • tmRNA WEBSITE The bacterial tmRNA (10Sa RNA)

    Structure:

    Information on the three-dimensional structure of macromolecules (mainly proteins) and its analysis.
  • PDB - Protein Data Bank Macromolecule structures determined by X-ray crystallography and NMR, both proteins and DNA. The principal source for molecular 3D coordinates.
  • MSD - Macromolecular Structure Database at EBI The European project for the collection, management and distribution of data about macromolecular structures, derived in part from the Protein Data Bank (PDB).
  • MMDB Database of macromolecular 3D structures at NCBI, data taken from PDB but enhanced with consistent taxonomy, consistent secondary structure assignments etc. Searchable with Entrez, can be directly linked to sequence and/or literature searches.
  • NDB A database specializing in nucleic acid 3D structures and DNA-binding protein structures.
  • CATH Hierarchical classification of protein domain structures
  • SCOP Familial and structural protein relationships
  • ASTRAL Analysis of protein structures and their sequences
  • Gene3D A database of precalculated structural assignments for genes within whole genomes.
  • HSSP Structural families and alignments. Homology-derived structures of proteins (secondary and tertiary), similar to Gene3D.
  • Membrane protein topology database Information on experimentally verified transmembrane helices (172 proteins in Jan 2007)
  • BioMagResBank A database of NMR-derived protein and nucleic acid 3D structures
  • RESID Protein structure modifications
  • Database of Macromolecular Movements Motions of protein loops, domains and subunits, including movies
  • IMB Jena Image Library Visualization and analysis of three-dimensional biopolymer structures
  • LPFC Library of protein family core structures
  • CSD Crystal structure information for organic and metal organic compounds.
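
To make the notion of "3D coordinates" concrete, here is a minimal sketch of reading one atom record from the legacy fixed-column PDB text format. It assumes the standard column layout for ATOM/HETATM records (serial in columns 7-11, atom name in 13-16, residue name in 18-20, chain in 22, residue number in 23-26, x/y/z in 31-54). The `Atom` type, the function name `parse_atom_line` and the sample line are hypothetical and for illustration only; for real work an established parser is the safer choice.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record using fixed column positions."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain=line[21],                # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38, in Angstrom
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


# Illustrative record, not taken from a specific PDB entry.
SAMPLE = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N"
atom = parse_atom_line(SAMPLE)
print(atom.res_name, atom.chain, atom.res_seq, atom.x, atom.y, atom.z)
```

Slicing by column rather than splitting on whitespace matters here, because adjacent fields in this format can run together without a separating space.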

    Mutation Databases:

    Registers of human hereditary disease-causing genetic defects and other mutations.
  • Online Mendelian Inheritance in Man (OMIM) Catalog of human genetic and genomic disorders
  • Human Gene Mutation Database (HGMD) Known (published) gene lesions responsible for human inherited disease
  • dbSNP Single nucleotide polymorphisms
  • ALFRED Allele frequencies and DNA polymorphisms
  • Atlas of Genetics and Cytogenetics in Oncology and Hematology Chromosomal abnormalities in cancer
  • BTKbase Mutation registry for X-linked agammaglobulinemia
  • HIV-RT HIV reverse transcriptase and protease sequence variation. Focuses on the interplay of medication, development of resistance and sequence changes.
  • KinMutBase Disease-causing protein kinase mutations
  • PAHdb Mutations at the phenylalanine hydroxylase locus. A good example of a disease-oriented database.
  • PMD Compilation of protein mutant data

    Protein motifs:

    Databases and analysis software for identifying patterns and motifs in protein sequences.
  • InterPro Combines data from a number of domain and motif databases; the number one resource in this field.
  • Pfam Multiple sequence alignments and hidden Markov models of common protein domains
  • ProDom Protein domain families, obtained by automated clustering. This collection is the largest available, but automation may introduce some errors.
  • PRINTS Protein sequence motifs and signatures
  • BLOCKS Ungapped multiple protein alignments extracted from SwissProt/TrEMBL entries, corresponding to the most highly conserved regions in protein families documented in InterPro
  • SMART Identification and annotation of genetically mobile domains and the analysis of domain architectures
  • PROSITE Biologically-significant protein patterns and profiles
  • iProClass Comprehensive family relationships and structural/functional features of proteins
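
Sequence patterns of the kind collected in PROSITE can be tried out locally by translating them into regular expressions. The sketch below covers only a simplified subset of the PROSITE pattern syntax (`x` for any residue, `(n)` and `(n,m)` repeats, `[...]` allowed residues, `{...}` forbidden residues, `<` and `>` anchors outside brackets). The function name `prosite_to_regex` and the toy sequence are hypothetical, and the pattern is used purely as an illustration of the syntax.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Simplified: anchors inside square brackets are not handled.
    """
    pattern = pattern.rstrip(".")              # patterns end with a period
    parts = []
    for element in pattern.split("-"):         # elements are dash-separated
        element = element.replace("<", "^").replace(">", "$")
        element = element.replace("{", "[^").replace("}", "]")  # forbidden set
        element = element.replace("(", "{").replace(")", "}")   # repeat counts
        element = element.replace("x", ".")    # any residue
        parts.append(element)
    return "".join(parts)


regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
print(regex)

# Toy sequence (invented) built to satisfy the pattern.
toy = "MK" + "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H" + "GG"
match = re.search(regex, toy)
print(match.start() if match else None)
```

A plain regular expression only reproduces pattern matching; profiles and hidden Markov models, as used by PROSITE profiles and Pfam, need dedicated tools.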

    Gene Expression:

    Gene transcription can be analysed readily, e.g. with chip technology. A number of Web sites are available for distributing and analysing the resulting data.
  • BodyMap Human and mouse gene expression data
  • FlyView Drosophila development and genetics
  • Gene Expression Database (GXD) Mouse gene expression and genomics
  • Kidney Development Database Kidney development and gene expression
  • Mouse Atlas and Gene Expression Database Spatially-mapped gene expression data
  • PEDB Normal and aberrant prostate gene expression
  • Tooth Development Database Gene expression in dental tissue

    Proteomics:

    Registries and programs for the analysis of protein translation.
  • SWISS-2DPAGE 2D-PAGE images and reference maps
  • Phosphoprotein Database (PPDB) 2D gel maps of phosphoproteins under various conditions
  • AAindex Physicochemical properties of amino acids and peptides