13.1 Essential ideas
13.1.5 Bioinformatics (HL)
Bioinformatics uses computers to analyse sequence data.
- Scientists have easy access to information stored in databases.
- The amount of data is increasing exponentially due to advances in sequencing technology, modelling and imaging software, and computing power.
Data types and tools
- European Bioinformatics Institute (EBI)
- DNA Data Bank of Japan (DDBJ)
- Protein Data Bank (PDB)
- National Center for Biotechnology Information (NCBI)
Sequence alignment software
- Sequence alignment software allows comparison of sequences from different species.
- Similarities between nucleotide sequences, or between protein sequences, can indicate similarities in function and also evolutionary relationships. The closer the alignment, the greater the shared evolutionary history.
- The Basic Local Alignment Search Tool (BLAST) is a computer algorithm that searches sequences for regions of similarity.
- The two (or more) sequences compared in a BLAST alignment may come from databases, or a newly acquired sequence may be compared to known sequences from a database.
- BLASTn allows nucleotide sequence alignment, while BLASTp allows protein alignment.
- Phylogeny refers to a species’ evolutionary line of descent. Multiple sequence alignments are used in the study of phylogenetics and to construct phylogenetic trees.
- A phylogenetic tree differs from a cladogram (see 5.1.4) because the length of each branch represents the amount of evolutionary change along that lineage.
Figure 13.1.5a – Comparison of cladogram and phylogenetic trees
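Alignment software scores how well two sequences line up. The sketch below shows one way a local alignment score can be calculated. It uses the Smith-Waterman dynamic-programming method, not BLAST itself: BLAST speeds up the search with a heuristic (short matching "words" are found first and then extended) rather than filling a full table. The scoring values (match +2, mismatch -1, gap -2) are illustrative assumptions, not BLAST defaults.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Return the best Smith-Waterman local alignment score for sequences a and b.

    Scoring values are illustrative assumptions, not BLAST's defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # gap in sequence b
            left = h[i][j - 1] + gap    # gap in sequence a
            # Scores never drop below 0, so an alignment can start anywhere (local)
            h[i][j] = max(0, diagonal, up, left)
            best = max(best, h[i][j])
    return best


# The shared region ACGT (4 matches x 2) gives the highest local score
print(local_alignment_score("GGACGTCC", "TTACGTAA"))  # 8
```

A higher score means a longer or closer region of similarity, which is the kind of evidence used to suggest shared function or ancestry.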
- Similarities in biochemical sequences that are a product of evolution are called homologies. Homologous sequences are shared by species that have a common ancestry.
- Some sequences may be similar by chance, through independent random mutations, rather than through common descent. These similarities do not reflect shared ancestry and are called analogous sequences.
- Computer-based algorithms take into account the rate of mutation when multiple sequences are aligned, so that evolutionary relationships can be determined more accurately.
- Ethical considerations restrict in vivo experimentation on humans and other higher animals.
- Gene function can be studied using model organisms with similar sequences, or having similar biochemical mechanisms and pathways.
- Some of the most extensively studied model organisms are Drosophila melanogaster (fruit fly), Escherichia coli (a bacterium), and Mus musculus (house mouse, the common laboratory mouse), the last having more than 80% genetic similarity with humans.
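The point above that alignment algorithms take the rate of mutation into account can be illustrated with a simple distance correction. Counting the proportion of differing sites (the p-distance) underestimates the true number of substitutions, because a site can mutate more than once. The sketch below applies the Jukes-Cantor correction, one of the simplest substitution models; it is chosen here as an example and is not necessarily the model any particular software uses.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for x, y in zip(seq1, seq2) if x != y)
    return differences / len(seq1)


def jukes_cantor(p: float) -> float:
    """Estimated substitutions per site, correcting for multiple hits (Jukes-Cantor model)."""
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTTCGTAA")  # 2 differences in 10 sites
print(p)                # 0.2
print(jukes_cantor(p))  # about 0.233, larger than the observed 0.2
```

The corrected distance is always at least as large as the observed one, and the gap widens as sequences diverge, which is why uncorrected counts can make distantly related species look closer than they are.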
Application: Knockout technology in mice
- Researchers can determine the function of a gene in vivo by observing the effects of its removal. The functional sequence of a gene is 'knocked out' and replaced with a non-functioning sequence.
Figure 13.1.5c – General strategy for gene targeting
- Embryonic stem (ES) cells are transfected with the manipulated gene and cultured in vitro. The modified stem cells are then injected into an early embryo (blastocyst), resulting in chimeric mice that carry cells from two mouse strains (wild type and knockout).
- Subsequent generations are bred: offspring that inherit the knockout allele are heterozygous, and these are interbred to produce homozygous knockout mice. The effect of the altered gene can then be compared experimentally with wild-type mice.
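The breeding step follows ordinary Mendelian inheritance. The sketch below counts the offspring genotypes expected from a cross, using "+" for the wild-type allele and "-" for the knockout allele (a notation chosen here for illustration). It assumes a single gene with simple Mendelian segregation and ignores the possibility that homozygous knockouts die before birth, which happens for some essential genes.

```python
from collections import Counter
from itertools import product


def cross(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes from a one-gene cross.

    Each parent is a two-character genotype, e.g. "+-" for a heterozygote.
    Assumes Mendelian segregation and equal viability of all genotypes.
    """
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort so that "+-" and "-+" are counted as the same genotype
        genotype = "".join(sorted(allele1 + allele2))
        offspring[genotype] += 1
    return offspring


# Heterozygote x heterozygote: expected 1 : 2 : 1 ratio
print(cross("+-", "+-"))  # Counter({'+-': 2, '++': 1, '--': 1})
```

Under these assumptions, about one quarter of the offspring of two heterozygous parents are homozygous knockouts ("--"), which is why several generations and litters are usually needed to build up a working population.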
Identifying potential genes using ESTs
- Complete genes contain coding and non-coding sequences, including introns. While a gene is being expressed, its introns are spliced out, so the mature mRNA present in cells no longer contains them.
- This mRNA can be captured from tissues and converted into complementary DNA (cDNA) using reverse transcriptase. A short stretch of the cDNA, typically between 200 and 500 nucleotides, is sequenced; this partial sequence is called an expressed sequence tag (EST).
- Scientists can compare newly generated ESTs to databases containing ESTs of known gene function. Similarly, they can use a BLASTn search to align unknown ESTs to sequences in another species’ genome.
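The base-pairing carried out by reverse transcriptase can be sketched in a few lines. The function below (a hypothetical helper, not part of any bioinformatics library) builds the first cDNA strand from an mRNA sequence: each base is replaced by its DNA complement (U pairs with A, A with T, G with C), and the result is reversed so it reads 5' to 3' like the mRNA.

```python
# DNA base that pairs with each RNA base
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(mrna: str) -> str:
    """Return the first-strand cDNA (written 5'->3') for an mRNA given 5'->3'."""
    # Reverse the mRNA so the complementary strand is also written 5'->3'
    return "".join(COMPLEMENT[base] for base in reversed(mrna.upper()))


print(reverse_transcribe("AUGGCU"))  # AGCCAT
```

Because the cDNA is built from mature mRNA, it contains no intron sequences, which is what makes ESTs a direct record of expressed genes.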
Application: Discovery of genes by EST data mining
Data mining is the practice of using existing data to generate new information. Between 1990 and 1995, hundreds of thousands of ESTs were generated using automated sequencing technologies. Today, researchers can use the existing EST databases to match a novel EST sequence to a gene of known function in another species.
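Matching a novel EST against a database can be illustrated with a much simpler method than BLAST. The sketch below breaks each sequence into short overlapping "words" (k-mers) and reports the database entry sharing the largest fraction of words with the query (the Jaccard index). BLAST also begins by finding short matching words, but then extends and statistically scores the hits, which this sketch does not do. The gene names and sequences in the example are made up for illustration.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_match(query: str, database: dict[str, str], k: int = 4) -> tuple[str, float]:
    """Return (name, score) of the entry sharing the largest fraction of k-mers with query."""
    query_words = kmers(query, k)
    best_name, best_score = "", 0.0
    for name, seq in database.items():
        words = kmers(seq, k)
        union = query_words | words
        # Jaccard index: shared words / all distinct words in either sequence
        score = len(query_words & words) / len(union) if union else 0.0
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical database of sequences with "known" functions
database = {
    "gene_A": "ATGGCGTACGTTAGC",
    "gene_B": "TTTTCCCCGGGGAAAA",
}
print(best_match("GGCGTACGTT", database))  # ('gene_A', 0.583...)
```

The design choice of short words makes the comparison fast, which matters when a database holds hundreds of thousands of ESTs; the cost is that it ignores gaps and the order of matches.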
Figure 13.1.5d – Drosophila
Gene function can be studied using model organisms with similar sequences
Nature of Science
Cooperation and collaboration between scientists: databases on the internet allow scientists free access to information
Food for thought
- What is meant by the term ‘molecular clock’? How do scientists determine when species diverged from each other on evolutionary timelines?