13.2 Applications and skills
13.2.5 Using databases (HL)
On this page, you will practise skills using a number of different databases.
Exploring chromosome 21 in Ensembl
- Chromosome 21 is the shortest human chromosome – about 48 million base pairs long.
- Trisomy of chromosome 21 results in Down syndrome.
- From the ideogram above, it is clear that the long arm (q-arm) of the chromosome has more coding regions than the short arm (p-arm).
- Gene-rich regions tend to have a higher GC content, so the long arm also contains more GC-rich sequence. (Human telomeres, by contrast, are characterised by TTAGGG repeats.)
- If you follow the link to the original ideogram, you can click on any region to zoom in and explore genes and sequences.
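The GC content mentioned in the list above is simply the proportion of G and C bases in a stretch of DNA. The following is a minimal sketch of how it could be calculated for a sequence copied from a genome browser. The function names `gc_content` and `windowed_gc` are illustrative, not part of Ensembl.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (0.0 to 1.0)."""
    seq = sequence.upper()
    # Count only unambiguous bases so that N (unknown base) does not skew the result.
    counted = [base for base in seq if base in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(1 for base in counted if base in "GC")
    return gc / len(counted)


def windowed_gc(sequence: str, window: int) -> list[float]:
    """GC fraction for consecutive, non-overlapping windows along a sequence."""
    return [
        gc_content(sequence[start:start + window])
        for start in range(0, len(sequence), window)
    ]


# Hypothetical short sequence, for illustration only.
print(windowed_gc("GGCCAATT", 4))
```

Comparing window values along a chromosome region is one way to see where GC-rich (and often gene-rich) stretches lie.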
Using alignment software to compare two proteins
You can use any database to get sequence data on the protein of your choice. For this activity, you will collect sequences from the NCBI database, and then use Clustal Omega software to align the two sequences.
Part A: Find sequences
Go to: www.ncbi.nlm.nih.gov.
1. From the dropdown menu on the left, choose 'Gene', then type the name of the protein you are interested in into the search box. A list of genes and species will appear. Choose one and click through to its 'Full Report' page.
2. Scroll down to the section 'Genomic regions, transcripts, and products'. Click on FASTA. Copy the entire sequence, including the header line, into a plain-text editor.
3. Repeat steps 1 and 2 for a second sequence, for example the same gene in another species. You will now have two sequences in your text file.
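A FASTA file is plain text: each record starts with a header line beginning with `>`, followed by one or more lines of sequence. The sketch below shows one way the copied text could be read into a program. The function name `parse_fasta` and the two example records are hypothetical and are not real NCBI entries.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            header = line[1:]      # drop the leading '>'
            records[header] = ""
        elif header is not None:
            # Sequence lines are joined, since FASTA wraps long sequences.
            records[header] += line.upper()
    return records


# Hypothetical records, for illustration only.
example = """>seq_one hypothetical record
ATGGTGCTGT
CTCCTGCC
>seq_two hypothetical record
ATGGTGCTCT
"""

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```

Keeping the header line with each sequence matters because the alignment tool uses it to label the rows of its output.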
Part B: Align sequences
Go to: www.ebi.ac.uk and open the Clustal Omega tool.
1. Paste or upload the sequences to be compared into the text box. Click on 'Submit'.
The image above shows the results of a nucleotide sequence alignment for the gene coding for the hemoglobin alpha subunit in Homo sapiens (human) and Mus musculus (mouse). An asterisk marks each position where the nucleotide is identical in both sequences.
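The asterisks in an alignment can be turned into a single number, the percent identity. The following is a minimal sketch that assumes the two aligned rows (including `-` gap characters) have been copied out of the alignment output. The function names and the short example rows are hypothetical. Note that conventions differ on what to divide by. Here, columns containing a gap are left out of the comparison.

```python
def match_line(aligned_a: str, aligned_b: str) -> str:
    """Build a row with '*' wherever both sequences have the same base."""
    return "".join(
        "*" if a == b and a != "-" else " "
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
    )


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical positions, ignoring columns that contain a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: not counted (one of several conventions)
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned rows, for illustration only.
row_a = "ATGC"
row_b = "ATGA"
print(match_line(row_a, row_b))
print(percent_identity(row_a, row_b))
```

Counting matches in this way gives a quantitative answer to how similar two sequences are, rather than a visual impression.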
Constructing phylograms and cladograms using DNA sequences
- Clustal Omega is designed to align three or more sequences at the same time (a multiple sequence alignment).
- The results of a multiple sequence alignment can be displayed as a cladogram (equal branch lengths, showing only the branching order), or as a phylogram (branch lengths proportional to the amount of sequence change), as shown below.
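To see how an alignment can become a tree, the sketch below computes the proportion of differing sites between each pair of aligned sequences and then clusters them with UPGMA, a simple textbook method. This is an illustration only and is not necessarily the method used by the Clustal Omega web service. The names `p_distance` and `upgma` and the three short sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences (gaps ignored)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(names, dist):
    """Cluster with UPGMA and return the tree as a Newick string.

    dist maps frozenset({name1, name2}) to the distance between the two.
    """
    clusters = {name: (1, 0.0) for name in names}  # label -> (size, node height)
    d = dict(dist)
    while len(clusters) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2
        size_a, h_a = clusters[a]
        size_b, h_b = clusters[b]
        new = f"({a}:{height - h_a:.3f},{b}:{height - h_b:.3f})"
        # Distance from the new cluster to the rest: size-weighted average.
        for other in clusters:
            if other not in (a, b):
                d[frozenset((new, other))] = (
                    size_a * d[frozenset((a, other))]
                    + size_b * d[frozenset((b, other))]
                ) / (size_a + size_b)
        del clusters[a], clusters[b]
        clusters[new] = (size_a + size_b, height)
    return next(iter(clusters)) + ";"


# Hypothetical aligned sequences, for illustration only.
aligned = {"A": "ATGCATGCAT", "B": "ATGCATGCAA", "C": "TTGCAAGCTA"}
dist = {
    frozenset((m, n)): p_distance(aligned[m], aligned[n])
    for m, n in combinations(aligned, 2)
}
tree = upgma(list(aligned), dist)
print(tree)
```

The numbers after each colon in the output are branch lengths. Drawing the tree with those lengths gives a phylogram, while ignoring them and keeping only the branching order gives a cladogram.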
Food for thought
Can you determine the genetic similarity of human and mouse hemoglobin from Figure 13.2.5f?
Figure 13.2.5i – Multiple sequence alignment
Multiple sequence alignments are easier to interpret when nucleotides are colour-coded.
How reliable are knowledge claims justified by reference to data that the researcher did not acquire? How can one be sure of sources and methods developed for different purposes by different researchers? What are the implications for bioinformatics and the future of data mining?