13.2 Applications and skills

13.2.5 Using databases (HL)

On this page, you will practise skills using a number of different bioinformatics databases.

Exploring chromosome 21 in Ensembl

Figure 13.2.5a – Chromosome 21, retrieved from: useast.ensembl.org

  • Chromosome 21 is the shortest human chromosome – about 48 million base pairs long.
  • Trisomy of chromosome 21 results in Down syndrome.
  • From the ideogram above, it is clear that the long arm (q-arm) of the chromosome has more coding regions than the short arm (p-arm).
  • Gene-rich regions generally have a higher GC content than gene-poor regions, so the long arm is also expected to contain more GC-rich sequence.
  • If you follow the link to the original ideogram, you can click on any region to zoom in and explore genes and sequences.
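To make the idea of GC-rich regions concrete, the following minimal Python sketch computes the proportion of G and C bases in consecutive windows of a sequence. The function names and the 20-base example sequence are invented for illustration; real chromosome 21 sequence would have to be downloaded from Ensembl first.

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence (0.0 to 1.0)."""
    sequence = sequence.upper()
    # Count only unambiguous bases, so N and gap characters are skipped.
    counted = [base for base in sequence if base in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(1 for base in counted if base in "GC")
    return gc / len(counted)


def windowed_gc(sequence, window):
    """Return the GC fraction of each consecutive, non-overlapping window."""
    return [
        gc_content(sequence[start:start + window])
        for start in range(0, len(sequence), window)
    ]


# Hypothetical 20-base sequence, invented for illustration only.
example = "ATATATATATGCGCGCGGCC"
print(windowed_gc(example, 10))  # prints [0.0, 1.0]
```

The first window contains only A and T, and the second only G and C, which is why the two values sit at opposite extremes. With real data, a much larger window would be more sensible.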

Using alignment software to compare two proteins

You can use any database to get sequence data on the protein of your choice. For this activity, you will collect sequences from the NCBI database, and then use Clustal Omega software to align the two sequences.

Part A: Find sequences

Go to: www.ncbi.nlm.nih.gov.

Figure 13.2.5b – Step 1

1. From the dropdown menu on the left, choose ‘Gene’, then type the name of the protein you are interested in into the search box. A list of genes and species will appear. Choose one and click through to its ‘Full Report’ page.

Figure 13.2.5c – Step 2

2. Scroll down until you see ‘Genomic regions, transcripts, and products’. Click on FASTA. Copy the entire sequence, including the heading line (which begins with ‘>’), onto your notepad (plain text format).

Figure 13.2.5d – Step 3

3. Repeat steps 1 and 2 for another protein. You will now have two sequences on your notepad.
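If you would like to check what you have copied before moving on, the short Python sketch below reads FASTA-formatted text and reports the heading and length of each sequence. It assumes the usual FASTA layout (a heading line starting with ‘>’, followed by one or more lines of sequence). The function name and the two short sequences are hypothetical and are not real NCBI records.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {heading: sequence}."""
    records = {}
    heading = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            heading = line[1:]  # drop the leading '>'
            records[heading] = []
        elif heading is not None:
            records[heading].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical notepad contents, invented for illustration only.
notepad = """>seq_A hypothetical example
ATGGTGCTGTCT
CCTGCCGAC
>seq_B hypothetical example
ATGGTGCTCTCT
GGGGAAGAC
"""

records = parse_fasta(notepad)
for name, sequence in records.items():
    print(name, len(sequence))
```

Two headings should be listed, each with a length of 21. If only one appears, a heading line was probably lost while copying.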

Part B: Align sequences

Go to: www.ebi.ac.uk and open the Clustal Omega tool.

Figure 13.2.5e – Clustal Omega

1. Paste or upload the sequences to be compared into the text box. Click on 'Submit'.

Figure 13.2.5f – Results

The image above shows the results of a nucleotide sequence alignment for the gene coding for the hemoglobin alpha subunit in Homo sapiens (human) and Mus musculus (mouse). An asterisk marks each position at which the two sequences have the same nucleotide.
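The asterisk line can be reproduced with a few lines of Python, which also gives a simple way to put a number on similarity. The sketch below assumes the two sequences are already aligned (equal length, with ‘-’ marking gaps) and counts identical positions as a percentage of all alignment columns. Alignment tools may use a different denominator, so treat the figure as indicative only. The function names and the two fragments are hypothetical and are not taken from the real hemoglobin alignment.

```python
def match_line(aligned_a, aligned_b):
    """Return a string with '*' wherever both aligned sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    return "".join(
        "*" if a == b and a != "-" else " "
        for a, b in zip(aligned_a, aligned_b)
    )


def percent_identity(aligned_a, aligned_b):
    """Identical positions as a percentage of all alignment columns."""
    marks = match_line(aligned_a, aligned_b)
    if not marks:
        return 0.0
    return 100.0 * marks.count("*") / len(marks)


# Hypothetical aligned fragments, invented for illustration only.
fragment_1 = "ATGGTGCTGTCT"
fragment_2 = "ATGGTGCTCTC-"

print(fragment_1)
print(fragment_2)
print(match_line(fragment_1, fragment_2))
print(round(percent_identity(fragment_1, fragment_2), 1))  # prints 83.3
```

Ten of the twelve columns match here: one column is a mismatch and one contains a gap.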

Try it!

Use the NCBI database and Clustal Omega software to compare the sequences of related proteins, such as hemoglobin and myoglobin.

Constructing phylograms and cladograms using DNA sequences

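  • Clustal Omega is designed to align three or more sequences at the same time (a multiple sequence alignment).
  • The results of a multiple sequence alignment can be displayed as a cladogram (all branches drawn at equal length, so only the branching order is meaningful) or as a phylogram, a phylogenetic tree in which branch length reflects the amount of sequence change, as shown below.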

Figure 13.2.5g – Multiple sequence alignment
A phylogenetic tree showing evolutionary relationships of hemoglobin genes from ten animal species (generated using Clustal Omega).
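To see how a tree can be derived from aligned sequences, here is a minimal Python sketch. It uses UPGMA, a simple clustering method chosen here for brevity; it is not a description of how Clustal Omega or the EBI website builds its trees. The sketch measures the proportion of differing positions between each pair of sequences, then repeatedly joins the two closest groups. The output is the branching order only (a cladogram) in Newick format. All names and sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing positions, ignoring columns with a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(aligned):
    """Cluster aligned sequences with UPGMA; return a Newick topology string."""
    sizes = {name: 1 for name in aligned}  # cluster label -> number of leaves
    dist = {
        frozenset((a, b)): p_distance(aligned[a], aligned[b])
        for a, b in combinations(aligned, 2)
    }
    while len(sizes) > 1:
        # Pick the closest pair; ties are broken alphabetically.
        pair = min(dist, key=lambda p: (dist[p], sorted(p)))
        a, b = sorted(pair)
        merged = f"({a},{b})"
        # Distance from the new cluster to each other cluster is the
        # size-weighted average of the two old distances.
        for other in sizes:
            if other in (a, b):
                continue
            dist[frozenset((merged, other))] = (
                dist[frozenset((a, other))] * sizes[a]
                + dist[frozenset((b, other))] * sizes[b]
            ) / (sizes[a] + sizes[b])
        # Discard every distance that involves the two merged clusters.
        dist = {p: d for p, d in dist.items() if a not in p and b not in p}
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"


# Hypothetical aligned sequences, invented for illustration only.
aligned = {
    "seq_A": "ATGGTGCTGTCT",
    "seq_B": "ATGGTGCTGTCC",
    "seq_C": "ATGGAGCACTCT",
    "seq_D": "ATGGAGCACTGT",
}
print(upgma(aligned))  # prints ((seq_A,seq_B),(seq_C,seq_D));
```

In this invented data set, seq_A and seq_B differ at one position, as do seq_C and seq_D, so each pair is joined first and the two pairs are joined last.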

Try it!

Collect at least eight different sequences from the NCBI database, perform a multiple sequence alignment, and generate a phylogenetic tree for the protein or gene of your choice.

Figure 13.2.5h – Karyogram
Chromosome 21 is the smallest human chromosome.

Food for thought

Can you determine the genetic similarity of human and mouse hemoglobin from Figure 13.2.5f?

Figure 13.2.5i – Multiple sequence alignment
Multiple sequence alignments are easier to interpret when nucleotides are colour-coded.

Figure 13.2.5j – Data mining


How reliable are knowledge claims justified by reference to data that the researcher did not acquire? How can one be sure of sources and methods developed for different purposes by different researchers? What are the implications for bioinformatics and the future of data mining?

Course links