About Sequence Alignment

From Biolecture.org


 

Definition

Sequence alignment is a method of arranging biological sequences (DNA, RNA, or protein) to determine the relationships between the given molecules, which can then be interpreted as information about their evolution. The sequences are written as rows of a matrix of nucleotides or amino acids, and gaps are inserted so that identical or similar characters in each string line up in the same column. Point mutations then show up as mismatched columns, and insertions or deletions of one or more nucleotides (or amino acids) show up as gaps. The degree of conservation between the strings can also be calculated from the alignment.
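As a minimal sketch of how mismatches, gaps, and conservation can be read off an alignment that already exists, the Python snippet below walks two gapped strings column by column. The function name `compare_aligned`, the `-` gap character, and the choice to define identity as matching columns divided by alignment length are assumptions made for illustration; other identity conventions are in use.

```python
def compare_aligned(seq_a: str, seq_b: str, gap: str = "-") -> dict:
    """Summarize a pairwise alignment given as two equal-length gapped strings."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    mismatches = []  # (column, residue_a, residue_b): candidate point mutations
    gaps = []        # columns where either sequence has a gap (insertion/deletion)

    for col, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a == gap or b == gap:
            gaps.append(col)
        elif a == b:
            matches += 1
        else:
            mismatches.append((col, a, b))

    # Assumed convention: identity = matching columns / alignment length.
    identity = matches / len(seq_a) if seq_a else 0.0
    return {"identity": identity, "mismatches": mismatches, "gaps": gaps}


result = compare_aligned("ACGTTGCA", "ACGA-GCA")
print(result)
# {'identity': 0.75, 'mismatches': [(3, 'T', 'A')], 'gaps': [4]}
```

Here six of the eight columns match, column 3 holds a substitution, and column 4 holds a gap.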

 

Methods

A very short target sequence, for example 10 to 20 nucleotides long, can be aligned by hand. The sequences that matter in real medical science, however, are far too long to be analyzed by human effort alone. For this reason, a large number of software tools have been developed that can align DNA, RNA, and protein sequences. Because these methods must compare many candidate alignments, each one has its own formula for assigning a value to an alignment. Such a formula is known as an objective function, and objective functions range from simple to complex. Alignment methods are generally divided into two categories: global and local.

Global alignment is based on the idea that the sequences are homologous over their entire length, so the given sequences are aligned across every site. Its drawback is that it is hard to apply to long sequences, where the homologous regions often exist only as short motifs. Local alignment was developed to resolve this problem. It aligns the similar regions independently of the rest of the sequence and ignores the large portions that do not align.
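The difference between the two categories can be sketched with the classic dynamic-programming formulations: Needleman-Wunsch for global alignment and Smith-Waterman for local alignment. The source does not name these algorithms, so their use here is an assumption, as are the function name `alignment_score` and the simple objective function (match +1, mismatch -1, gap -1). The sketch returns only the score, not the alignment itself.

```python
def alignment_score(a: str, b: str, local: bool = False,
                    match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Score of the best global (default) or local alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global: leading gaps are penalized, so the borders grow negative.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            cell = max(diag, up, left)
            if local:
                # Local: a score never drops below 0, so an alignment
                # can start fresh anywhere in either sequence.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    # Global: score of aligning both sequences end to end.
    # Local: best score found anywhere in the table.
    return best if local else score[-1][-1]


x = "CCCCGATTACA"
y = "GATTACATTTT"
print(alignment_score(x, y))              # global: -1
print(alignment_score(x, y, local=True))  # local: 7
```

Both sequences share the motif GATTACA. The local score of 7 reflects that motif alone, while the global score is dragged down to -1 by the eight flanking positions that have to be aligned against gaps.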

Within these two categories, some representative alignment methods are pairwise alignment, multiple sequence alignment, and structural alignment.

 

Multiple alignments

In general, most alignment methods determine the similarity, or some other relationship, between two sequences. Multiple alignment, as its name suggests, analyzes more than two sequences at once. The sequences are arranged in rows, and the consensus sequence is shown in the last row, which makes it easy to find what is conserved among even hundreds of sequences. Information about conserved sequences can be used to analyze the active sites of enzymes or to establish evolutionary relationships.
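As a rough illustration of how a consensus row and the conserved positions can be derived, the Python sketch below takes rows that are already aligned and picks the most common residue in each column. The function names `consensus` and `conserved_columns`, the majority-vote rule, and the `-` gap character are assumptions for this example; real tools apply more elaborate rules, such as thresholds and ambiguity codes.

```python
from collections import Counter


def consensus(aligned: list[str], gap: str = "-") -> str:
    """Majority-vote consensus of equal-length aligned rows (gaps ignored)."""
    if not aligned:
        return ""
    width = len(aligned[0])
    if any(len(row) != width for row in aligned):
        raise ValueError("all aligned rows must have the same length")

    out = []
    for column in zip(*aligned):
        counts = Counter(c for c in column if c != gap)
        # A column made only of gaps stays a gap in the consensus.
        out.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(out)


def conserved_columns(aligned: list[str], gap: str = "-") -> list[int]:
    """Indices of columns where every row carries the same non-gap residue."""
    return [i for i, column in enumerate(zip(*aligned))
            if len(set(column)) == 1 and column[0] != gap]


rows = ["ACGTAC",
        "ACGTAC",
        "ACGAAC",
        "ACG-AC"]
print(consensus(rows))          # ACGTAC
print(conserved_columns(rows))  # [0, 1, 2, 4, 5]
```

Column 3 is the only variable position in this toy alignment. In an enzyme family, fully conserved columns like the other five would be candidates for functionally important sites.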

However, this approach has to process a large amount of data, and its main drawback is inefficiency: computing a multiple alignment quickly becomes computationally intractable. Many alternative approaches have therefore been suggested, such as weighted averaging.
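To give a sense of the scale involved, the sketch below assumes the straightforward extension of pairwise dynamic programming to several sequences, where the score table has one axis per sequence. The function name `dp_cells` and the equal-length assumption are hypothetical simplifications.

```python
def dp_cells(length: int, num_sequences: int) -> int:
    """Cells in a full dynamic-programming table for sequences of equal length."""
    # One axis of size (length + 1) per sequence being aligned.
    return (length + 1) ** num_sequences


for k in (2, 3, 5):
    print(k, dp_cells(100, k))
# 2 10201
# 3 1030301
# 5 10510100501
```

Even for sequences of only 100 residues, the table grows from about ten thousand cells for two sequences to more than ten billion for five, which is why exact methods give way to approximate ones.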

 

Reference

1. http://www.ucpress.edu/content/chapters/10874.ch01.pdf

2. https://en.wikipedia.org/wiki/Sequence_alignment