Alignment

Sequence Alignment

What is sequence alignment?

A sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.

How are sequences aligned?

Human knowledge is applied in constructing algorithms that produce high-quality sequence alignments, and occasionally in adjusting the final results to reflect patterns that are difficult to represent algorithmically (especially in the case of nucleotide sequences).

Computational approaches to sequence alignment generally fall into two categories: global alignments and local alignments. Calculating a global alignment is a form of global optimization that "forces" the alignment to span the entire length of all query sequences. By contrast, local alignments identify regions of similarity within long sequences that are often widely divergent overall. Local alignments are often preferable, but can be more difficult to calculate because of the additional challenge of identifying the regions of similarity.

Hybrid methods, known as semi-global or "glocal" (short for global-local) methods, attempt to find the best possible alignment that includes the start and end of one or the other sequence. This is especially useful when the downstream part of one sequence overlaps with the upstream part of the other. In this case, neither global nor local alignment is entirely appropriate: a global alignment would attempt to force the alignment to extend beyond the region of overlap, while a local alignment might not fully cover the region of overlap.

A variety of computational algorithms have been applied to the sequence alignment problem. These include methods such as dynamic programming, which are slow but guaranteed to find an optimal alignment under the chosen scoring scheme. They also include efficient heuristic or probabilistic methods designed for large-scale database searching, which are not guaranteed to find the best matches.
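The difference between global, local, and semi-global alignment can be illustrated with a small dynamic programming sketch. The function name alignment_score, the scoring values (match +1, mismatch -1, gap -1), and the linear gap penalty are assumptions chosen for illustration, not part of any particular tool. "Semi-global" is interpreted here as an overlap alignment in which gaps at either end of either sequence are free, which is only one of several conventions in use.

```python
def alignment_score(a, b, mode="global", match=1, mismatch=-1, gap=-1):
    """Best alignment score of strings a and b under a linear gap penalty.

    mode is "global", "local", or "semiglobal" (free end gaps).
    The scoring values are illustrative defaults, not a standard.
    """
    if mode not in ("global", "local", "semiglobal"):
        raise ValueError(f"unknown mode: {mode}")

    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if mode == "global":
        # Only global alignment charges for gaps at the start.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0  # highest cell seen so far; used by local mode
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if mode == "local":
                # A local alignment may restart anywhere, so never go below 0.
                cell = max(cell, 0)
            score[i][j] = cell
            best = max(best, cell)

    if mode == "global":
        return score[-1][-1]  # must consume both sequences completely
    if mode == "local":
        return best  # may end anywhere in the matrix
    # Semi-global: trailing gaps are free, so take the best cell in the
    # last row or the last column.
    return max(max(score[-1]), max(row[-1] for row in score))


# The end of the first sequence overlaps the start of the second ("CGT").
a, b = "AAACGT", "CGTTTT"
print(alignment_score(a, b, "global"))      # -3
print(alignment_score(a, b, "local"))       # 3
print(alignment_score(a, b, "semiglobal"))  # 3
```

In this toy case the global score is negative because the non-overlapping ends must be paid for with gaps, while the local and semi-global scores both capture the three-base overlap. The sketch returns only the score; recovering the alignment itself would require a traceback through the matrix.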

How is sequence alignment used?

Phylogenetics and sequence alignment are closely related fields because both need to evaluate sequence relatedness. Sequence alignment can be used for the construction and interpretation of phylogenetic trees, which are used to classify the evolutionary relationships between homologous genes represented in the genomes of divergent species. Roughly speaking, high sequence identity suggests that the sequences in question have a comparatively young most recent common ancestor, while low identity suggests that the divergence is more ancient.
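Sequence identity can be computed directly from a pairwise alignment. The following is a minimal sketch: the function name percent_identity and the choice to count only columns where neither sequence has a gap are assumptions made for illustration, and other definitions of the denominator (for example, the full alignment length) are also in use.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    # Keep only columns where both sequences have a residue (assumed convention).
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0

    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Four of the five gap-free columns are identical.
print(percent_identity("ACGT-A", "ACCTTA"))  # 80.0
```

A value such as this is only a rough indicator of relatedness; it depends on how the alignment was produced and on which convention is chosen for the denominator.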

Biolecture.org β