BuKyung Align two protein sequences using a dynamic programming method in Perl
Back to Baik BuKyung
Source code:
use strict;
my $numberofseq=0;
while(<>){
if($_=~ />/){
$sequence[$numberofseq]{seqname}=$_;
$sequence[$numberofseq]{seqname}=~ s/\n//;
}
else{
$sequence[$numberofseq]{seq}=$_;
$sequence[$numberofseq]{seq}=~ s/\n//;
$numberofseq++;
}
my $seq_2=$sequence[1]{seq};
my $mismatch = -1;
$matrix[0][0]{score} = 0;
$matrix[0][$j]{score} = $indel * $j;
$matrix[0][$j]{direction} = "left";
}
for (my $i = 1; $i <= length($seq_2); $i++) {
$matrix[$i][0]{score} = $indel * $i;
$matrix[$i][0]{direction} = "top";
for(my $i = 1; $i <= length($seq_2); $i++) {
for(my $j = 1; $j <= length($seq_1); $j++) {
my $char1 = substr($seq_1, $j-1, 1);
my $char2 = substr($seq_2, $i-1, 1);
if ($char1 eq $char2) {
$diagonal = $matrix[$i-1][$j-1]{score} + $match;
}
else {
$diagonal = $matrix[$i-1][$j-1]{score} + $mismatch;
$top = $matrix[$i-1][$j]{score} + $indel;
if ($diagonal >= $top) {
if ($diagonal >= $left) {
$matrix[$i][$j]{score} = $diagonal;
$matrix[$i][$j]{direction} = "diagonal";
}
else {
$matrix[$i][$j]{score} = $left;
$matrix[$i][$j]{direction} = "left";
}
} else {
if ($top >= $left) {
$matrix[$i][$j]{score} = $top;
$matrix[$i][$j]{direction} = "top";
}
else {
$matrix[$i][$j]{score} = $left;
$matrix[$i][$j]{direction} = "left";
}
}
}
my $seq2_align = "";
my $j = length($seq_1);
while (!($matrix[$i][$j]{direction} eq "dead_end")) {
$seq1_align = $seq1_align . substr($seq_1, $j-1, 1);
$seq2_align = $seq2_align . substr($seq_2, $i-1, 1);
$i--;
$j--;
}
elsif ($matrix[$i][$j]{direction} eq "left") {
$seq1_align = $seq1_align . substr($seq_1, $j-1, 1);
$seq2_align = $seq2_align . "-";
$j--;
}
elsif ($matrix[$i][$j]{direction} eq "top") {
$seq1_align = $seq1_align . "-";
$seq2_align = $seq2_align . substr($seq_2, $i-1, 1);
$i--;
}
$seq2_align = reverse $seq2_align;
Algorithm
I used Needleman-Wunsch Algorithm.
I used Needleman-Wunsch Algorithm, I assigned the score with match, mismatch and gap.
Picture reference: https://commons.wikimedia.org/wiki/File:Needleman-Wunsch_pairwise_sequence_alignment.png#/media/File:Needleman-Wunsch_pairwise_sequence_alignment.png
Algorithm explanation: https://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm
Result
After the 9.pl is executed.
The original content in the outer.fasta file is
>0
FNFSSDMRRVKRERMLKMVSKPLKRRCYHLSGCVALCAIPEELDRICKGKIPHDDHCRPVARFHSVECYCGCNQNQQYDAGGVNMHRSLMQAEGHREPNH
>1
LADIDFNVHCQEWDENRAGEWVHQDPKVNWRFILCIRLFGMEGFKNQAHLDMVEKLYIYPLPNAWKWYKALCHYAKHAMCLQDESYECKYQRPWIMDKVC
>2
QWCRYLRMQREGQRFGVYRRVCIVICSVFFEWADQMYMSHRCVKAWRWWECQRFDIVIRSPRLNRLFCNHEGAHAAKWVWPCLMHRLLDFNCLQNSMRCR
>3
PKCAPQRAHLPYGARHHWGDSLHMGEHAVSCQRKWDMFHGIRHFPKSDAGMQWFNEKERPKKWCAVFKYELFVPLMRPHAQGQYSRELREVAAHRLCSCP
>4
RVQDLFNEGEIAEVWCNQYWESFSKARVGCMGISYPRSMFWMCFRPGFCYNQHEQIVQPALLLQVNGYRIQNDEMNEVRIMDPPIRYDMWPCVPLKDDLS
This file is generated by the BuKyung Randomly generate five 100 AA long protein sequences and store them in a FASTA file assignment.
I used the 0 and 1 sequence for the alignment.
The printed result is