BuKyung Align two protein sequences using a dynamic programming method in Perl

From Biolecture.org

Back to Baik BuKyung


Source code:

#!/usr/bin/perl

 use strict;

 use warnings;
 
 my @sequence;

 my $numberofseq=0;
 while(<>){
 if($_=~ />/){
  $sequence[$numberofseq]{seqname}=$_;
  $sequence[$numberofseq]{seqname}=~ s/\n//;
  
}
 else{
  $sequence[$numberofseq]{seq}=$_;
  $sequence[$numberofseq]{seq}=~ s/\n//;
  $numberofseq++;
}

}
 
my $seq_1=$sequence[0]{seq};
my $seq_2=$sequence[1]{seq};
 
 
 
my $match =  1;

my $mismatch = -1;

my $indel = -1;
 
my @matrix;

$matrix[0][0]{score}   = 0;

$matrix[0][0]{direction} = "dead_end";
 
for(my $j = 1; $j <= length($seq_1); $j++) {

    $matrix[0][$j]{score} = $indel * $j;
    $matrix[0][$j]{direction} = "left";
}
for (my $i = 1; $i <= length($seq_2); $i++) {
    $matrix[$i][0]{score}   = $indel * $i;
    $matrix[$i][0]{direction} = "top";

}
 

for(my $i = 1; $i <= length($seq_2); $i++) {
    for(my $j = 1; $j <= length($seq_1); $j++) {

        my ($diagonal, $left, $top);
 
        # calculate match score

        my $char1 = substr($seq_1, $j-1, 1);
        my $char2 = substr($seq_2, $i-1, 1);                           
        if ($char1 eq $char2) {
            $diagonal = $matrix[$i-1][$j-1]{score} + $match;
        }
        else {
            $diagonal = $matrix[$i-1][$j-1]{score} + $mismatch;

        }
 
        # calculate gap scores

        $top = $matrix[$i-1][$j]{score} + $indel;

        $left = $matrix[$i][$j-1]{score} + $indel;
 
        # choose best score

        if ($diagonal >= $top) {
            if ($diagonal >= $left) {
                $matrix[$i][$j]{score} = $diagonal;
                $matrix[$i][$j]{direction} = "diagonal";
            }
        else {
                $matrix[$i][$j]{score} = $left;
                $matrix[$i][$j]{direction} = "left";
            }
        } else {
            if ($top >= $left) {
                $matrix[$i][$j]{score}   = $top;
                $matrix[$i][$j]{direction} = "top";
            }
            else {
                $matrix[$i][$j]{score}   = $left;
                $matrix[$i][$j]{direction} = "left";
            }
        }
    }

}
 
 
 
my $seq1_align = "";
my $seq2_align = "";
 
my $i = length($seq_2);
my $j = length($seq_1);
 

while (!($matrix[$i][$j]{direction} eq "dead_end")) {
 
    if ($matrix[$i][$j]{direction} eq "diagonal") {

        $seq1_align = $seq1_align . substr($seq_1, $j-1, 1);
        $seq2_align = $seq2_align . substr($seq_2, $i-1, 1);
        $i--;
        $j--;
    }
    elsif ($matrix[$i][$j]{direction} eq "left") {
        $seq1_align = $seq1_align . substr($seq_1, $j-1, 1);
        $seq2_align = $seq2_align . "-";
        $j--;
    }
    elsif ($matrix[$i][$j]{direction} eq "top") {
        $seq1_align = $seq1_align . "-";
        $seq2_align = $seq2_align . substr($seq_2, $i-1, 1);
        $i--;
    }   

}
 
$seq1_align = reverse $seq1_align;
$seq2_align = reverse $seq2_align;
 
print ("\n",$seq1_align,"\n", $seq2_align);

Algorithm

I used Needleman-Wunsch Algorithm.

I used Needleman-Wunsch Algorithm, I assigned the score with match, mismatch and gap.

Picture reference: https://commons.wikimedia.org/wiki/File:Needleman-Wunsch_pairwise_sequence_alignment.png#/media/File:Needleman-Wunsch_pairwise_sequence_alignment.png

Algorithm explanation: https://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm

 


Result

After the 9.pl is executed.

The original content in the outer.fasta file is

>0
FNFSSDMRRVKRERMLKMVSKPLKRRCYHLSGCVALCAIPEELDRICKGKIPHDDHCRPVARFHSVECYCGCNQNQQYDAGGVNMHRSLMQAEGHREPNH
>1
LADIDFNVHCQEWDENRAGEWVHQDPKVNWRFILCIRLFGMEGFKNQAHLDMVEKLYIYPLPNAWKWYKALCHYAKHAMCLQDESYECKYQRPWIMDKVC
>2
QWCRYLRMQREGQRFGVYRRVCIVICSVFFEWADQMYMSHRCVKAWRWWECQRFDIVIRSPRLNRLFCNHEGAHAAKWVWPCLMHRLLDFNCLQNSMRCR
>3
PKCAPQRAHLPYGARHHWGDSLHMGEHAVSCQRKWDMFHGIRHFPKSDAGMQWFNEKERPKKWCAVFKYELFVPLMRPHAQGQYSRELREVAAHRLCSCP
>4
RVQDLFNEGEIAEVWCNQYWESFSKARVGCMGISYPRSMFWMCFRPGFCYNQHEQIVQPALLLQVNGYRIQNDEMNEVRIMDPPIRYDMWPCVPLKDDLS

 

This file is generated by the BuKyung Randomly generate five 100 AA long protein sequences and store them in a FASTA file assignment.

I used the 0 and 1 sequence for the alignment.

 

The printed result is