BuKyung Create a flat text file database of protein sequences with hash function in Perl

From Biolecture.org



Source code:

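#!/usr/bin/perl
 use strict;
 use warnings;

 # Output file that serves as the flat text file database.
 open(my $out, ">", "outerl.txt") or die "Cannot open outerl.txt: $!\n";

 my %sequence;    # header line => sequence
 my $seq_name;    # header of the record currently being read

 # Read FASTA records from the files named on the command line (or STDIN).
 while (my $line = <>) {
     chomp $line;
     if ($line =~ /^>/) {
         # A header line starts a new record.
         $seq_name = $line;
         $sequence{$seq_name} = "";
     }
     elsif (defined $seq_name) {
         # Append, so a sequence wrapped over several lines is kept whole.
         $sequence{$seq_name} .= $line;
     }
 }

 # Write one "header : sequence" record per line, sorted by header.
 foreach my $key (sort keys %sequence) {
     print $out $key, " : ", $sequence{$key}, "   \n";
 }
 close($out) or die "Cannot close outerl.txt: $!\n";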



Result

After running 10.pl with outer.fasta as input, the file outerl.txt is generated.

The contents of outer.fasta are:

>0
LIEYMVYQVHECCMKNIKKSQVSARMRARGHMVQLYYEDWEPIISDQRNSAANRSDDRVIESQSKQNVKHSNWEQCMCWFKILINMWLGQMREPPIYEDI
>1
KHGGRDNLQSMPSLMNDNERRSMRSQRDWHGFWQVLRFMPFHGNNNMHQDCNSHSDQGFIRMDHCKHHRVNGLVISRRRPDHPNQFISWRYGDDSIQFYQ
>2
YWCYISQDNRAERASYYKEVQPNPPNGNRGFPWEPFDQCGVALNAMWKLCIHVNGNRPQNPGQGPYLKHMRVAVDELRSDPAVYFKEDKVDCRHEKFGDK
>3
KAHIQRVRQNNKRSIWGCKRAHGCQEWYNGMFWNHKCIWCREGGEESRPHNNEQIRPDMSGQRKAISPELAPLEGWMEYQCFRKDPKANEMRVNLEMAHM
>4
SRVRVCFKPMYGMIKHHSVHQECGIKDPSYGWLGRPEASHICIWGQHGNNINFMYGKIYRQSYRIPCEDKCPPAPAPLVIQEVWLAPAHRNNKLHKRRGR

 

This file was generated by the program from the assignment "BuKyung Randomly generate five 100 AA long protein sequences and store them in a FASTA file".

 

The contents of outerl.txt are:

>0 : LIEYMVYQVHECCMKNIKKSQVSARMRARGHMVQLYYEDWEPIISDQRNSAANRSDDRVIESQSKQNVKHSNWEQCMCWFKILINMWLGQMREPPIYEDI  
>1 : KHGGRDNLQSMPSLMNDNERRSMRSQRDWHGFWQVLRFMPFHGNNNMHQDCNSHSDQGFIRMDHCKHHRVNGLVISRRRPDHPNQFISWRYGDDSIQFYQ  
>2 : YWCYISQDNRAERASYYKEVQPNPPNGNRGFPWEPFDQCGVALNAMWKLCIHVNGNRPQNPGQGPYLKHMRVAVDELRSDPAVYFKEDKVDCRHEKFGDK  
>3 : KAHIQRVRQNNKRSIWGCKRAHGCQEWYNGMFWNHKCIWCREGGEESRPHNNEQIRPDMSGQRKAISPELAPLEGWMEYQCFRKDPKANEMRVNLEMAHM  
>4 : SRVRVCFKPMYGMIKHHSVHQECGIKDPSYGWLGRPEASHICIWGQHGNNINFMYGKIYRQSYRIPCEDKCPPAPAPLVIQEVWLAPAHRNNKLHKRRGR
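
Because each record is stored on one line as "header : sequence", the file can be read back into a hash and queried by header. The following is only a minimal sketch, assuming the record format shown above. The subroutine name load_db is hypothetical and not part of the original program. To keep the example self-contained, it reads two of the records above from an in-memory string instead of opening outerl.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# load_db is a hypothetical helper (not part of the original program).
# It reads "header : sequence" lines from a filehandle into a hash.
sub load_db {
    my ($in) = @_;
    my %db;
    while (my $line = <$in>) {
        chomp $line;
        # Split on the first " : " only; skip lines without a separator.
        my ($name, $seq) = split / : /, $line, 2;
        next unless defined $seq;
        $seq =~ s/\s+$//;    # drop the trailing spaces written after each record
        $db{$name} = $seq;
    }
    return %db;
}

# Two records copied from the outerl.txt listing above.
my $text = <<'END';
>0 : LIEYMVYQVHECCMKNIKKSQVSARMRARGHMVQLYYEDWEPIISDQRNSAANRSDDRVIESQSKQNVKHSNWEQCMCWFKILINMWLGQMREPPIYEDI   
>1 : KHGGRDNLQSMPSLMNDNERRSMRSQRDWHGFWQVLRFMPFHGNNNMHQDCNSHSDQGFIRMDHCKHHRVNGLVISRRRPDHPNQFISWRYGDDSIQFYQ   
END

# Open the string as an in-memory file; a real run would open outerl.txt instead.
open(my $fh, "<", \$text) or die "Cannot open in-memory file: $!\n";
my %db = load_db($fh);
close($fh);

# Look up one sequence by its header and report its length.
print length($db{">0"}), "\n";
```

To query the real database, replace the in-memory open with open(my $fh, "<", "outerl.txt"). The lookup itself is a single hash access, which is the main benefit of keying the records by header.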