6) Producing statistics of a multi-sequence FASTA file: sequence number, average seq length, GC content, AT content
Latest revision as of 21:37, 11 June 2017
Here I created a subroutine that opens the file, skips the header lines and joins all remaining lines into one sequence string. Each nucleotide of that string is then placed into an array, and the occurrences of each base are counted:
#!/usr/bin/perl
use strict;
use warnings;
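my $file = $ARGV[0];
die "Usage: $0 <file.fasta>\n" unless defined $file;

# Open the FASTA file, skip header lines (starting with '>')
# and join all sequence lines into one string
sub readingfa
{
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "Can't open the file $filename: $!\n";
    my $string = "";
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;          # tolerate Windows (CRLF) line endings
        next if $line =~ /^>/;     # skip header lines
        $string .= uc $line;       # uppercase, so soft-masked (lowercase) bases are counted too
    }
    close($fh);
    return $string;
}

my $seq = readingfa($file);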
my ($A, $C, $G, $T, $N) = (0, 0, 0, 0, 0);
my $length = length $seq;

# Split the sequence into single characters and count each base
my @nucleotides = split //, $seq;
foreach my $nuc (@nucleotides) {
    if    ($nuc eq "A") { $A++; }
    elsif ($nuc eq "C") { $C++; }
    elsif ($nuc eq "G") { $G++; }
    elsif ($nuc eq "T") { $T++; }
    elsif ($nuc eq "N") { $N++; }
}
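die "No sequence found in $file\n" if $length == 0;

# Percentages of the total length; N and any other symbols are in the denominator,
# so GC + AT can be slightly below 100
my $GC_content = (($C + $G) / $length) * 100;
my $AT_content = (($A + $T) / $length) * 100;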
print "Fasta file statistics: \n";
print "Sequence length = $length\n";
print "Number of A: $A\n";
print "Number of C: $C\n";
print "Number of T: $T\n";
print "Number of G: $G\n";
print "Number of N: $N\n";
print "GC content is = $GC_content\n";
print "AT content is = $AT_content\n";
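Splitting a chromosome-sized sequence (the example output below reports about 156 million bases) into a Perl array of single characters can need several gigabytes of memory. The following is a minimal sketch of an alternative, not the original method: it reads the file line by line and counts each base with Perl's `tr///` operator, so the whole sequence is never stored. The subroutine name `count_bases` and the hash key `total` are hypothetical; the script prints its statistics in the same format as the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count bases line by line with tr///, so the whole sequence is never held in memory
sub count_bases {
    my ($fh) = @_;
    my %count = (A => 0, C => 0, G => 0, T => 0, N => 0, total => 0);
    while (my $line = <$fh>) {
        next if $line =~ /^>/;          # skip header lines
        $line =~ s/\s+$//;              # strip the newline (and any \r or trailing spaces)
        $line = uc $line;               # count soft-masked (lowercase) bases too
        $count{total} += length $line;
        $count{A} += ($line =~ tr/A//); # tr/X// returns the number of X characters
        $count{C} += ($line =~ tr/C//);
        $count{G} += ($line =~ tr/G//);
        $count{T} += ($line =~ tr/T//);
        $count{N} += ($line =~ tr/N//);
    }
    return \%count;
}

# Only run when a file name is given, so count_bases can be reused on its own
if (@ARGV) {
    my $file = $ARGV[0];
    open(my $fh, '<', $file) or die "Can't open the file $file: $!\n";
    my $c = count_bases($fh);
    close($fh);
    die "No sequence found in $file\n" if $c->{total} == 0;

    my $GC_content = (($c->{C} + $c->{G}) / $c->{total}) * 100;
    my $AT_content = (($c->{A} + $c->{T}) / $c->{total}) * 100;
    print "Fasta file statistics: \n";
    print "Sequence length = $c->{total}\n";
    print "Number of $_: $c->{$_}\n" for qw(A C T G N);
    print "GC content is = $GC_content\n";
    print "AT content is = $AT_content\n";
}
```

Because `count_bases` takes a filehandle rather than a file name, it can also be fed an in-memory string opened with `open(my $fh, '<', \$text)`, which is convenient for quick checks.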
----------------------------------------------------------------------------------------------
Output of the Perl script:
Fasta file statistics:
Sequence length = 156040895
Number of A: 46754807
Number of C: 30523780
Number of T: 46916701
Number of G: 30697741
Number of N: 1147861
GC content is = 39.2342795777991
AT content is = 60.0301017242948
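The page title also mentions the number of sequences and the average sequence length, which the script above does not report, because it concatenates all records into a single string. A minimal sketch of how these could be added, assuming a standard FASTA layout where every record starts with a `>` header line: it keeps one length per record and ignores any lines that appear before the first header. The subroutine name `record_lengths` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a list of sequence lengths, one per FASTA record (header line starting with '>')
sub record_lengths {
    my ($fh) = @_;
    my @lengths;
    while (my $line = <$fh>) {
        $line =~ s/\s+$//;                  # strip the newline / trailing whitespace
        if ($line =~ /^>/) {
            push @lengths, 0;               # a header starts a new record
        }
        elsif (@lengths) {
            $lengths[-1] += length $line;   # add residues to the current record
        }
    }
    return @lengths;
}

# Only run when a file name is given, so record_lengths can be reused on its own
if (@ARGV) {
    my $file = $ARGV[0];
    open(my $fh, '<', $file) or die "Can't open the file $file: $!\n";
    my @lengths = record_lengths($fh);
    close($fh);
    die "No FASTA records found in $file\n" unless @lengths;

    my $total = 0;
    $total += $_ for @lengths;
    printf "Number of sequences = %d\n", scalar @lengths;
    printf "Total length = %d\n", $total;
    printf "Average sequence length = %.2f\n", $total / @lengths;
}
```

For example, `perl fasta_lengths.pl genome.fa` (both file names hypothetical) would print the record count, the total length and the mean length rounded to two decimals. Records with an empty sequence are still counted, which lowers the average; filter out zero lengths first if that is not wanted.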