6) Producing statistics of a multi-sequence FASTA file: number of sequences, average sequence length, GC content, AT content

From Biolecture.org

Here I created a subroutine, readingfa, that reads the FASTA file, skips every header line (lines starting with ">") and joins all sequence lines into one string. The string is then split into an array of single nucleotides, and the occurrences of A, C, G, T and N are counted:

#!/usr/bin/perl
use strict;
use warnings;

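# The FASTA file is passed as the first command-line argument
die "Usage: $0 FILE.fasta\n" unless @ARGV;
my $file = $ARGV[0];
open(my $fh, '<', $file) or die "Can't open the file $file: $!\n";

# Read the FASTA file, skip every header line (starting with '>')
# and join all sequence lines into one string
sub readingfa {
    my ($in) = @_;
    my $string = "";
    while (my $line = <$in>) {
        $line =~ s/\s+$//;      # strip the line ending, including a Windows "\r"
        next if $line =~ /^>/;  # skip header lines
        $string .= uc $line;    # upper-case so soft-masked (lowercase) bases are counted
    }
    return $string;
}

my $seq = readingfa($fh);
close($fh);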
my $C = 0;  my $G = 0;
my $A = 0;  my $T = 0;  my $N = 0;

my $length = length $seq;
die "No sequence data found in $file\n" unless $length;

# Split the sequence into single nucleotides and count each base
my @nucleotides = split //, $seq;
foreach my $nuc (@nucleotides) {
    if    ($nuc eq "C") { $C++; }
    elsif ($nuc eq "G") { $G++; }
    elsif ($nuc eq "A") { $A++; }
    elsif ($nuc eq "T") { $T++; }
    elsif ($nuc eq "N") { $N++; }
}

# GC and AT content as percentages of the total length; N and any other
# characters are included in the length, so GC + AT can be below 100
my $GC_content = (($C + $G) / $length) * 100;
my $AT_content = (($A + $T) / $length) * 100;

print "Fasta file statistics: \n";
print "Sequence length = $length\n";
print "Number of A: $A\n";
print "Number of C: $C\n";
print "Number of T: $T\n";
print "Number of G: $G\n";
print "Number of N: $N\n";
print "GC content is = $GC_content\n";
print "AT content is = $AT_content\n";
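Splitting the whole sequence into an array creates one Perl scalar per base, which for a chromosome-sized sequence such as the roughly 156 million bases in the output below can take several gigabytes of memory. A lighter alternative, sketched here under the assumption that the sequence is already one upper-case string, counts each base with Perl's transliteration operator tr///, which returns the number of matched characters without changing the string. The helper name count_bases is hypothetical, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count A, C, G, T and N in a sequence string.
# The string is passed by reference so a large sequence is not copied.
# tr/X// with an empty replacement list only counts occurrences of X;
# the string itself is left unchanged.
sub count_bases {
    my ($seq_ref) = @_;
    my %count = (
        A => ($$seq_ref =~ tr/A//),
        C => ($$seq_ref =~ tr/C//),
        G => ($$seq_ref =~ tr/G//),
        T => ($$seq_ref =~ tr/T//),
        N => ($$seq_ref =~ tr/N//),
    );
    return \%count;
}
```

In the main script, the split/foreach loop could then be replaced by `my $counts = count_bases(\$seq);`, with the counts read as `$counts->{A}`, `$counts->{C}` and so on.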

----------------------------------------------------------------------------------------------

Running the script on a FASTA file gives the following output:

Fasta file statistics: 
Sequence length = 156040895
Number of A: 46754807
Number of C: 30523780
Number of T: 46916701
Number of G: 30697741
Number of N: 1147861
GC content is = 39.2342795777991
AT content is = 60.0301017242948
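The title also lists the number of sequences and the average sequence length, which the script above does not report because it merges all records into one string. The following is a minimal sketch, not the original author's code, that streams the file line by line, counts header lines as sequences, and computes GC and AT percentages with the same definition as above (total length, including N, as the denominator). Unlike the script above, it also counts lowercase (soft-masked) bases. The sub name fasta_stats and its hash keys are hypothetical names chosen for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from an open filehandle and
# return the number of sequences, total and average length, and GC/AT %.
sub fasta_stats {
    my ($fh) = @_;
    my ($n_seq, $total_len, $gc, $at) = (0, 0, 0, 0);
    while (my $line = <$fh>) {
        $line =~ s/\s+//g;             # drop the line ending (incl. "\r") and other whitespace
        next if $line eq '';           # ignore blank lines
        if ($line =~ /^>/) {           # every header line starts a new sequence
            $n_seq++;
            next;
        }
        $total_len += length $line;
        $gc += ($line =~ tr/GCgc//);   # count G and C, upper or lower case
        $at += ($line =~ tr/ATat//);   # count A and T, upper or lower case
    }
    return {
        sequences  => $n_seq,
        total_len  => $total_len,
        avg_len    => $n_seq ? $total_len / $n_seq : 0,
        gc_percent => $total_len ? 100 * $gc / $total_len : 0,
        at_percent => $total_len ? 100 * $at / $total_len : 0,
    };
}

# Only run when a file name is given on the command line
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Can't open $ARGV[0]: $!\n";
    my $s = fasta_stats($fh);
    close($fh);
    print "Number of sequences: $s->{sequences}\n";
    printf "Average sequence length: %.2f\n", $s->{avg_len};
    printf "GC content: %.2f%%\n", $s->{gc_percent};
    printf "AT content: %.2f%%\n", $s->{at_percent};
}
```

Saved as, for example, fasta_stats.pl (an illustrative file name), it would be run as `perl fasta_stats.pl genome.fa`. Reading line by line keeps memory use roughly constant regardless of file size, at the cost of not keeping the sequence itself for later processing.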