From Biolecture.org
<p>Back to [[Baik BuKyung]]</p>
<hr />
<p><span style="font-size:24px">Source code:</span></p>
<hr />
<div>
<div>#!/usr/bin/perl<br />
use strict;<br />
use warnings;<br />
open FH, ">", "outer.fasta" or die "Cannot open outer.fasta: $!\n";<br />
my $numberofseq=0;<br />
my @matrix;<br />
while(<>){<br />
if($_=~ /^>/){<br />
$matrix[$numberofseq]{seqname}=$_;<br />
$matrix[$numberofseq]{seqname}=~ s/\n//;<br />
<br />
}<br />
else{<br />
$matrix[$numberofseq]{seq}=$_;<br />
$matrix[$numberofseq]{seq}=~ s/\n//;<br />
$numberofseq++;<br />
}<br />
}</div>
<div> </div>
<div>for(my $i=0;$i<$numberofseq;$i++){<br />
my $count=0;<br />
$matrix[$i]{seqlen}=length($matrix[$i]{seq});<br />
for(my $j=0;$j<$matrix[$i]{seqlen};$j++){<br />
my $seq_char=substr($matrix[$i]{seq},$j,1);<br />
if($seq_char=~/[GC]/i){<br />
$count++;<br />
}<br />
}<br />
# Store the GC count once, after the whole sequence has been scanned.<br />
$matrix[$i]{GC}=$count;<br />
}</div>
<div> </div>
<div>my $total_seqlen=0;<br />
my $total_GC=0;<br />
for(my $i=0;$i<$numberofseq;$i++){<br />
# Report GC content as a fraction of the sequence length.<br />
print FH ($matrix[$i]{seqname},"\n",$matrix[$i]{seq},"\n GC content is:",$matrix[$i]{GC}/$matrix[$i]{seqlen},"\n");<br />
$total_seqlen=$total_seqlen+$matrix[$i]{seqlen};<br />
$total_GC=$total_GC+$matrix[$i]{GC};<br />
}<br />
print FH ("Average sequence length is :",$total_seqlen/$numberofseq,"\n GC contents:",$total_GC/$total_seqlen,"\n AT contents:",1-($total_GC/$total_seqlen),"\n");<br />
close FH or die "Cannot close outer.fasta: $!\n";</div>
</div>
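<p><span style="font-size:16px">The script above assumes that each sequence occupies a single line. As a minimal sketch of an alternative, not the method used on this page, the following joins multi-line sequences and counts G and C with <code>tr///</code> instead of a character-by-character loop. The helper names <code>gc_fraction</code> and <code>parse_fasta</code> and the small test input are hypothetical, chosen only for illustration.</span></p>

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: fraction of G/C bases in a sequence string.
# tr/// with an empty replacement list only counts matches; it does not modify $seq.
sub gc_fraction {
    my ($seq) = @_;
    my $len = length $seq;
    return 0 if $len == 0;            # avoid division by zero
    my $gc = ($seq =~ tr/GCgc//);
    return $gc / $len;
}

# Hypothetical helper: parse FASTA text into a list of { name, seq } records,
# appending every sequence line to the most recent header.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\r?\n/, $text) {
        next if $line =~ /^\s*$/;     # skip blank lines
        if ($line =~ /^>/) {
            push @records, { name => $line, seq => '' };
        }
        elsif (@records) {
            $records[-1]{seq} .= $line;
        }
    }
    return @records;
}

# Made-up input: the first record is split across two lines.
my $fasta = ">a\nGGCC\nAATT\n>b\nATAT\n";
for my $rec (parse_fasta($fasta)) {
    printf "%s GC content is:%s\n", $rec->{name}, gc_fraction($rec->{seq});
}
# prints:
# >a GC content is:0.5
# >b GC content is:0
```

<p><span style="font-size:16px">Counting with <code>tr///</code> avoids calling <code>substr</code> once per base, and anchoring the header test at the start of the line keeps a stray <code>></code> inside other text from being read as a new record.</span></p>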
<div>
<hr />
<p> </p>
<p><img alt="" src="/ckfinder/userfiles/images/%EC%BA%A1%EC%B2%9818.PNG" style="height:631px; width:1162px" /></p>
<p> </p>
</div>
<div>
<hr />
<p> </p>
<p><span style="font-size:24px">Result</span></p>
<p><img alt="" src="/ckfinder/userfiles/images/%EC%BA%A1%EC%B2%9819.PNG" style="height:20px; width:406px" /></p>
<p> </p>
<p><span style="font-size:16px">Running 6.pl on the 5_100-length_Seq.fasta file generates the outer.fasta file.</span></p>
<p><span style="font-size:16px">The input file, 5_100-length_Seq.fasta, contains 5 FASTA sequences, each 100 bases long. It was produced by an edited version of [[BuKyung Randomly generate five 100 AA long protein sequences and store them in a FASTA file]].</span></p>
<p> </p>
<p><em>>0<br />
ACCACTACTAAGCGCATGAACGACTGTTAGGTTTCCGATGGCTGCTTGCGTTCCGTGTTCCAGCTGACTGGGCTGAACTATTTGTAATGTTGGTTGCACT<br />
>1<br />
CAGGTACACGGACTGTTTGGTTTGCCCAATTAATTGGCGGGTCGTAAACCGGTTTTTCGTTGGGCGCGGAGTTGTCGTAAACGGTCGGTATTAACTACCT<br />
>2<br />
ATATTCTGTTCGAAGGCGAGGCCTTAATAAACGGGCTCACACTATACGTTTCTAGCGTGCCAGTACGCGTATGCCCTGAGCAGCATCTTGAATAGTCCTT<br />
>3<br />
CACGTCTTGAGGCATGCTCACATAACTTGGGATTGATACAATCGGGGGACGGTAGCGGGGCTAGTGGGCATCGTCGGCGGTCTACGAGCAAAAGTATCAG<br />
>4<br />
CAGGACGTGAACCGAAAGCTGCACACCTATACTATCGTAGTATACCACCGTTCCGTAAATCCATCGCTGATCCTGCCATGAAGGGCTAAGTACGCATGAG</em><br />
</p>
<p> </p>
<p> </p>
<p><span style="font-size:16px">The content of the outer.fasta file is:</span></p>
<p> </p>
<div>
<div><em>>0<br />
ACCACTACTAAGCGCATGAACGACTGTTAGGTTTCCGATGGCTGCTTGCGTTCCGTGTTCCAGCTGACTGGGCTGAACTATTTGTAATGTTGGTTGCACT<br />
GC content is:0.49<br />
>1<br />
CAGGTACACGGACTGTTTGGTTTGCCCAATTAATTGGCGGGTCGTAAACCGGTTTTTCGTTGGGCGCGGAGTTGTCGTAAACGGTCGGTATTAACTACCT<br />
GC content is:0.5<br />
>2<br />
ATATTCTGTTCGAAGGCGAGGCCTTAATAAACGGGCTCACACTATACGTTTCTAGCGTGCCAGTACGCGTATGCCCTGAGCAGCATCTTGAATAGTCCTT<br />
GC content is:0.48<br />
>3<br />
CACGTCTTGAGGCATGCTCACATAACTTGGGATTGATACAATCGGGGGACGGTAGCGGGGCTAGTGGGCATCGTCGGCGGTCTACGAGCAAAAGTATCAG<br />
GC content is:0.55<br />
>4<br />
CAGGACGTGAACCGAAAGCTGCACACCTATACTATCGTAGTATACCACCGTTCCGTAAATCCATCGCTGATCCTGCCATGAAGGGCTAAGTACGCATGAG<br />
GC content is:0.5</em></div>
<div> </div>
<div><br />
<em>Average sequence length is :100<br />
GC contents:0.504<br />
AT contents:0.496</em></div>
</div>
<div> </div>
<div> </div>
<div><span style="font-size:16px">I simply append the GC content of each sequence after that sequence. At the end of the file, the average sequence length and the overall GC and AT contents are printed.</span></div>
</div>
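<p><span style="font-size:16px">The summary lines can be checked against the per-sequence values: the overall GC content is the total number of G and C bases divided by the total number of bases, and the AT content is its complement. The sketch below redoes that arithmetic from the five fractions reported in outer.fasta. The helper name <code>overall_gc</code> is hypothetical, and it assumes that all sequences have the same length.</span></p>

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: overall GC fraction from per-sequence GC fractions,
# assuming every sequence has the same length $len.
sub overall_gc {
    my ($fractions, $len) = @_;
    my ($total_gc, $total_len) = (0, 0);
    for my $f (@$fractions) {
        $total_gc  += sprintf("%.0f", $f * $len);   # recover the integer GC count
        $total_len += $len;
    }
    return 0 unless $total_len;                     # no sequences given
    return $total_gc / $total_len;
}

# Per-sequence values as reported in outer.fasta (five sequences of length 100).
my $gc = overall_gc([0.49, 0.5, 0.48, 0.55, 0.5], 100);
printf "GC contents:%.3f\n", $gc;       # prints GC contents:0.504
printf "AT contents:%.3f\n", 1 - $gc;   # prints AT contents:0.496
```

<p><span style="font-size:16px">The counts are 49 + 50 + 48 + 55 + 50 = 252 G and C bases out of 500, which gives 0.504 and matches the printed summary.</span></p>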