
Perl scripts for FASTA/Junk DNA

varsha raja rnavarsha at yahoo.com
Mon Oct 21 22:04:05 EST 2002

Dear Arabidopsis Group Users

S. K. Patel and Varsha Raja*

Here are some new programs for FASTA conversion and for removing junk
characters from a gene sequence or a group of gene sequences.

1. Put one raw protein sequence (not in FASTA format) in a text file and
pass that file to the script below as a command-line argument. For
example, if the script is saved as "fastaformat.pl" and the sequence
file is "RawSeq.txt" (not a FASTA file), type the following at the
prompt:

     $ perl fastaformat.pl RawSeq.txt

     The output will be in FASTA format.

     Use this FASTA-formatted sequence for BLAST searches at NCBI.

   2. To check that this FASTA output is valid, first take a sample
sequence from NCBI, convert it to FASTA format with the tool NCBI
already provides, run BLAST and save the result in one file. Then use
the following script to convert the same raw sequence into FASTA format,
run BLAST with this FASTA-formatted sequence, and save that result in a
second file. Compare the two BLAST result files: the results should be
the same.
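The comparison in step 2 can also be done with a short Perl script
instead of by eye. This is a minimal sketch; the helper names
first_difference and read_lines and the script name compare_blast.pl are
hypothetical. It reports the first line at which two saved result files
differ. If the saved reports include run-specific lines (for example a
request ID or a date), those lines may differ even when the hits match.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based number of the first line where
# the two arrays of lines differ, or 0 if they are identical.
sub first_difference {
    my ($a_ref, $b_ref) = @_;
    my $max = @$a_ref > @$b_ref ? scalar @$a_ref : scalar @$b_ref;
    for my $i (0 .. $max - 1) {
        my $x = $a_ref->[$i];
        my $y = $b_ref->[$i];
        # a missing line in either file also counts as a difference
        return $i + 1 if !defined $x || !defined $y || $x ne $y;
    }
    return 0;
}

# Hypothetical helper: read a file into an array of lines, newlines removed.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!\n";
    chomp(my @lines = <$fh>);
    close $fh;
    return \@lines;
}

# Usage: perl compare_blast.pl ncbi_result.txt script_result.txt
if (@ARGV == 2) {
    my $line = first_difference(read_lines($ARGV[0]), read_lines($ARGV[1]));
    print $line ? "Files differ at line $line\n" : "Files are identical\n";
}
```

Comparing line by line, rather than only checking file sizes, tells you
where the two results start to disagree.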

   The following script converts a raw protein sequence into FASTA
format:



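    #!/usr/bin/perl
    # fastaformat.pl: convert one raw protein sequence into FASTA format
    # Usage: perl fastaformat.pl RawSeq.txt
    use strict;
    use warnings;

    my $file = $ARGV[0] or die "Usage: perl fastaformat.pl RawSeq.txt\n";
    open(my $fh, '<', $file) or die "Cannot open $file: $!\n";

    my $Genome = '';
    while (my $line = <$fh>) {
        $Genome .= $line;          # collect the whole sequence
    }
    close $fh;

    $Genome =~ s/\s+//g;           # remove spaces, tabs and line breaks
    print ">$file\n";              # header line; naming it after the file is an assumption
    print $Genome, "\n";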
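FASTA files usually also wrap long sequences over several lines. The
sketch below is an assumption-based variant, not part of the original
script: the helper name to_fasta, the header text and the default width
of 60 characters per line are illustrative choices, and the demo
sequence is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: build a FASTA record from a header and a raw
# sequence. Whitespace is removed and the sequence is wrapped at $width
# characters per line (60 by default, a common but not mandatory choice).
sub to_fasta {
    my ($header, $raw, $width) = @_;
    $width ||= 60;
    (my $seq = $raw) =~ s/\s+//g;           # drop spaces, tabs and newlines
    my $out = ">$header\n";                 # definition line starts with ">"
    for (my $i = 0; $i < length($seq); $i += $width) {
        $out .= substr($seq, $i, $width) . "\n";
    }
    return $out;
}

# Made-up example sequence, wrapped at 5 characters to show the effect
print to_fasta('my_protein', "MKT AYI AKQ\nRQI SFV KSH", 5);
```

With a width of 5 this prints the header ">my_protein" followed by the
lines MKTAY, IAKQR, QISFV and KSH.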




Second script: removing junk characters from a nucleotide sequence

A sequence can pick up junk characters when it is copied from one
program to another, or you may insert characters other than the four
bases "ATGC" by mistake. The following script filters the sequence so
that the output contains only the letters A, T, G and C.
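The same filtering can be written more compactly with Perl's tr///
operator, where the c modifier complements the character list and d
deletes the matched characters. This is a sketch of an alternative; the
helper name clean_dna is hypothetical. Like the original script, it is
case-sensitive, so lowercase bases are removed too; applying uc() first
would keep them, but that is a change from the original behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only A, T, G and C; delete every other character.
# tr/ATGC//cd : c = complement the list, d = delete what matches.
sub clean_dna {
    my ($seq) = @_;
    (my $clean = $seq) =~ tr/ATGC//cd;
    return $clean;
}

print clean_dna("AT-G*C 12\nga tc"), "\n";   # prints "ATGC"
```

A single tr/// pass avoids calling substr once per character, which
matters for long sequences.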

How to test the script

Make a text file, say testing.txt, that contains a sequence with some
junk characters in it.

Contents of testing.txt:




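    #!/usr/bin/perl
    # Keep only A, T, G and C from a nucleotide sequence file
    # Usage: perl filename.pl testing.txt
    use strict;
    use warnings;

    my $file = $ARGV[0] or die "Usage: perl filename.pl testing.txt\n";
    open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
    my $Genome = do { local $/; <$fh> } // '';   # read the whole file at once
    close $fh;

    for my $i (0 .. length($Genome) - 1) {
        my $sub = substr($Genome, $i, 1);
        # keep the character only if it is one of the four bases
        if ($sub eq 'A' || $sub eq 'T' || $sub eq 'G' || $sub eq 'C') {
            print $sub;
        }
    }
    print "\n";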




Run the Perl script at the prompt:

$ perl filename.pl testing.txt

Note: the output contains only the letters A, T, G and C.
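For a group of genes stored in one multi-FASTA file, the header lines
need to survive the filtering. The sketch below assumes that every
header line starts with ">" and that all other lines are sequence; the
helper name clean_multi_fasta and the demo input are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: clean every sequence in a multi-FASTA text.
# Header lines (starting with ">") are kept unchanged; on every other
# line only A, T, G and C are kept. Lines that end up empty are dropped.
sub clean_multi_fasta {
    my ($text) = @_;
    my @out;
    for my $line (split /\n/, $text) {
        if ($line =~ /^>/) {
            push @out, $line;                  # keep the header as-is
        } else {
            (my $seq = $line) =~ tr/ATGC//cd;  # delete non-ATGC characters
            push @out, $seq if length $seq;
        }
    }
    return join("\n", @out) . "\n";
}

my $demo = ">gene1\nATG*CCA\n>gene2\nTT 12GA\n";   # made-up input
print clean_multi_fasta($demo);
```

For the made-up input this prints ">gene1", "ATGCCA", ">gene2" and
"TTGA" on separate lines.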





More information about the Arab-gen mailing list
