Sunday, January 15, 2012

Basic Perl

Perl is a wonderful language.
It gives me a certain satisfaction to write obfuscated code.
When I call it obfuscated, I mean in comparison with Objective-C.
Perl looks neat and has a ton of great shortcuts.
Here is a nice little example that took me a few hours to write.
It uses the LWP package to request pages from the Expasy molecular weight tool.
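LWP::Simple's get returns the page content on success and undef on failure, and the script's one-line "get $request or die ..." relies on exactly that. The sketch below swaps get for a hypothetical offline stub, fake_get, so it runs without a network connection; the URLs in it are placeholders. It shows why the line works: "or" binds more loosely than "=", so the assignment happens first and die only runs if the fetched value is false.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for LWP::Simple::get so this runs offline.
# Like the real get(), it returns content on success and undef on failure.
sub fake_get {
    my ($url) = @_;
    return $url =~ m{^http://good\.} ? '<html>ok</html>' : undef;
}

# 'or' has lower precedence than '=', so this parses as
#   (my $page = fake_get(...)) or die ...;
my $page = fake_get('http://good.example/') or die "Unable to fetch\n";
print "fetched: $page\n";

# A failed fetch triggers the handler; eval catches the die so we can inspect it.
my $error = '';
eval {
    my $missing = fake_get('http://bad.example/') or die "Unable to fetch\n";
    1;
} or $error = $@;
print "caught: $error";
```

With "||" instead of "or" this particular line would also work, but "or" is the safer habit because it never binds to the right-hand operand by accident.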

We have a hard-coded array of protein sequences, and for each one we want to get the isoelectric point (pI) and the molecular weight (Mw).
Both are relatively simple calculations, but the purpose of this code snippet is to demonstrate fetching web pages with LWP. The same principles apply to other web services, Google for example.
For each sequence in the list, we form a new request to the Expasy pI tool.
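The script glues each sequence straight into the query string, which is fine for plain one-letter amino acid codes. If a sequence could contain spaces or other characters that are not allowed in a URL (say, pasted from a FASTA file), it would need to be percent-encoded first. Here is a minimal sketch with a hand-written helper so it needs only core Perl; the names uri_escape_simple and build_request_url are hypothetical, and the base URL and parameters are copied from the script. The URI::Escape module, part of the URI distribution that LWP depends on, provides uri_escape for the same job.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 unreserved characters.
sub uri_escape_simple {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical helper: build the request URL for one sequence.
sub build_request_url {
    my ($sequence) = @_;
    return 'http://ca.expasy.org/cgi-bin/pi_tool?protein='
         . uri_escape_simple($sequence)
         . '&resolution=monoisotopic';
}

my @sequences = ('MKWVTFISLLFLFSSAYS', 'MWVTFISLL', 'MFISLLFLFSSAYS');
my @urls = map { build_request_url($_) } @sequences;
print "$_\n" for @urls;
```

For these three sequences the escaping changes nothing, since they contain only letters; it only matters for messier input.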
We then use a simple regular expression (since we know exactly what the output will look like) to extract the numbers we are looking for.
If the regular expression matches, we put the two numbers in a hash and add a reference to that hash to a list.
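To see the match and the storage step in isolation, here is a sketch that runs the same regular expression over a hard-coded placeholder string instead of a fetched page. The numbers in that string are made up and are not real Expasy output. It also uses the anonymous-hash shorthand { ... }, which builds a hash and returns a reference to it in one step, as an alternative to declaring %entry and pushing \%entry.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder text in the shape the regex expects; values are invented.
my $page     = '<p>Theoretical pI/Mw: 5.67 / 1234.56</p>';
my $sequence = 'MWVTFISLL';

my @data;
# Same pattern as the script: group 1 is the pI, group 2 is the Mw.
if ($page =~ /Theoretical pI\/Mw: (\d+\.\d{1,2}) \/ (\d+\.\d{1,2})/) {
    # { ... } creates an anonymous hash and yields a reference to it.
    push @data, { sequence => $sequence, pI => $1, Mw => $2 };
}

# Read the fields back through the reference with the arrow operator.
for my $row (@data) {
    print "$row->{sequence}: pI=$row->{pI}, Mw=$row->{Mw}\n";
}
```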
The list is our result table.
When we are done, we just print out the result table.
#!/usr/bin/perl -w
use strict;
#This is the package we use for getting networked files by url
use LWP::Simple; 

#hard code the list of sequences
my @sequences = ('MKWVTFISLLFLFSSAYS', 'MWVTFISLL', 'MFISLLFLFSSAYS');
#initialize script-wide data array. We will store our result table here.
my @data;
#for each sequence
foreach (@sequences) {
    #form a new request using string concatenation
    #in this case, $_ gives us the current element (i.e. $sequences[$i])
    my $request = 'http://ca.expasy.org/cgi-bin/pi_tool?protein='.$_.'&resolution=monoisotopic';
    #try to fetch the page; get returns undef if the request fails.
    #Isn't it nice to have the error handler on the same line?
    my $mw_page = get $request or die "Unable to fetch $request\n";
    #A few notes on the regex:
    #It starts with a literal text match. Each number is captured in parentheses
    #The first group is the pI, accessed as $1
    #One or more digits, followed by a full stop, followed by one or two digits
    #The second group has the same form and is accessed as $2
    if ($mw_page =~ /Theoretical pI\/Mw: (\d+\.\d{1,2}) \/ (\d+\.\d{1,2})/) { 
        #create the hash
        #notice that $_ still holds the sequence
        my %entry = (sequence => $_, pI => $1, Mw => $2);
        #store a reference to the entry in the array
        #(%entry is a fresh lexical on each pass, so every reference is distinct)
        push @data, \%entry;
    } else {
        print "No match found for $_\n";
    }
}
#for each reference to a hash in data
foreach my $hash_ref (@data) {
    #get key,value pairs
    #notice the dereferencing %$
    while (my ($key, $value) = each %$hash_ref) {
        print "$key: $value\n";
    }
}
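The final loop prints each entry's key/value pairs in whatever order each hands them out, and Perl does not guarantee that order (since Perl 5.18 it can even change from one run to the next). If you would rather have a stable, spreadsheet-friendly table, one option is to fix the column order and print tab-separated rows using a hash slice. The rows below are placeholders with invented values standing in for the @data the script builds.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder rows in the same shape as the script's @data; values are invented.
my @data = (
    { sequence => 'MWVTFISLL',      pI => '5.00', Mw => '1000.00' },
    { sequence => 'MFISLLFLFSSAYS', pI => '6.00', Mw => '2000.00' },
);

# Fixed column order, unlike each().
my @columns = qw(sequence pI Mw);

# Header line, then one tab-separated line per entry.
my @lines = (join "\t", @columns);
for my $row (@data) {
    # @{$row}{@columns} is a hash slice: the values for those keys, in that order.
    push @lines, join "\t", @{$row}{@columns};
}
print "$_\n" for @lines;
```

Replacing the placeholder @data with the one the script fills in would make this a substitute for the last loop.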