Monday, April 30, 2012

Building vim with python support

Here is a walk-through of how to build vim with python2 support on a bare Arch Linux installation. This assumes that you will have no problems with conflicting packages or anything of that sort.
1) Make sure the build tools are present:
sudo pacman -S base-devel
This installs the base-devel group with all the necessary build tools (make, gcc, autoconf and all those beautiful things). If you are not certain whether you already have it, run
pacman -Ss base-devel
This should print out the packages in the base-devel group. Chances are all of them will be marked [installed].
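If you would rather check for the individual tools than read through the pacman output, a small loop over command -v does the job. This is a minimal sketch; the tool names you pass in (for example gcc and make) are my assumption of the bare minimum for the vim build, not a list pacman reports.

```shell
# check_tools: print "missing: NAME" for every named command not found on PATH.
# Returns 0 if all commands are found, 1 if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   check_tools gcc make
```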
2) Get Python 2.7
There are a number of reasons I am using 2.7 right now, mostly amounting to scientific package support (e.g. IPython, NumPy and pandas). Interestingly, Arch ships python3 as the default python (one more reason to love Arch, isn't it?).
sudo pacman -S python2
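Now you need to symlink the python2 executable as python, both for convenience and so that vim's configure script finds the right interpreter:
sudo ln -s /usr/bin/python2 /usr/bin/python
Writing to /usr/bin needs root, hence the sudo. If Arch's python (python3) package is installed, /usr/bin/python already exists and ln will refuse to overwrite it; forcing it with -f makes python mean python2 system-wide, which can break scripts that expect python3.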
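If /usr/bin/python is already taken by python3, a gentler option is to put the python symlink in a private directory and prepend that directory to PATH only while building vim. This is a sketch that assumes vim's configure finds the interpreter by looking up python on PATH, which is the same assumption the /usr/bin symlink relies on; the directory name ~/py2shim is just an example.

```shell
# make_python_shim: create DIR/python as a symlink to the given python2 binary,
# so that prepending DIR to PATH makes `python` resolve to python2.
make_python_shim() {
    shim_dir=$1
    python2_bin=$2
    mkdir -p "$shim_dir" || return 1
    ln -sf "$python2_bin" "$shim_dir/python"
}

# Usage (hypothetical directory; run configure and make from the same shell):
#   make_python_shim "$HOME/py2shim" /usr/bin/python2
#   export PATH="$HOME/py2shim:$PATH"
#   ./configure --with-features=huge --enable-pythoninterp ...
```

This leaves /usr/bin/python untouched, so python3 keeps working for the rest of the system.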
3) Get the vim source and unpack it
wget ftp://ftp.vim.org/pub/vim/unix/vim-7.3.tar.bz2
tar xjf vim-7.3.tar.bz2
4) Go to the source directory and configure everything
cd vim73/src/
./configure --with-features=huge --enable-pythoninterp --with-python-config-dir=/usr/lib/python2.7/config
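Instead of hard-coding /usr/lib/python2.7/config in the configure line above, you can ask the interpreter where its config directory lives. On a standard Python 2.7 build the LIBPL configuration variable should point at that directory; this helper is a sketch that takes the interpreter name as an argument (python2 on the setup described here).

```shell
# python_config_dir: print the LIBPL config variable of the given interpreter,
# which on a standard Python 2.7 build is the .../python2.7/config directory.
# The sysconfig module exists in Python 2.7 and 3.x.
python_config_dir() {
    "$1" -c 'import sysconfig; print(sysconfig.get_config_var("LIBPL"))'
}

# Usage:
#   ./configure --with-features=huge --enable-pythoninterp \
#       --with-python-config-dir="$(python_config_dir python2)"
```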
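5) Build the executable and move it to /usr/bin:
make
sudo mv ./vim /usr/bin
This moves only the binary, not vim's runtime files (syntax highlighting, help and so on); sudo make install installs both, under /usr/local by default.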
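6) Check if it all works
vim
:python print 'hello vim'
This should print hello vim at the bottom of the screen. Use vim -g only if your build includes the GUI, which a bare install without X libraries will not give you.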
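A non-interactive way to do the same check (handy on a headless box) is to look for +python in the feature list printed by vim --version. A minimal sketch; the binary is a parameter so it can also point at the freshly built ./vim.

```shell
# has_python_feature: succeed if the given vim binary (default: vim on PATH)
# lists +python as a whole word in its --version output.
# -w keeps "+python3" from counting as "+python".
has_python_feature() {
    vim_bin=${1:-vim}
    "$vim_bin" --version | grep -qw -- '+python'
}

# Usage:
#   has_python_feature /usr/bin/vim && echo "python support is in"
```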
That's all folks!

Sunday, January 15, 2012

Basic Perl

Perl is a wonderful language. It gives me a certain satisfaction to write obfuscated code. When I call it obfuscated, I am comparing it to Objective-C: Perl looks neat and has a ton of great shortcuts. Here is a nice little example that took me a few hours to write. It uses the LWP package to request pages from the ExPASy Compute pI/Mw tool.

We have a hard-coded array of protein sequences for which we want to get the isoelectric point (pI) and the molecular weight (Mw). Both are relatively simple calculations, but the purpose of this snippet is to demonstrate talking to a web page with LWP; we can use the same principles to interface with Google, for example. For each sequence in the list, we form a new request to the ExPASy pI tool. We then use a simple regular expression (since we know exactly what the output will look like) to pull out the numbers we are looking for. If the regular expression matches, we put the numbers in a hash and add a reference to that hash to a list. The list is our result table, and when we are done we just print it out.
#!/usr/bin/perl -w
use strict;
#This is the package we use for getting networked files by url
use LWP::Simple; 

#hard code the list of sequences
my @sequences = ('MKWVTFISLLFLFSSAYS', 'MWVTFISLL', 'MFISLLFLFSSAYS');
#initialize script-wide data array. We will store our result table here.
my @data;
#for each sequence
foreach (@sequences) {
    #form a new request using string concatenation
    #in this case, $_ gives us the current element (i.e. $sequences[i])
    my $request = 'http://ca.expasy.org/cgi-bin/pi_tool?protein='.$_.'&resolution=monoisotopic';
    #try to fetch the page.
    #Isn't it nice to have the error handler in the same line?
    my $mw_page = get $request or die "Unable to fetch $request\n";
    #A few notes on the regex:
    #It starts with a simple text match. The numbers are captured in two groups (parentheses)
    #The first group is the pI, accessed as $1
    #One or more digits, followed by a full stop, followed by one or two digits
    #The second group has the same pattern and is accessed as $2
    if ($mw_page =~ /Theoretical pI\/Mw: (\d+\.\d{1,2}) \/ (\d+\.\d{1,2})/) {
        #create the hash
        #notice that $_ still holds the sequence
        my %entry = (sequence => $_, pI => $1, Mw => $2);
        #store a reference to the hash in the array
        push @data, \%entry;
    } else {
        print "No match found for $_\n";
    }
}
#for each reference to a hash in data
foreach my $hash_ref (@data) {
    #get key, value pairs (each returns them in no particular order)
    #notice the dereferencing %$
    while (my ($key, $value) = each %$hash_ref) {
        print "$key: $value\n";
    }
}