Monday, April 30, 2012

Building vim with python support

Here is a walk-through of how to build vim with python2 support on a bare Arch Linux installation. This assumes that you will have no problems with conflicting packages or anything of that sort.
1) Make sure the build tools are present:
sudo pacman -S base-devel
This will install all the necessary build tools (configure, make, gcc and all those beautiful things). If you are not certain you have them, run
pacman -Ss base-devel
This should print out the packages in the base-devel bundle. Chances are you will see [installed] next to all of them.
2) Get python2.7
There are a number of reasons I am using 2.7 right now, mostly amounting to scientific package support (e.g. ipython, numpy and pandas). Interestingly, Arch has python3 as the default python distribution (one more reason to love Arch, isn't it?).
sudo pacman -S python2
Now you need to symlink the python2 executable as python, both for convenience and so that the configure script of the vim build finds the interpreter. Writing to /usr/bin requires root, hence the sudo:
sudo ln -s /usr/bin/python2 /usr/bin/python
3) Get the vim source and unpack it
wget ftp://ftp.vim.org/pub/vim/unix/vim-7.3.tar.bz2
tar -xjf vim-7.3.tar.bz2
4) Go to the source directory, then configure and build everything
cd vim73/src/
./configure --with-features=huge --enable-pythoninterp --with-python-config-dir=/usr/lib/python2.7/config
make
5) Move the executable to /usr/bin:
sudo mv ./vim /usr/bin
6) Check if it all works
vim -g 
:python print 'hello vim'
That's all folks!

Sunday, January 15, 2012

Basic Perl

Perl is a wonderful language.
It gives me a certain satisfaction to write obfuscated code.
When I say it is obfuscated, I compare it to Objective-C.
Perl looks neat and has a ton of great shortcuts.
Here is a nice little example that took me a few hours to write.
It uses the LWP package to request pages from the Expasy molecular weight tool.

We have a hard-coded array of certain protein sequences for which we want to get the isoelectric point (pI) and the molecular weight (Mw).
Both are relatively simple calculations, but the purpose of this code snippet is to demonstrate internet interfacing with LWP. We can use the same principles to interface with Google, for example.
For each sequence in the list, we form a new request to the Expasy pI tool.
We then use a simple regular expression (since we know exactly what the output will look like) to get the numbers we are looking for.
After matching the regular expression, we put the numbers in a hash and then add that hash to a list.
The list is our result table.
When we are done, we just print out the result table.
#!/usr/bin/perl -w
use strict;
#This is the package we use for getting networked files by url
use LWP::Simple; 

#hard code the list of sequences
my @sequences = ('MKWVTFISLLFLFSSAYS', 'MWVTFISLL', 'MFISLLFLFSSAYS');
#initialize script-wide data array. We will store our result table here.
my @data;
#for each sequence
foreach (@sequences) {
    #form a new request using string concatenation
    #in this case, $_ gives us the current element (i.e. sequences[i])
    my $request = 'http://ca.expasy.org/cgi-bin/pi_tool?protein='.$_.'&resolution=monoisotopic';
    #try to fetch the page.
    #Isn't it nice to have the error handler in the same line?
    my $mw_page = get $request or die "Unable to fetch $request\n";
    #A few notes on the regex:
    #It starts with a simple text match. The digits are grouped in parentheses
    #The first group is the pI, accessed as $1
    #One or more digits, followed by a full stop, followed by one or two digits
    #The second group is the same, accessed as $2
    if ($mw_page =~ /Theoretical pI\/Mw: (\d+\.\d{1,2}) \/ (\d+\.\d{1,2})/) {
        #create the hash
        #notice that $_ still holds the sequence
        my %entry = (sequence => $_, pI => $1, Mw => $2);
        #use reference to the entry in the array
        push @data, \%entry;
    } else {
        print "No match found\n";
    }
}
#for each reference to a hash in data
foreach my $hash_ref (@data) {
    #get key,value pairs
    #notice the dereferencing %$
    while (my ($key, $value) = each %$hash_ref) {
        print "$key: $value\n";
    }
}

Thursday, December 29, 2011

Common escape characters


\a    Beep
\b    Backspace
\c    "Control" character. \cD = CTRL-D
\e    Escape
\f    Form feed
\l    Make the next letter lowercase
\n    New line, return
\r    Carriage return
\t    Tab
\u    Make the next letter uppercase
\x    Enables hex numbers
\v    Vertical tab

A simple problem in R

I have recently started looking at the R environment for statistical analysis.
After one day, I am fascinated and frustrated.
The fascination comes from the insanely large number of abilities you are given.
The frustration is from the same source.
Namely, there is no single way to do something.
Also, the naming sucks and it is hard to find your way around without any prior experience.
Anyway, here is a sample problem that I happened to have to solve along the way.

We have a small sample survey of stress levels at work.
The sample size is 30 and the categories used for classification are "none", "somewhat" and "very".
Stressful, that is.
We need to figure out the frequency distribution of this data.
The input given is a comma separated file with the 30 values in one line.
Say it is called "stress.csv", located in the pwd.
Here is my example R session:

stress = read.table("stress.csv", header=FALSE, sep=",")
stress = t(stress)
signif((table(stress)/length(stress)*100),3)

Notice that since the initial data is all on one line, read.table gives us one row with 30 columns, so we have to transpose it into a single column of values before we start working with it.
We then use the table function to count occurrences of unique elements.
The last statement looks awkward, and may be cleaned up.
I do, however, like nesting things.
Here is a simplified version:

#Count unique elements
occurrences = table(stress)
#Count fractional frequency of each element                      
frequency = occurrences/length(stress)
#Convert to percent value           
frequency_percent = frequency * 100
#Format with three significant digits              
formatted_output = signif(frequency_percent, 3)  
Hope this clarifies things a bit.
Printing formatted_output gives:
stress
    none somewhat     very 
    20.0     46.7     33.3 
Quick and dirty, but seems to do the job.

Wednesday, November 16, 2011

A morning of building

I am a university student.
And my computer department's software policy is terrible.
For one, their VIM was not compiled with GUI support.
So you are stuck with a fairly awkward terminal VIM.
I mean, of course, VIM is supposed to be all keyboard-rocking and everything but...
...isn't it nice to sometimes just grab those code lines with a mouse select?
Anyways, I set out on a quest for getting gVIM working on my account.
Certainly, I do not have sudo permissions, so I had to get the source.
Ok, now what? I have never built anything outside of an IDE before.
make, was it?
Alright,
cd vim73/src
make
#big, scary, make log here
./vim -g
Awesome!
But as soon as I try to turn on syntax highlighting, I notice that not everything is that great.
I do not have write access to /usr/bin.
After a few minutes of searching I ended up doing this:
./configure --prefix=$HOME/vim/
make install
#bigger, scarier, make install log here
~/vim/bin/vim -g
Essentially, the --prefix option tells make install to put everything under the given directory.
$HOME is the home directory environment variable of course.
To be cool, I have also edited my shell startup file (.bash_profile for bash).
alias vim='$HOME/vim/bin/vim'
alias gvim='$HOME/vim/bin/vim -g'
Let the magic begin!

Tuesday, November 15, 2011

A few words on syntax

I have been looking for a really easy, out-of-the-box setup for syntax highlighting of the code on this blog.
After all, most of the posts on this blog are about code.
Google provides a nice easy solution: google-code-prettify.
Here is a step-by-step guide:
1) Include the following files from the Google SVN repo:
<link href="http://google-code-prettify.googlecode.com/svn/trunk/src/prettify.css"
    type="text/css" rel="stylesheet" />
<script type="text/javascript"
    src="http://google-code-prettify.googlecode.com/svn/trunk/src/prettify.js"></script>
2) Add an onload handler to the body tag of the HTML:
onload="prettyPrint()"
3) Any time you write code, use the following tags:
<pre class="prettyprint lang-html">
    <!-- Hey, look, this is a nice little comment. 
    I should have more of these in my code! -->
</pre>
Note that the new dynamic layouts do not allow you to modify HTML directly, so I ended up going with the old layouts.
Also, this tool by Dan Cederholm has helped me enormously.

Java iterator for a binary search tree

Java has a commonly used Iterator interface.
It is nowhere near as cool as Python's implicit iterators, but it may be useful every now and then.
It is usually used like this:
Iterator e = container.iterator();  
while (e.hasNext())  
{  
    System.out.println(e.next());
}  
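A typed variant of the same loop may be easier to follow. The sketch below assumes the container is a plain java.util.List of strings; the class name IteratorUsage and the join helper are made up for this illustration and are not part of any library.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorUsage
{
    // Joins the elements of any Iterable with an explicit Iterator loop.
    static String join(Iterable<String> container)
    {
        StringBuilder out = new StringBuilder();
        Iterator<String> e = container.iterator();
        while (e.hasNext())
        {
            out.append(e.next());
            // hasNext() only peeks, so it is safe to call it again here
            if (e.hasNext())
            {
                out.append(", ");
            }
        }
        return out.toString();
    }

    public static void main(String[] args)
    {
        List<String> words = Arrays.asList("none", "somewhat", "very");
        System.out.println(join(words)); // prints "none, somewhat, very"
    }
}
```

With the type parameter in place, next() hands back a String directly and no cast is needed.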
How do we implement a binary search tree iterator?
Look at the code here for binary search tree iterative traversal.
The simplest solution would be to "print" the tree into a storage array and then allow the iterator to operate on this array.
Alternatively (as shown here), we can use the iterative method developed earlier to maintain iterator state.
The basic idea for this iterator is saving a pointer (I call it a cursor) to the most recently found element.
Since the iterator has to access a lot of the tree's internals, I like to declare it as a private class inside my binary search tree implementation.
Such an approach ensures that variables such as tree.root are not seen from the outside.
Here is a simple implementation of an iterator that allows in-order traversal:
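//needs java.util.Iterator, java.util.NoSuchElementException and java.util.Stack
private class BSTIterator implements Iterator<Comparable>
{
    BSTNode root, cursor;
    Stack<BSTNode> iteratorStack;

    public BSTIterator (BSTNode root)
    {
        this.root = root;
        this.cursor = root;
        this.iteratorStack = new Stack<BSTNode>();
    }

    public boolean hasNext()
    {
        return (!iteratorStack.empty() || cursor != null);
    }

    public Comparable next()
    {
        if (!hasNext())
        {
            throw new NoSuchElementException();
        }
        Comparable nextNodeValue;
        //walk down to the leftmost unvisited node, remembering the path
        while (cursor != null)
        {
            iteratorStack.push(cursor);
            cursor = cursor.leftChild;
        }
        cursor = iteratorStack.pop();
        nextNodeValue = cursor.key;
        //in an in-order traversal the right subtree comes next
        cursor = cursor.rightChild;
        return nextNodeValue;
    }

    //the Iterator interface asks for remove(); this iterator is read-only
    public void remove()
    {
        throw new UnsupportedOperationException();
    }
}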
The thing to notice here is that we just broke up the iterative method into separate chunks inside this class.
All of our local variables became fields of the class.
The while condition became the hasNext() boolean.
The body of the iterative method was modified for next().
As a result, we get nice, clean, multipurpose code.
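To try the idea out in isolation, here is a minimal self-contained sketch. The SimpleBST class, its insert method and the toList helper are hypothetical stand-ins for the real tree implementation, which is not shown in this post. The tree also implements Iterable, so a for-each loop can drive the iterator implicitly.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Stack;

// Hypothetical minimal tree: only insert() and in-order iteration.
public class SimpleBST<T extends Comparable<T>> implements Iterable<T>
{
    private class Node
    {
        T key;
        Node leftChild, rightChild;
        Node(T key) { this.key = key; }
    }

    private Node root;

    // Plain unbalanced insert; equal keys go to the right subtree.
    public void insert(T key)
    {
        Node fresh = new Node(key);
        if (root == null) { root = fresh; return; }
        Node current = root;
        while (true)
        {
            if (key.compareTo(current.key) < 0)
            {
                if (current.leftChild == null) { current.leftChild = fresh; return; }
                current = current.leftChild;
            }
            else
            {
                if (current.rightChild == null) { current.rightChild = fresh; return; }
                current = current.rightChild;
            }
        }
    }

    public Iterator<T> iterator()
    {
        return new Iterator<T>()
        {
            private Node cursor = root;
            private final Stack<Node> stack = new Stack<Node>();

            public boolean hasNext()
            {
                return !stack.empty() || cursor != null;
            }

            public T next()
            {
                if (!hasNext()) { throw new NoSuchElementException(); }
                // descend to the leftmost unvisited node, stacking the path
                while (cursor != null)
                {
                    stack.push(cursor);
                    cursor = cursor.leftChild;
                }
                Node found = stack.pop();
                // the right subtree is visited after the node itself
                cursor = found.rightChild;
                return found.key;
            }

            public void remove()
            {
                throw new UnsupportedOperationException();
            }
        };
    }

    // Collects the keys in iteration order; handy for checking the traversal.
    public List<T> toList()
    {
        List<T> keys = new ArrayList<T>();
        for (T key : this) { keys.add(key); }
        return keys;
    }

    public static void main(String[] args)
    {
        SimpleBST<Integer> tree = new SimpleBST<Integer>();
        for (int key : new int[] {5, 2, 8, 1, 9, 3}) { tree.insert(key); }
        System.out.println(tree.toList()); // prints [1, 2, 3, 5, 8, 9]
    }
}
```

Each call to next() does only as much of the traversal as it needs, so the stack never holds more nodes than the height of the tree.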