Thursday, December 29, 2011

Common escape characters


\a    Beep
\b    Backspace
\c    "Control" character. \cD = CTRL-D
\e    Escape
\f    Form feed
\l    Make the next letter lowercase
\n    New line
\r    Carriage return
\t    Tab
\u    Make the next letter uppercase
\x    Hex character code follows, e.g. \x41 = A
\v    Vertical tab
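The list does not say which language it describes. The \l, \u, \c and \e entries read like Perl's double-quoted strings, but that is my assumption. As a rough sketch in R, which accepts only part of this list (\a, \b, \f, \n, \r, \t, \v and \x), here is a quick way to see which character each escape stands for. The vector names are made up for illustration.

```r
# Escapes from the list that R string literals also accept.
# (That R accepts this subset is a fact about R, not something the list states.)
escapes <- c(alert = "\a", backspace = "\b", form_feed = "\f",
             newline = "\n", carriage_return = "\r", tab = "\t",
             vertical_tab = "\v", hex_A = "\x41")

# Each escape is a single character; utf8ToInt() returns its code point
codes <- sapply(escapes, utf8ToInt)
print(codes)

# \t and \n in action: two fields separated by a tab, ended by a newline
cat("name\tvalue\n")
```

The other entries (\c, \e, \l, and \u as a case modifier) are not valid in R strings, so they are left out here.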

A simple problem in R

I have recently started looking at the R environment for statistical analysis.
After one day, I am both fascinated and frustrated.
The fascination comes from the insanely large number of abilities you are given.
The frustration is from the same source.
Namely, there is rarely a single way to do something.
Also, the naming sucks, and it is hard to find your way around without prior experience.
Anyway, here is a sample problem that I happened to have to solve along the way.

We have a small sample survey of stress levels at work.
The sample size is 30 and the categories used for classification are "none", "somewhat" and "very".
Stressful, that is.
We need to figure out the frequency distribution of this data.
The input is a comma-separated file with all 30 values on one line.
Say it is called "stress.csv" and sits in the current working directory.
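The original file is not included here, so to follow along you need a stand-in. The sketch below writes one. The counts (6, 14 and 10) are my reconstruction from the percentages printed at the end of this post, and the order of the answers is arbitrary, so treat the data as hypothetical.

```r
# Hypothetical stand-in data: counts chosen to match the reported percentages
# (6/30 = 20.0%, 14/30 = 46.7%, 10/30 = 33.3%); the order is arbitrary.
set.seed(1)
answers <- sample(rep(c("none", "somewhat", "very"), times = c(6, 14, 10)))

# Write all 30 values on a single comma-separated line.
# Note: this overwrites any existing stress.csv in the working directory.
writeLines(paste(answers, collapse = ","), "stress.csv")

# Quick look at what ended up in the file (a single line of text)
readLines("stress.csv")
```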
Here is my example R session:

stress = read.table("stress.csv", header=FALSE, sep=",")
stress = t(stress)
signif((table(stress)/length(stress)*100),3)

Notice that since the initial data is on one line, the values are read as a single row, so we transpose them into a single column before we start working with them.
We then use the table function to count how often each distinct value occurs.
The last statement looks awkward and could be cleaned up.
I do, however, like nesting things.
Here is the same computation, one step at a time:

# Count occurrences of each distinct value
occurrences = table(stress)
# Compute the relative frequency of each value
frequency = occurrences/length(stress)
# Convert to percent
frequency_percent = frequency * 100
# Round to three significant digits
formatted_output = signif(frequency_percent, 3)
Hope this clarifies things a bit.
Printing formatted_output gives:
stress
    none somewhat     very 
    20.0     46.7     33.3 
Quick and dirty, but seems to do the job.
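
In the spirit of "no single way to do something", here is another route to the same numbers, offered as a sketch rather than the right way. It reads the line with scan(), which returns a plain vector so no transpose is needed, and uses prop.table() to get the proportions. The temporary file and its contents are hypothetical stand-ins so the snippet runs on its own. The counts are inferred from the percentages above.

```r
# Hypothetical stand-in for stress.csv: 30 answers on one comma-separated line.
# Counts (6, 14, 10) are inferred from the percentages reported in the post.
path <- tempfile(fileext = ".csv")
values <- rep(c("none", "somewhat", "very"), times = c(6, 14, 10))
writeLines(paste(values, collapse = ","), path)

# scan() returns a plain character vector, so no transpose is needed
stress <- scan(path, what = character(), sep = ",", quiet = TRUE)

# table() counts each distinct value; prop.table() turns counts into proportions
frequency_percent <- signif(prop.table(table(stress)) * 100, 3)
print(frequency_percent)
```

With the real file, replace path with "stress.csv" and drop the three lines that create the stand-in.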