
R is a software environment for statistical computing and graphics available for download at http://www.r-project.org/
Amongst other things it includes:
R syntax is very simple and intuitive. For full details see the various manuals http://cran.r-project.org/manuals.html.
For a gentle introduction see http://lib.stat.cmu.edu/R/CRAN/doc/contrib/Rdebuts_en.pdf
Emacs Speaks Statistics (or ESS) is an add-on package for Emacs that makes it easy to edit R scripts. ESS provides a standard interface between a range of statistical programs and statistical processes. It is intended to provide assistance for interactive statistical programming and data analysis, and is based on and extends the capabilities of S-mode. The code is freely available but is not in the public domain. It is distributed under the GNU GPL from
To get your ".sf" function files rendered by ESS, edit the file ~/ess-5.2.5/lisp/ess-site.el
//cut
(if (assoc "\\.[rR]\\'" auto-mode-alist) nil
(setq auto-mode-alist
(append
'(("\\.sp\\'" . S-mode)
("\\.[qsS]\\'" . S-mode)
("\\.ssc\\'" . S-mode)
("\\.[rR]\\'" . R-mode)
("\\.sd\\'" . R-mode) ;; <- addation here :-)
//cut
To get help within R try typing help(functionname) or help.search("sometext") e.g.
| > help(plot) |
| > help.search("append") |
The function apropos() finds all functions whose names contain the character string given as argument; only the packages loaded in memory are searched:
| > apropos("pdb") |
If this fails or proves insufficient try the web, particularly http://finzi.psych.upenn.edu/search.html
Also see the search engines listed here http://cran.r-project.org/search.html
There is an R cookbook of sorts here http://www.ku.edu/~pauljohn/R/Rtips.html which may prove useful
There are lots of packages that contain functions to extend the capabilities of R. Most of these add-on packages can be found on CRAN (Comprehensive R Archive Network), which is a network of ftp and web servers around the world that store identical, up-to-date versions of code and documentation for R. From CRAN you can obtain gzipped tar files named pkg_version.tar.gz, which may in fact be “bundles” containing more than one package.
Provided that tar and gzip are available on your system, to install type
| $ R CMD INSTALL /path/to/pkg_version.tar.gz |
to install to the library tree rooted at the first directory given in R_LIBS (see below) if this is set and non-null, and to the default library (the library subdirectory of R_HOME) otherwise.
To find the full path to an installed package (e.g. bio3d) run the following R code:
| find.package("bio3d") |
To list the contents of a package try this:
| ls("package:packagename") |
substituting, you guessed it, the package name for packagename.
To remove add-on packages
| $ R CMD REMOVE packagename |
---
In R, in order to be executed, a function always needs to be written with parentheses, even if there is nothing within them (e.g., ls()). If one just types the name of a function without parentheses, R will display the contents of the function (i.e. the code of the function).
To list the formal arguments of a function
| > args(plot.default) |
To inspect what a function is doing, try inserting a browser(). This will stop R where you put the browser and allow you to inspect objects created in the function.
The sub-interpreter can be exited by typing 'c'. Execution then resumes at the statement following the call to 'browser'. Typing 'n' starts the step-through debugger, and it is then possible to step through the remainder of the function one line at a time. Typing 'Q' quits back to the prompt.
missing can be used to test whether a value was specified as an argument to a function. Also see the "Testing for missing, infinite, indefinite or numeric(0) values" section below.
match.arg takes a character 'arg' and a character vector of 'choices' and tries to match 'arg' against the candidate values, allowing for partial matches. This is useful when used in conjunction with a switch statement (see below).
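As a minimal sketch of how missing, match.arg and switch fit together, the hypothetical function centre() below (the name, its arguments and the choice of summaries are illustrative assumptions, not part of these notes) picks a summary statistic by name:

```r
# Hypothetical helper: summarise a numeric vector using a named method.
centre <- function(x, type = c("mean", "median", "trimmed")) {
  # missing() reports whether the caller supplied the argument at all
  if (missing(x)) stop("argument 'x' was not supplied")
  # match.arg() allows partial matching and defaults to the first choice
  type <- match.arg(type)
  switch(type,
         mean    = mean(x),
         median  = median(x),
         trimmed = mean(x, trim = 0.1))
}

vals <- c(1, 2, 3, 4, 100)
res_default <- centre(vals)          # no 'type' given, so "mean" is used
res_partial <- centre(vals, "med")   # "med" partially matches "median"
```

Calling centre(vals, "mode") would stop with an error from match.arg, because "mode" is not among the listed choices.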
R operates on named data structures. The simplest such structure is the numeric vector, which is a single entity consisting of an ordered collection of numbers.
To set up a vector named x, say, consisting of five numbers, namely 10.4, 5.6, 3.1, 6.4 and 21.7, use the R command
| > x <- c(10.4, 5.6, 3.1, 6.4, 21.7) |
Here we have used the function "c", concatenate, which joins items forming a vector
| > c(4:6) |
| [1] 4 5 6 |
Other functions that result in the formation of vectors include "seq", and "rep":
"seq" = sequence, a range-like specifier (also "4:6" would work)
| > seq(4,6) |
| [1] 4 5 6 |
The advantage over "c" is that you can ask for your range in jumps of 2
| > seq(4,10,2) |
| [1] 4 6 8 10 |
Most often used for plot axes
"rep" = replicate, generates repeated values
| > oops <- (7,9,13) |
| > rep(oops,3) |
| 1 7 9 13 7 9 13 7 9 13 |
| > rep(oops,1:3) |
| 1 7 9 9 13 13 13 |
| > rep(1:2,c(3,6)) |
| 1 1 1 1 2 2 2 2 2 2 |
"append" = insert elements into an existing vector
| > append(1:5, 0:1, after=3) |
| 1 1 2 3 0 1 4 5 |
"identical" test (safely!) whether two objects are identical (i.e. excatly equal)
"which" = which indices of a vector,matrix etc. are TRUE
| > which(identical(x,y)) |
> which(matrix,arr.ind=T) to get row and col inds
"sort" & "order" = sorts a vector
| > xcoord <- c(10,14,12,11,10) |
| > sort(xcoord) |
| 1 10 10 11 12 14 |
"order" = give the sorted order (i.e. the indices of the sorted vector)
use "order" to sort another vector
| > o <- order(xcoord) |
| > xcoordo |
| 1 10 10 11 12 14 |
| > ycoordo |
| 1 2 3 6 3 1 |
| > o <- order(xcoord,ycoord) |
| > do, |
| xcoord ycoord |
| 1 10 2 |
| 5 10 3 |
| 4 11 6 |
| 3 12 3 |
| 2 14 1 |
Logical vectors = NA, TRUE or FALSE
| > c(T,T,F,T) |
| [1] TRUE TRUE FALSE TRUE |
(see: Testing for missing, infinite, indefinite or numeric(0) values below)
"list" Lists = combine vectors into lists
"data.frame" data matrix = a list of vectors of same length
"lapply" & "sapply" = apply a function to each element of a structure
"identical" = test equality
This is the way to test exact equality in 'if' and 'while' statements, as well as in logical expressions that use '&&' or '||'. In all these applications you need to be assured of getting a single logical value.
Note: operators such as '==' or '!=' return an object like the arguments. If you expected 'x' and 'y' to be of length 1, but it happened that one of them wasn't, you will _not_ get a single 'FALSE'. Similarly, if one of the arguments is 'NA', the result is also 'NA'.
"all.equal" = test equality allowing reasonable differences in numeric results
| x <- 1.0; y <- 0.99999999999 |
| (E <- all.equal(x,y)) |
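The difference between '==', "identical" and "all.equal" can be sketched with a few small vectors. The variable names below are illustrative only:

```r
x <- c(1, 2, 3)
y <- c(1, 2, 4)

elementwise <- x == y            # one logical per element: TRUE TRUE FALSE
same        <- identical(x, y)   # always a single logical: FALSE

# Floating point arithmetic: 0.1 + 0.2 is not stored as exactly 0.3
a <- 0.1 + 0.2
exact <- identical(a, 0.3)            # FALSE
near  <- isTRUE(all.equal(a, 0.3))    # TRUE, within the default tolerance

# NA propagates through '==' but not through identical()
na_cmp   <- NA == 1            # NA
na_ident <- identical(NA, 1)   # FALSE
```

Wrapping all.equal in isTRUE is the usual way to use it inside 'if', since all.equal returns a character description rather than FALSE when the objects differ.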
"stopifnot(...)"
If any of the expressions in '...' are not 'all' 'TRUE', 'stop' is called, producing an error message
| > stopifnot(identical(fxy, c(1,2,3))) |
"dim" sets dimension attribute
"matrix" creates matrices
"apply" = apply a function to the rows or cols of a matrix
"rownames" & "colnames" asign names
"t" transpose
"cbind" & "rbind"
"which" with arr.ind=T to return the indices of a TRUE in a matrix
| which(mymatrix,arr.ind = T) |
"row" and "col" returns a matrix of integers indicating their row (or col) number in the matrix.
E.g. to extract the diagonal of a matrix
E.g. to create an identity 5-by-5 matrix
"diag" Extract or replace the diagonal of a matrix, or construct a diagonal matrix.
"upper.tri" and "lower.tri" Returns a matrix of logicals the same size of a given matrix with entries 'TRUE' in the lower or upper triangle.
To deal with NA elements in a vector "y" you could make use of "is.na"
| > y<-c(3,6,NA,24) |
| > y.ok <- !is.na(y) |
| > y[y.ok] |
| [1] 3 6 24 |
Or try "na.omit"
| > na.omit(y) |
| [1] 3 6 24 |
| attr(,"na.action") |
| [1] 3 |
| attr(,"class") |
| [1] "omit" |
Also "any" can be useful for checking with "is.na"
| > any(is.na(y)) |
| [1] TRUE |
Character vector = vector of text strings
| > c("Barry","Ana") |
Infinite values (Inf, -Inf and NaN) can be tested with "is.finite", "is.infinite", "is.nan" or even "is.number"
numeric(0) values can be tested for by the following "is.numeric0" function
| is.numeric0 <- function(x){length(x)==0 & is.numeric(x)} |
As above, but for character(0)
| is.character0 <- function(x){length(x)==0 & is.character(x)} |
or wrapping them up together with this function to test if a value is missing, empty, or contains only NA or NULL values.
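The wrapper function mentioned above is not preserved in these notes. As a rough sketch, the tests can be exercised as follows; is.blank() is a hypothetical stand-in written for illustration, not the original function:

```r
z <- c(1, Inf, -Inf, NaN, NA)
fin <- is.finite(z)     # TRUE only for ordinary numbers
inf <- is.infinite(z)   # TRUE for Inf and -Inf
nan <- is.nan(z)        # TRUE for NaN only
mis <- is.na(z)         # TRUE for both NaN and NA

# The zero-length tests from above, using && because each side is a single value
is.numeric0   <- function(x) length(x) == 0 && is.numeric(x)
is.character0 <- function(x) length(x) == 0 && is.character(x)

# Hypothetical wrapper (an assumption): TRUE for NULL, zero-length
# objects, and vectors that contain nothing but NA
is.blank <- function(x) is.null(x) || length(x) == 0 || all(is.na(x))

empty_idx <- which(c(FALSE, FALSE))   # integer(0): nothing was TRUE
```

A typical use is guarding against an empty result from which() before indexing with it, e.g. if (!is.numeric0(empty_idx)) { ... }.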
"plot" xy plot
el")
"text" add text label
"srt" rotate displayed text by a specified number of degrees
"abline" add cross hairs
"mtext" margin coords used for text
"par" fine control of line width, types, font etc.
"mfrow" & "mfcol" = divides figure into subfigures
"add=T" combine plots
"plot=F" dont plot yet
"axis.break" need "library(bbgraphics)"
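A short sketch that strings several of the plotting helpers above together. The data are made up, and pdf(NULL) is used only so that the example draws to a null device without creating a file:

```r
pdf(NULL)                                    # null device: nothing is written to disk
op <- par(mfrow = c(1, 2), lwd = 2)          # two panels side by side, thicker lines

x <- 1:10
y <- x^2
plot(x, y, xlab = "x", ylab = "y")           # basic xy plot in the first panel
text(3, 80, "a label", srt = 45)             # text label rotated by 45 degrees
abline(h = 50, v = 5, lty = 2)               # dashed cross hairs
mtext("margin text", side = 3, line = 0.5)   # text in the top margin

curve(x^2, from = 1, to = 10)                # second panel
curve(10 * x, add = TRUE, lty = 3)           # add=TRUE overlays on the same panel

h <- hist(y, plot = FALSE)                   # plot=FALSE: compute now, draw later

par(op)                                      # restore the previous settings
dev.off()
```

Saving the value returned by par() and passing it back at the end is a common way to undo temporary changes such as mfrow.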
The function layout() partitions the active graphic window into several parts where the graphs will be displayed successively. Its main argument is a matrix of integers indicating the numbers of the “sub-windows”. For example, to divide the device into four equal parts:
| > layout(matrix(1:4, 2, 2)) |
You can use a simple matrix to help visualize how the device is divided:
| > mat <- matrix(1:4, 2, 2) |
| > mat |
[,1] [,2]
[1,] 1 3
[2,] 2 4
| > layout(mat) |
To actually visualize the partitions use the function layout.show with the number of sub-windows as argument (here 4).
In this example, we will have:
| > layout.show(4) |
The following examples show some of the possibilities offered by layout().
> layout(matrix(1:6, 3, 2))
> layout.show(6)
> layout(matrix(1:6, 2, 3))
> layout.show(6)
> m <- matrix(c(1:3, 3), 2, 2)
> layout(m)
> layout.show(3)
By default, layout() partitions the device with regular heights and widths: this can be modified with the options widths and heights. These dimensions are given as relative values. Examples:
> m <- matrix(1:4, 2, 2)
> layout(m, widths=c(1, 3),
heights=c(3, 1))
> layout.show(4)
> m <- matrix(c(1,1,2,1),2,2)
> layout(m, widths=c(2, 1),
heights=c(1, 2))
> layout.show(2)
Finally, the numbers in the matrix can include zeros giving the possibility to make complex partitions.
> m <- matrix(0:3, 2, 2)
> layout(m, c(1, 3), c(1, 3))
> layout.show(3)
Overview of some graphical functions in R:
| plot(x) | plot of the values of x (on the y-axis) ordered on the x-axis |
| plot(x, y) | bivariate plot of x (on the x-axis) and y (on the y-axis) |
| sunflowerplot(x,y) | points with similar coordinates are drawn as flowers whose petal number represents the number of points |
| pie(x) | circular pie-chart |
| boxplot(x) | “box-and-whiskers” plot |
| stripplot(x) | plot of the values of x on a line (an alternative to boxplot() for small sample sizes) |
| coplot(x, y,z) | bivariate plot of x and y for each value or interval of values of z |
| interaction.plot(f1, f2, y) | if f1 and f2 are factors, plots the means of y (on the y-axis) with respect to the values of f1 (on the x-axis) and of f2 (different curves); the option fun allows to choose the summary statistic of y (by default fun=mean) |
| matplot(x,y) | bivariate plot of the first column of x vs. the first one of y, the second one of x vs. the second one of y, etc. |
| dotplot(x) | if x is a data frame, plots a Cleveland dot plot (stacked plots line-by-line and column-by-column) |
| assocplot(x) | Cohen–Friendly graph showing the deviations from independence of rows and columns in a two dimensional contingency table |
| mosaicplot(x) | ‘mosaic’ graph of the residuals from a log-linear regression of a contingency table |
| pairs(x) | if x is a matrix or a data frame, draws all possible bivariate plots between the columns of x |
| plot.ts(x) | if x is an object of class "ts", plot of x with respect to time, x may be multivariate but the series must have the same frequency and dates |
| ts.plot(x) | id. but if x is multivariate the series may have different dates and must have the same frequency |
| hist(x) | histogram of the frequencies of x |
| barplot(x) | histogram of the values of x |
| qqnorm(x) | quantiles of x with respect to the values expected under a normal law |
| qqplot(x, y) | quantiles of y with respect to the quantiles of x |
| contour(x, y, z) | contour plot (data are interpolated to draw the curves), x and y must be vectors and z must be a matrix |
| filled.contour(x, y, z) | id. but the areas between the contours are coloured, and a legend of the colours is drawn as well |
| image(x, y, z) | id. but with colours (actual data are plotted) |
| persp(x, y, z) | id. but in perspective (actual data are plotted) |
| stars(x) | if x is a matrix or a data frame, draws a graph with segments or a star where each row of x is represented by a star and the columns are the lengths of the segments |
| symbols(x, y,...) | draws, at the coordinates given by x and y, symbols (circles, squares, rectangles, stars, thermometers or “boxplots”) whose sizes, colours, etc. are specified by supplementary arguments |
There are several ways to draw the angstrom symbol on a plot
(i) Specify the character code in octal. Assuming ISO Latin 1 encoding,
something like ...
... or ...
... should work. That should be ok on default setups for Windows and
Unix. On the Mac it might have to be "\201" (untested) See, e.g.,
http://czyborra.com/charsets/iso8859.html#ISO-8859-1 (Unix)
http://www.microsoft.com/typography/unicode/1252.gif (Windows)
http://kodeks.uni-bamberg.de/Computer/CodePages/MacStd.htm (Mac)
for other standard "symbols".
(ii) Use a mathematical expression. This won't look as good because the
ring and the A are not a single coherent glyph, but it should work
"everywhere" ...
... or ...
... demo(plotmath) shows the range of things you can do with this approach.
(iii) Use a hershey font (again, should work on all platforms and
encodings) ...
... or ...
... demo(Hershey) shows the symbols available with this approach.
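The code snippets that went with options (i) to (iii) are not preserved here. As a hedged sketch, two of the approaches might look like the following. Note the assumptions: a Unicode escape ("\u00c5") is used in place of the octal Latin-1 code described in (i), which is the more portable choice on UTF-8 systems, and ring(A) is the plotmath form for (ii). The Hershey variant is left out because its escape sequence could not be checked.

```r
pdf(NULL)   # null device, so nothing is written to disk

# (i) a single character code; an assumption: Unicode escape instead of octal
lab_unicode <- "Distance (\u00c5)"

# (ii) a mathematical expression: a ring drawn over the letter A
lab_math <- expression(paste("Distance (", ring(A), ")"))

plot(1:10, xlab = lab_unicode, ylab = "Count")
title(main = lab_math)

dev.off()
```

How well the single-character version renders depends on the device, font and locale, which is the point made in the text about encodings.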
"while"
"repeat"
"ifelse" If you want to calculate something, but only if y is *between* 0 and 1 then
| ifelse(y==0,0,y*log(y)) |
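A minimal sketch of the control structures named above, plus the switch statement referred to earlier in these notes. The values are made up for illustration:

```r
# while: loop as long as the condition holds
i <- 0
while (i < 3) i <- i + 1

# repeat: loop forever until an explicit break
n <- 0
repeat {
  n <- n + 1
  if (n >= 5) break
}

# ifelse: vectorised choice, one result per element of y
y <- c(0, 0.5, 1)
ylogy <- ifelse(y == 0, 0, y * log(y))

# switch: pick a branch by name; the unnamed last value is the default
kind <- "b"
picked <- switch(kind, a = "first", b = "second", "other")
```

ifelse evaluates both alternatives for the whole vector and then picks element by element, which is why the y == 0 case can be patched without a warning stopping the calculation.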
| Operator | Description | Example |
| == | Equals | > value1 |
| | | [1] 3 6 23 |
| | | > value1==23 |
| | | [1] FALSE FALSE TRUE |
| < | Less Than | > value1 < 6 |
| | | [1] TRUE FALSE FALSE |
| > | Greater Than | > value1 > 6 |
| | | [1] FALSE FALSE TRUE |
| <= | Less Than or Equal To | > value1 <= 6 |
| | | [1] TRUE TRUE FALSE |
| >= | Greater Than or Equal To | > value1 >= 6 |
| | | [1] FALSE TRUE TRUE |
| & | Elementwise And | > value2 |
| | | [1] 1 2 3 |
| | | > value1==6 & value2 <= 2 |
| | | [1] FALSE TRUE FALSE |
| && | Control And | > value1[1] <- NA |
| | | > is.na(value1[1]) && value2[1] == 1 |
| | | [1] TRUE |
| xor | Elementwise Exclusive Or | > xor(is.na(value1), value2 == 2) |
| | | [1] TRUE TRUE FALSE |
Or with a PIPE character (i.e. vertical bar)
Logical Negation with an exclamation mark (which I can't write in this wiki lingo)
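Since the pipe and the exclamation mark are awkward to show in the wiki table, here is a small sketch of "or" and negation, reusing the value1 and value2 vectors from the table above:

```r
value1 <- c(3, 6, 23)
value2 <- c(1, 2, 3)

or_res  <- value1 == 3 | value2 == 3   # elementwise or:  TRUE FALSE TRUE
not_res <- !(value1 > 6)               # negation:        TRUE TRUE FALSE

# The doubled forms && and || expect single values, so reduce with all() or any()
msg <- if (all(value1 > 0) && any(value2 == 2)) "ok" else "not ok"
```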
"var", "cov" and "cor" = variance, covariance and correlation
"cov.wt"
"sd" = standard deviation
"nearest" = Return the index of a vector nearest in value to a supplied value
"all.equal" = Test if Two Objects are (Nearly) Equal
"cumsum" = Cumulative Sums, Products, and Extremes
"prod" = Product of Vector Elements
"range" = Range of Values
"round" = Rounding of Numbers
"tabulate" and "table" = Tabulation for Vectors (usefull for finding the mode etc.)
"runMedian" = Running Median or Mean
"identical" = test (safely!) whether two objects are excatly equal
"which" = which indices of a vector,matrix etc. are TRUE
%in%
| 1:10 %in% c(1,3,5,9) # TRUE where an element matches; wrap in which() to get the indices |
| "%w/o%" <- function(x,y) x[!x %in% y] #-- x without y |
| (1:10) %w/o% c(3,7,12) |
see also
union(x, y)
intersect(x, y)
setdiff(x, y)
setequal(x, y)
is.element(el, set)
match
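The set functions listed above, tried on two small made-up vectors:

```r
"%w/o%" <- function(x, y) x[!x %in% y]   # x without y

x <- c(1, 2, 3)
y <- c(3, 4)

without <- (1:10) %w/o% c(3, 7, 12)   # 1 2 4 5 6 8 9 10
u  <- union(x, y)                     # 1 2 3 4
it <- intersect(x, y)                 # 3
sd_xy <- setdiff(x, y)                # 1 2 (in x but not in y)
eq <- setequal(c(1, 2), c(2, 1))      # TRUE, order does not matter
el <- is.element(3, x)                # TRUE
pos <- match(c(3, 5), x)              # position of first match, NA if none: 3 NA
```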
See http://www.statsoft.com/textbook/stdisfit.html for background stuff.
Chi-square, Q-Q plots, P-P plots, various tests (Kolmogorov-Smirnov, Shapiro-Wilk W)
For fitting distributions (via ML) look at ?fitdistr in package MASS. Goodness-of-fit testing after the estimation of parameters might not be straightforward, but there are several solutions for testing normality in package nortest. Also look at ?shapiro.test and ?goodfit in package vcd.
All-purpose gof tests are also available, see ?ks.test and ?chisq.test.
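A rough sketch of these tools on simulated data. The sample, its parameters and the seed are arbitrary assumptions, and MASS is only used if it happens to be installed:

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
x <- rnorm(200, mean = 5, sd = 2)    # simulated sample

sw <- shapiro.test(x)                # Shapiro-Wilk test of normality

# Kolmogorov-Smirnov test against a normal distribution. The parameters are
# estimated from the same data, so the p-value is only approximate, which is
# the caveat about goodness-of-fit after estimation mentioned above.
ks <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))

# Maximum-likelihood fit, only if the MASS package is available
if (requireNamespace("MASS", quietly = TRUE)) {
  fit <- MASS::fitdistr(x, "normal")
  print(fit$estimate)                # fitted mean and sd
}
```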
| s <- "the quick red fox jumps over the lazy brown dog" |
| ss <- strsplit(s, " ")[[1]] |
| ss1 <- substring(ss, 1,1) |
| ss2 <- substring(ss, 2) |
| paste(toupper(ss1), ss2, sep="", collapse=" ") |
| [1] "The Quick Red Fox Jumps Over The Lazy Brown Dog" |
To actually search for a dot, you need to 'escape' it with a
backslash, but of course the backslash needs escaping itself, with
another backslash. Luckily that backslash doesn't need escaping,
otherwise we would quickly run out of patience.
| regexpr("\\.", "Female.Alabama") |
| [1] 7 |
| sub("[.]"," ", "some.bullshit.string") |
It's also worth remembering the use of [], normally used to enclose
a disjunctive list of characters to match (e.g. [Aa] matches either
"A" or "a") or a range (e.g. [0-9] matches any digit). Any metacharacter
occurring within will be interpreted literally with exceptions "\"
and (for obvious reasons) "]" which must be escaped (in which case
the use of [] is redundant); -- however, "[" works!
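The escaping rules above can be checked with a few one-liners. The strings are made up, apart from "Female.Alabama":

```r
pos <- regexpr("\\.", "Female.Alabama")        # position of the first dot: 7

first_dot <- sub("[.]", " ", "one.two.three")  # only the first match is replaced
all_dots  <- gsub(".", " ", "one.two.three", fixed = TRUE)   # fixed = TRUE: no regex at all

has_digit   <- grepl("[0-9]", c("abc", "a1c"))             # a range inside []
either_case <- grepl("^[Aa]", c("Ana", "ana", "Barry"))    # a list of alternatives inside []
```

Using fixed = TRUE sidesteps escaping entirely when the pattern is a literal string rather than a regular expression.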
"assign" = paste together a varable name
| > assign(paste("file", 1, "max", sep=""), 1) |
| > ls() |
| 1 "file1max" |
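A small sketch of building variable names in a loop and reading them back with get(). The names and values are illustrative only:

```r
# Create file1max, file2max and file3max with made-up values
for (i in 1:3) {
  assign(paste("file", i, "max", sep = ""), i * 10)
}

second <- get("file2max")        # look a variable up by its name: 20
found  <- exists("file3max")     # TRUE
```

When many such values are needed, a named list (e.g. maxima[["file2"]]) is often easier to manage than a set of pasted-together variable names.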
I find the gclus package by Catherine Hurley useful when it comes to displaying the results of cluster analysis.
Amongst other things it is capable of ordering panels in scatterplot matrices and parallel coordinate displays by some merit index.
The package contains various indices of merit, ordering functions, and enhanced versions of pairs and parcoord which color panels according to their merit level.
see: http://lib.stat.cmu.edu/R/CRAN/src/contrib/Descriptions/gclus.html
The function reorder.hclust is of particular note
Use the dump function to write out data structures and sink to add any comments you want to the file.
For example:
| sink("backup.log",append=T) |
| print(paste("##### Printing out data structures for iteration:",i)) |
| sink() |
| dump(ls(),"backup.log",append=T) |
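A self-contained variant of the backup idea, written to a temporary file so that it can be run safely. The objects a and b are made up, and cat() with a leading "#" is used instead of print() (an assumption on my part) so that the comment line does not stop the file from being read back with source():

```r
log_file <- tempfile(fileext = ".log")   # temporary file instead of backup.log
a <- 1:3
b <- "some text"
i <- 1

sink(log_file, append = TRUE)            # divert console output to the file
cat("##### Data structures for iteration:", i, "\n")
sink()                                   # stop diverting

dump(c("a", "b"), file = log_file, append = TRUE)   # write a and b as R code

# Read the backup into a separate environment to check that it round-trips
restored <- new.env()
source(log_file, local = restored)
```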