Why should I use R?
R is a powerful environment and programming language for the analysis of numerical data. While there are many other common applications that will allow you to manipulate lists of numbers (e.g., spreadsheet programs), R also makes it easy to compute a wide range of statistical quantities, provides a powerful environment for performing numerical simulations, and has fantastic graphics capabilities.
What R lacks in apparent user-friendliness, it more than makes up for in power. While there is certainly a learning curve associated with developing the skills you will need to perform analyses in R, this is true of any software package that you will use. Once you acquire some of the basics, you will find that using R is logical and simple. A few questions naturally arise: (1) What is R? (2) What are the pros and cons of using R? and (3) Why use it instead of, say, a spreadsheet application?
R uses the S language
- R is a "dialect" of the S statistical programming language
“S is a programming language and environment for all kinds of computing involving data.
It has a simple goal: to turn ideas into software, quickly and faithfully.”
John M. Chambers
Bell Laboratories (major contributor and developer of the S language)
- S-Plus: commercial implementation of the S language
- R: free software implementation of the S language (http://www.r-project.org)
- Developed by R. Gentleman and R. Ihaka (U of Auckland, NZ) during the 1990s
- Advanced statistical computing system, freely available for most computing platforms.
- Updated versions available every 3-4 months
Pros and cons of R
Pros include:
- Powerful, state-of-the-art
- Used by professional statisticians
- Extensive documentation
- Learn by example
- Easy to extend
– Modify and improve
– Create add-on packages
– Many already available
- Freely available
- Unix, Windows & Mac
- Extendable, with numerous add-on packages available.
- Programmable: if R can't do a particular task out of the box, you can program R to do it.
- R produces publication-quality graphics.
R has a remarkable online presence in the form of help lists, tutorials, and so on, which makes it easier to solve the problems you will inevitably run into in the course of your research. R represents the state of the art in statistical computing.
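As a small illustration of the "programmable" point in the list above: to the best of my knowledge, base R has no dedicated function for the standard error of the mean, but one can be written in a few lines. This is only a sketch; the function name std_error and the sample values are made up for this example.

```r
# Hypothetical helper (not part of base R): standard error of the mean.
std_error <- function(x, na.rm = FALSE) {
  # Optionally drop missing values before computing anything
  if (na.rm) x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

# Made-up sample data, used only to show the call
heights <- c(1.62, 1.75, 1.68, 1.80, 1.71)
se <- std_error(heights)
print(se)
```

Once defined, std_error() can be called like any built-in function, which is what makes R easy to extend for your own analyses.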
Cons include:
- Not very easy to learn (many details)
- Easy to forget
- Sometimes forced to learn by example
- Documentation sometimes cryptic
- Not (easily) interactive in the Excel point-and-click sense
- Command-based
- Still evolving: backward-compatibility has been an issue
- Slow at times when compared to commercial packages like Matlab
If you "just want to do basic statistical analysis", then it's easy to find alternatives.
If you intend to do exploratory data analysis, such as microarray data analysis, then it's probably one of the best options.
Why Not a Spreadsheet?
- While a spreadsheet is handy for manually entering and viewing data and for performing basic calculations, it is not ideal for more advanced problems.
- For example, calculating eigenvalues or numerically solving ordinary differential equations are simple tasks in an environment such as R or Matlab, but these capabilities do not exist (to the best of my knowledge) in most common spreadsheet applications.
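To make the two examples above concrete, here is a minimal sketch using only base R. The matrix, the decay equation dy/dt = -k * y, the step size, and the function name euler_decay are all assumptions chosen for illustration. The ODE is solved with a simple forward Euler loop; in practice, an add-on package such as deSolve would usually be a better choice.

```r
# Eigenvalues of a small symmetric matrix with base R's eigen().
A <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE)
ev <- eigen(A)$values
print(ev)  # the eigenvalues of this matrix are 3 and 1

# Hypothetical helper: forward Euler for dy/dt = -k * y with y(0) = y0.
euler_decay <- function(k, y0, t_end, dt) {
  n <- round(t_end / dt)                    # number of time steps
  t <- seq(0, by = dt, length.out = n + 1)  # time grid
  y <- numeric(n + 1)
  y[1] <- y0
  for (i in seq_len(n)) {
    y[i + 1] <- y[i] + dt * (-k * y[i])     # Euler update
  }
  data.frame(t = t, y = y)
}

sol <- euler_decay(k = 0.5, y0 = 1, t_end = 10, dt = 0.01)
exact <- exp(-0.5 * 10)                     # analytic solution at t = 10

# Compare the numerical and exact values at the final time
print(c(numerical = tail(sol$y, 1), exact = exact))
```

The same calculations in a spreadsheet would need one row per time step and hand-built formulas, whereas here the step size and the equation can each be changed by editing a single line.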