Quick Start Guide

Please see the introductory User Guide which presents a "hands on" introduction to bio3d and is intended to familiarize the user with a few essentials important for getting off to a quick start.


Basics

Bio3d is an R package containing utilities to process, organize and explore structure and sequence data. Features include the ability to read and write structure, sequence and dynamic trajectory data, perform atom selection, re-orientation, superposition, rigid core identification, clustering, distance matrix analysis, conservation analysis and principal component analysis. Bio3d takes advantage of the extensive graphical and statistical capabilities of the R environment (Ihaka and Gentleman, 1996) and thus represents a useful framework for exploratory analysis of structural data.

In the examples below we assume that you have successfully installed R and the Bio3D package, by following the installation instructions on the Download page.

Loading bio3d

Start R (type R at the comand prompt or, on Windows, double click on the R icon) and load the bio3d package by typing library(bio3d) at the R console prompt

library(bio3d)

library(bio3d)

Use the command lbio3d() to list the functions within the package

lbio3d()

lbio3d()

Note. in order to be executed, a function always needs to be written with parentheses, even if there is nothing within them (e.g., lbio3d()). If one just types the name of a function without parentheses, R will display the contents of the function (i.e. the code of the function).

Finding Help

To get help on a particular function try ?function or help(function). For example

?pca.xyz

help()

To search the help system for documentation matching a particular word or topic use the comand help.search("topic")

help.search("dcd")

help.search()

Typing help.start() will start a local HTML interface. After initiating 'help.start()' in a session the '?function' commands will open as HTML pages

To execute examples for a particular function use the comand example(function). For run examples for the read dcd function try:

example(read.dcd)

example()

Bailing out

R is generally very tolerent, and can be interrupted by Ctrl-C (i.e. hold down the key marked Control and hit the C key). This will interrupt the current operation and return to the R prompt.

To quit R, type q()

q()

q()

Function Overview

The bio3d package consists of input/output functions, conversion and manipulation functions, analysis functions, and graphics functions which are summarized below with links to further documentation.

IO

Read and Write Common Data Types

read.fasta Read aligned or un-aligned sequences from a file in FASTA format
read.pdb Read structure data from Protein Data Bank (PDB) files
read.fasta.pdb Read aligned PDB structures
read.dcd Read coordinate data from CHARMM/X-PLOR/NAMD binary DCD files
write.fasta Write aligned or un-aligned sequences to a FASTA format file
write.pdb Write a PDB format coordinate file
mktrj.pca Make a trjactory of atomic displacments along a given principal component

Analysis

Do Interesting Things

pca.xyz Performs principal components analysis (PCA) on a set of conformers
dm Construct a distance matrix for a given protein structure
rmsd Calculate the root mean square deviation (RMSD) between coordinate sets
core.find Identify the most invarient region in an aligned set of protein structures
entropy Calculate the sequence entropy score per position in an alignment
consensus Determines the consensus sequence for a given alignment at a given identity cutoff value

Graphics

Plotting and Graphic Display

plot.dmat Plot a distance or difference distance matrix as returned from the 'dm' function
plot.pca Produces a z-score plot (conformer plot) and an eigen spectrum plot (scree plot)
plot.pca.loadings
plot.pca.score
plot.pca.scree

Convert

Convert and Manipulate Data

aa321 Convert between three-letter PDB style aminoacid codes and one-letter IUPAC aminoacid codes
aa123 Convert between one-letter IUPAC aminoacid codes and three-letter PDB style aminoacid codes
orient.pdb
fit.xyz Coordinate superposition of multiple conformers
rot.lsq Coordinate superposition with the Kabsch algorithm

Utilities

Useful Odds and Ends

pdb.summary some text
atom.select some text
gap.inspect some text
pairwise some text
bwr.colors some text
mono.colors some text
bounds some text

Data Input

Reading Sequence data

Use the read.fasta() function
aln <- read.fasta("eg.fa")
The 'aln' object is a list with $id and $ali components try the cmd 'attributes(aln)'

Reading trajectory data

Start R and load the bio3d package as described above.
Then use the read.dcd() function to read your trajectory
Note. You can use VMD to write a dcd format trajectory from multi-model PDBs and other common trajectory formats.

trj<-read.dcd("prot.dcd")

This will create a trajectory matrix called 'trj' from the file 'prot.dcd' in the current directory.

To find the number of structures in your 'trj' data structure try:

nrow(trj)

and to find the number of atoms use

ncol(trj)/3

Note that each row corresponds to a frame from your trajectory and each coloum coresponds to a an atom coordinate. DCD files contain no atom type information but this can be convinently obtained from a coresponding PDB file.

Reading PDB data

Start R and load the bio3d package as described above.
Then use the read.pdb() function

pdb.1<-read.pdb("complex_cgbd_3000-5000.pdb")
pdb.2<-read.pdb("1HPV.pdb")

To see whats in 'pdb.1' type

attributes(pdb.1)

To see ATOM records

pdb.1$atom

To see coords as a numeric vector

pdb.1$xyz

How do I tidy up my PDB file

Read your PDB

pdb.1<-read.pdb("complex_cgbd_3000-5000.pdb")

Change residue names to upper case

pdb.1$atom[,"resid"]<-toupper(pdb.1$atom[,"resid"])

Write out a new PDB

write.pdb(pdb.1, file="complex_cgbd_3000-5000_tidy.pdb")

Or write with no chain id

write.pdb(pdb.1, file="complex_cgbd_3000-5000_tidy.pdb",chain=rep(" ",nrow(pdb.1$atom))

How to extract the sequence from my PDB

to get sequence from 'pdb.1' (rember to change case to Upper)

seq.1 <- aa321(toupper(pdb$atom[pdb$calpha,"resid"]))
seq.2 <- aa321(pdb.2$atom[pdb.2$calpha,"resid"])

Or take the sequence from the SEQRES records

seq.3 <- aa321(pdb$seqres)

To write a sequence file you need an 'laignment' style object which is nothing more than a list object with an "id" and "ali" components

write.fasta(list( id=c("complex_cgbd_3000-5000_tidy"),ali=seq.1 ), file="eg.fa")
write.fasta(list( id=c("1HPV"),ali=seq.2 ), file="eg.fa", append=TRUE)

Reading multiple PDBs

If they are related, and you want to keep track for their correspondances, use the function read.fasta.pdb() and input a sequence alignment of the PDBs

aln <- read.fasta("eg.fa")
pdbs<-read.fasta.pdb(aln)

Reading data from other sources

scan(), readLines(), read.fwf(), read.table()

Import of specific file lines with grep regular expressions

The following example demonstrates the retrieval of specific lines from an external file with a regular expression. First, an external file is created with the 'cat' function, all lines of this file are imported into a vector with 'readLines', the specific elements (lines) are then retieved with the 'grep' function, and the resulting lines are split into vector fields with 'strsplit'.
cat(month.name, file="zzz.txt", sep="\n") x <- readLines("zzz.txt") x <- x[c(grep("^J", as.character(x), perl = TRUE))] t(as.data.frame(strsplit(x,"u")))

Batch import and export of many files

In the following example the *.txt file names in the current directory are first assigned to a list (the '$' sign is used to anchor the match to the end of a string). Second, the files are imported one-by-one using a for loop where the original names are assigned to the generated data frames with the 'assign' function. Read ?read.table to understand arguments 'row.names=1' and 'comment.char = "A"'. Third, the data frames are exported using their names for file naming and appending '*.out'.
files <- list.files(pattern=".txt$") for(i in files) { x <- read.table(i, header=TRUE, comment.char = "A", sep="\t") assign(i, x) print(i) write.table(x, paste(i, c(".out"), sep=""), quote=FALSE, sep="\t", col.names = NA) }

Data output

writing PDBs

writing sequences

writing trajectorys

writiing matrics and tables

saving your work

Worked examples

Sequence conservation

Tidying PDBs

Fitting

Rigid core identification

RMSD

PCA

Distance Matrices

Writing your own functions

One of the main attractions of using the R environment is the ease with which users can write their own programs and functions. The R programming syntax is extremely easy to learn, even for users with no previous programming experience. Once the basic R programming control structures are understood, users can use the language as a powerful calculation machine to perform complex analyses of almost any type of quantitative data.

simple example function

  • fit trj onto our PDB
Select some residue etc.
inds<-atom.select(pdb.1,string='////XK2///')
pdb.1$atominds$atom,
pdb.1$xyzinds$xyz

xyz<-fit.xyz(fixed=pdb.1$xyz-(inds$xyz), mobile=trj)

Fit on some residues

##- fit on selected positions 23 to 31 and 84 to 87 (see core.find)
inds.core <- atom.select(pdb.1,"///23:31,84:87///CA/")

xyz.c<-fit.xyz(fixed=pdb.1$xyz-(inds$xyz), mobile=trj, fixed.inds=inds.core$xyz, mobile.inds=inds.core$xyz)

##- Write out fitted trj
write.pdb(xyz=xyz.c, resid=pdb.1$atom[-inds$atom,"resid"], resno=pdb.1$atom[-inds$atom,"resno"], file="fit_core.pdb", verbose=TRUE)

#write.pdb(xyz=xyz.c, file="fit_core.pdb", verbose=TRUE)

r.trj<-rmsd(pdb.1$xyz-gaps,xyz.c)

How I do PCA on my trajectory

Read your trajectory and make sure all frames are fitted optimally

Then use the function pca.xyz() on the supperposed coordinates

How do I do PCA on X-ray structures

Read your aligned PDBs as described above

Fit

Exclude gap positions

Do PCA

poo
poo


Page Information

  • 1 year ago [history]
  • View page source
  • You're not logged in
  • Tags: toedit docs howto quick_start

Wiki Information

Recent PBwiki Blog Posts