One of the main strengths of R is its graphics capabilities, which allow the creation of complex and easily customised plots. We begin by reviewing the R functions for producing the standard plots: scatter plots, histograms, boxplots, and quantile plots.
The plot function produces a scatterplot of its two arguments. For illustration, let us use the mtcars data set, which contains information on the characteristics of 32 cars. We can plot miles per gallon against weight with the command
data(mtcars)
plot(x=mtcars$wt, y=mtcars$mpg)
If the argument labels x and y are not supplied, R will assume the first argument is x and the second is y. If only one vector of data is supplied, this will be taken as the \(y\) value and will be plotted against the integers 1:length(y), i.e. in the sequence in which they appear in the data.
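As a small sketch of the single-vector behaviour described above, the two calls below should draw the same points. The variable names y and idx are illustrative only and do not come from the original text.

```r
data(mtcars)

y   <- mtcars$mpg      # a single vector of data
idx <- seq_along(y)    # the integers 1, 2, ..., length(y)

old_par <- par(mfrow = c(1, 2))   # two panels side by side
plot(y, main = "plot(y)")                      # y plotted against its index
plot(idx, y, main = "plot(seq_along(y), y)")   # the same plot, with x made explicit
par(old_par)                                   # restore the previous layout
```

The only visible difference between the two panels should be the default axis labels, which R derives from the expressions passed to plot.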
We can add a plot title and axis labels by supplying optional arguments:
main - provides a title to display at the top of the plot
xlab - provides a label for the horizontal axis
ylab - provides a label for the vertical axis

plot(x=mtcars$wt, y=mtcars$mpg, xlab="Weight", ylab="MPG", main="MPG vs Weight")
Another useful optional argument is type, which can substantially change how plot draws the data. The type argument can take a number of different values to produce different types of scatterplot:
type="p" - draws a standard scatterplot with a point for every \((x,y)\) pair
type="l" - connects adjacent \((x,y)\) pairs with straight lines, and does not draw points
type="b" - draws both points and connecting line segments
type="s" - connects points with ‘steps’ rather than straight lines

There are many other ways of customising the plot to use different colours, point types, etc. This is achieved by supplying additional optional arguments to plot, and these are described in the Customising plots section below.
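To compare the four type values listed above, one option is to draw the same data once per value. This is only a sketch: sorting the rows by weight first is an assumption added here, made because the line-based types connect points in the order they appear in the data, and the name cars_by_wt is illustrative.

```r
data(mtcars)

# Assumption: sort by weight so that connecting lines run left to right
cars_by_wt <- mtcars[order(mtcars$wt), ]

old_par <- par(mfrow = c(2, 2))   # 2x2 grid, one panel per type value
for (ty in c("p", "l", "b", "s")) {
  plot(cars_by_wt$wt, cars_by_wt$mpg, type = ty,
       xlab = "Weight", ylab = "MPG",
       main = paste0("type = \"", ty, "\""))
}
par(old_par)                      # restore the previous layout
```

Without the sorting step, type="l" and type="s" would join the cars in data-frame order, which tends to produce a tangle of crossing lines for data that are not naturally sequential.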
R Help: plot
A histogram consists of parallel vertical bars that graphically show the frequency distribution of a quantitative variable. The area of each bar is proportional to the frequency of items found in each class. To plot a histogram, we use the hist function:
hist(mtcars$mpg)
As with plot, we can use main and xlab to set the plot title and horizontal axis label.
The hist function also takes a number of arguments specific to the plotting of histograms:
breaks - allows us to control the number of bars in the histogram. If breaks is set to a single number, this will be used to suggest the number of bars in the histogram; R may adjust it to obtain round endpoints. If breaks is set to a vector, the values will be used as the endpoints of the bars of the histogram.
freq - if TRUE the histogram shows the simple frequencies or counts within each bar; if FALSE then the histogram shows probability densities rather than counts.

R Help: hist
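The following sketch illustrates the breaks and freq arguments side by side. The particular break points (10 to 35 in steps of 5) are an assumption chosen to cover the range of mpg, and the object names h_suggest, h_exact and h_density are illustrative.

```r
data(mtcars)

old_par <- par(mfrow = c(1, 3))   # three histograms side by side

# A single number is only a suggestion for the number of bars
h_suggest <- hist(mtcars$mpg, breaks = 5,
                  main = "breaks = 5", xlab = "MPG")

# A vector gives the exact endpoints of the bars
h_exact <- hist(mtcars$mpg, breaks = seq(10, 35, by = 5),
                main = "breaks = seq(10, 35, by = 5)", xlab = "MPG")

# freq = FALSE plots densities, so the bar areas sum to 1
h_density <- hist(mtcars$mpg, freq = FALSE,
                  main = "freq = FALSE", xlab = "MPG")

par(old_par)                      # restore the previous layout

# hist() invisibly returns the computed breaks, counts and densities
h_exact$breaks
sum(h_density$density * diff(h_density$breaks))
```

Saving the return value of hist is optional, but it is a convenient way to check which break points R actually used when breaks was given as a single number.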
A boxplot provides a graphical view of the median, quartiles, maximum, and minimum of a data set. Boxplots can be created for a single variable, for several variables, or for all variables in a data frame. In each case, we simply pass the data directly to the boxplot function:
par(mfrow=c(1,3))   # three plots side by side
boxplot(mtcars$mpg, ylab='Miles per gallon')
boxplot(mtcars$mpg, mtcars$cyl, ylab='MPG and Num. Cylinders')
boxplot(mtcars, main="All car mileage variables")
par(mfrow=c(1,1))   # return to the single-plot display
A special usage of boxplot is to take a single variable, split that variable up into groups, and draw boxplots of the different groups. This can be useful when the grouping is an important discrete or categorical variable. For example, to show boxplots of miles-per-gallon (mpg) split by the number of engine cylinders (cyl) when both variables are defined in the same data frame (in this case mtcars) we would do the following:
boxplot(mpg~cyl, data=mtcars, main="Car Mileage Data",
        xlab="Number of Cylinders", ylab="Miles Per Gallon")
Optional arguments for boxplot include:
horizontal - if TRUE the boxplots are drawn horizontally rather than vertically.
varwidth - if TRUE the boxplot widths are drawn proportional to the square root of the sample sizes, so wider boxplots represent more data.

R Help: boxplot
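A brief sketch of the two optional arguments above, applied to the grouped mpg-by-cyl boxplot. The object name bp is illustrative; it simply stores the summary statistics that boxplot returns invisibly.

```r
data(mtcars)

old_par <- par(mfrow = c(1, 2))   # two panels side by side

# Box widths scale with the square root of each group's sample size
bp <- boxplot(mpg ~ cyl, data = mtcars, varwidth = TRUE,
              main = "varwidth = TRUE",
              xlab = "Number of Cylinders", ylab = "Miles Per Gallon")

# Same groups drawn horizontally; note that the axis labels swap roles
boxplot(mpg ~ cyl, data = mtcars, horizontal = TRUE,
        main = "horizontal = TRUE",
        xlab = "Miles Per Gallon", ylab = "Number of Cylinders")

par(old_par)                      # restore the previous layout

bp$names   # group labels
bp$n       # number of observations in each group
```

Inspecting bp$n alongside the plot makes it easier to see why some boxes are drawn wider than others when varwidth is TRUE.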
Histograms leave much to the interpretation of the viewer. A better graphical way to tell whether the data are normally distributed is to look at a so-called quantile-quantile (QQ) plot. With this technique, we plot the quantiles of the data (i.e. the ordered data values) against the quantiles of a normal distribution. If the data are normally distributed, then the points of the QQ plot will lie close to a straight line. Deviations from a straight line suggest departures from the normal distribution. This technique can be applied to any distribution, though R supports Normal quantile plots directly with the qqnorm function:
qqnorm(mtcars$mpg)
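Judging straightness by eye is easier with a reference line. The sketch below adds one with qqline, a base R function not mentioned in the text above, which draws a line through the first and third quartiles. The object name qq is illustrative.

```r
data(mtcars)

# qqnorm() invisibly returns the plotted coordinates:
# x = theoretical normal quantiles, y = the data values
qq <- qqnorm(mtcars$mpg, main = "Normal QQ plot of MPG")

# Reference line through the first and third quartiles
qqline(mtcars$mpg)

# The theoretical quantiles are normal quantiles at evenly spread probabilities
head(sort(qq$x))
head(qnorm(ppoints(length(mtcars$mpg))))
```

Points that curve away from the line at either end suggest tails that are heavier or lighter than those of a normal distribution.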
R makes it easy to combine multiple plots into one overall graph, using either the par or layout functions. With the par function, we specify the argument mfrow=c(nr, nc) to split the plot window into a grid of nr x nc plots that are filled in by row. For example, to divide the plot window into a 2x2 grid we call par(mfrow=c(2,2)). Similarly, for 3 plots in a single column we call par(mfrow=c(3,1)).
To return to the usual single-plot display, we must call par(mfrow=c(1,1)).
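As an illustration, the sketch below fills a 2x2 grid with the four standard plots from this section. Saving the old settings in old_par and restoring them afterwards is a common idiom and an alternative to calling par(mfrow=c(1,1)) explicitly; the names old_par and grid_dims are illustrative.

```r
data(mtcars)

old_par <- par(mfrow = c(2, 2))   # 2x2 grid, filled row by row

plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "MPG",
     main = "Scatterplot")                           # row 1, column 1
hist(mtcars$mpg, xlab = "MPG", main = "Histogram")   # row 1, column 2
boxplot(mpg ~ cyl, data = mtcars, main = "Boxplot")  # row 2, column 1
qqnorm(mtcars$mpg, main = "QQ plot")                 # row 2, column 2

grid_dims <- par("mfrow")         # query the current grid dimensions
par(old_par)                      # restore the previous layout
```

For three plots in a single column, the same pattern applies with par(mfrow = c(3, 1)).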
When we don’t want to arrange plots in a simple regular grid, we can use the layout function. See the R help for more details.
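A minimal sketch of layout, assuming we want one wide plot on the top row and two smaller plots beneath it. The matrix passed to layout says which plot number occupies each cell of the grid; the names layout_matrix and n_panels are illustrative.

```r
data(mtcars)

# Plot 1 spans both cells of the top row; plots 2 and 3 share the bottom row
layout_matrix <- matrix(c(1, 1,
                          2, 3), nrow = 2, byrow = TRUE)
n_panels <- layout(layout_matrix)   # layout() returns the number of figures

plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "MPG",
     main = "MPG vs Weight")                          # plot 1 (top, full width)
hist(mtcars$mpg, xlab = "MPG", main = "Histogram")    # plot 2 (bottom left)
boxplot(mtcars$mpg, ylab = "MPG", main = "Boxplot")   # plot 3 (bottom right)

layout(matrix(1))                   # return to the single-plot display
```

The layout function also accepts widths and heights arguments for unequal column and row sizes, as described in its help page.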