R Base Plotting Without Wrappers
Base plotting is as old as R itself, yet for most users it remains mysterious. They might be using plot() in their work or even know the full list of its graphical parameters, such as cex or pch, but most never understand it fully. Many think of base plotting as an ad-hoc bag of tricks that has to be learned and remembered but that otherwise is hard, inconsistent, and unintuitive. Even experts who write about base graphics or compare it with other systems share the same opinion. This article attempts to demystify base graphics by providing a friendly introduction for the uninitiated.
Deconstructing a Plot
Soon after learning R, users start producing various figures via a call to plot(), hist(), barplot(), or pie(). Then, when faced with a complicated figure, they start stacking those plots on top of one another using various hacks, like add=TRUE, ann=FALSE, cex=0. For most, this seems to mark the end of their base plotting journey, and they quickly label it clunky and outdated. However, the functions they were using are only wrappers on top of smaller functions that do all the work. Many would be surprised to learn that, under the hood, base plotting follows the paradigm of having a set of small functions that each do one thing and work well with one another.
Under the Hood
Let’s start with the simplest example.
    plot(1:10, xlab = "x-axis", ylab = "y-axis", main = "my plot")
The plot() function above is really just a wrapper that calls an array of lower level functions.
    plot.new()
    plot.window(xlim = c(1,10), ylim = c(1,10))
    points(1:10)
    axis(1)
    axis(2)
    box()
    title(xlab = "x-axis")
    title(ylab = "y-axis")
    title(main = "my plot")
Written like this, all the elements comprising the plot become clear. Every new function call draws a single object on top of the plot produced up to that point. It becomes easy to see which line should be modified in order to change something on the plot. As an example, let’s modify the above plot by: 1) adding a grid, 2) removing the box around the plot, 3) removing the axis lines, 4) making the annotation labels red, and 5) shifting the title to the left.
    plot.new()
    plot.window(xlim = c(0,10), ylim = c(0,10))
    grid()
    points(1:10)
    axis(1, lwd = 0, font = 2)
    axis(2, lwd = 0, font = 2)
    title(xlab = "x-axis", col.lab = "darkred")
    title(ylab = "y-axis", col.lab = "darkred")
    title(main = "my plot", col.main = "darkred", adj = 0)
In each case to achieve the wanted effect only a single line had to be modified. And the function names are very intuitive. Someone who does not know anything about R would have no trouble saying which element on the plot is added by which line.
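The same approach extends to figures with more than one data series, where the wrappers would usually call for add=TRUE style workarounds. The following is a minimal sketch; the values in y1 and y2 are made up purely for illustration.

```r
# Hypothetical data, invented for this illustration
y1 <- (1:10)^2 / 10
y2 <- 10 - y1

plot.new()
plot.window(xlim = c(1, 10), ylim = range(y1, y2))
points(1:10, y1, pch = 19)          # first series as filled points
lines(1:10, y2, col = "darkred")    # second series as a connected line
axis(1)
axis(2)
box()
title(main = "two series", xlab = "x-axis", ylab = "y-axis")
```

Each series is one more line of code drawn into the same coordinate system, so nothing has to be switched off or overplotted.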
Functions
So, in order to construct a plot various functions are called one by one. But where do we get all the names for those functions? Do we need to remember hundreds of them? Turns out the set of all the things you might need to do on a plot is pretty limited.
    par()          # specifies various plot parameters
    plot.new()     # starts a new plot
    plot.window()  # adds a coordinate system to the plot region
    points()       # draws points
    symbols()      # draws symbols
    lines()        # draws connected line segments through a series of points
    abline()       # draws straight lines across the whole plot
    arrows()       # draws arrows
    segments()     # draws separate line segments between pairs of points
    rect()         # draws rectangles
    polygon()      # draws complex polygons
    contour()      # draws a contour
    text()         # adds written text within the plot
    mtext()        # adds text in the margins of a plot
    title()        # adds plot and axis annotations
    axis()         # adds axes
    box()          # draws a box around a plot
    grid()         # adds a grid over a coordinate system
    legend()       # adds a legend
The above list covers the majority of the functionality needed to recreate almost any plot. For a demonstration, example() can be used to quickly see what each of those functions does, e.g. example(rect). R also has some other helpful functions, like rug and jitter, that make certain situations easier, but they are not crucial and can be implemented using the ones listed above.
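As a rough illustration of that last point, a rug-like display can be approximated with a single call to segments(). This is only a sketch and not how rug() itself is implemented; the sample x and the tick height of 0.05 are arbitrary choices made for this example.

```r
set.seed(1)      # arbitrary seed, only so the sketch is reproducible
x <- rnorm(50)   # made-up sample

plot.new()
plot.window(xlim = range(x), ylim = c(0, 1))
axis(1)
box()
# one short vertical tick per observation, all drawn in a single call
segments(x0 = x, y0 = 0, x1 = x, y1 = 0.05)
```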
Function Arguments
Function names are quite straightforward, but what about their arguments? Indeed, some argument names, like cex, can seem quite cryptic. But the argument name is always an abbreviation for a property of the plot. For example, col is shorthand for “color”, lwd stands for “line width”, and cex means “character expansion”. The good news is that, in general, the same arguments stand for the same properties across all of the base R functions. And for a specific function, help() can always be used to get the list of all arguments and their descriptions.
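A small sketch of this consistency is shown below: the same col, lwd, and cex arguments are passed to several different drawing functions. The coordinates are arbitrary. One caveat worth noting is that in rect() the col argument sets the fill, while the outline color is set with border.

```r
plot.new()
plot.window(xlim = c(0, 4), ylim = c(0, 4))
box()
points(1:3, c(1, 1, 1), col = "darkred", cex = 2)       # color and size of points
lines(1:3, c(2, 2, 2), col = "darkred", lwd = 3)        # color and width of a line
text(2, 3, "label", col = "darkred", cex = 2)           # color and size of text
rect(0.5, 3.5, 3.5, 3.9, border = "darkred", lwd = 3)   # outline color and width
```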
To further illustrate the consistency between the arguments, let’s return to the first example. By now it should be pretty clear, with one exception: the axis(1) and axis(2) lines. Where do those numbers, 1 and 2, come from? The numbers specify the positions around the plot. They start from 1, which refers to the bottom of the plot, and go clockwise up to 4, which refers to the right side. The picture below demonstrates the relationship between the numbers and the four sides of the plot.
    plot.new()
    box()
    mtext("1", side = 1)
    mtext("2", side = 2)
    mtext("3", side = 3)
    mtext("4", side = 4)
The same position numbers are used throughout the different functions. Whenever a parameter of some function needs to specify a side, chances are it will do so using the described numeric notation. Below are a few examples.
    par(mar = c(0,0,4,4))         # margins of a plot: c(bottom, left, top, right)
    par(oma = c(1,1,1,1))         # outer margins of a plot
    axis(3)                       # side where axis will be displayed
    text(x, y, "text", pos = 3)   # pos selects the side the "text" is displayed at
    mtext("text", side = 4)       # side specifies the margin "text" will appear in
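Because the side numbers are shared, they can be looped over and passed to different functions unchanged. The sketch below draws an axis and a margin label on every side; the side_names vector is a helper introduced only for this example, and the margin sizes are arbitrary.

```r
op <- par(mar = c(4, 4, 4, 4))   # c(bottom, left, top, right), in lines of text
plot.new()
plot.window(xlim = c(0, 1), ylim = c(0, 1))
box()
side_names <- c("bottom", "left", "top", "right")   # names for sides 1 to 4
for (s in 1:4) {
  axis(s)                                       # axis on side s
  mtext(side_names[s], side = s, line = 2.5)    # label in the margin of side s
}
par(op)                          # restore the previous margin settings
```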
Vectorization
Another important point to understand is vectorization. Almost all the arguments for base plotting functions are vectorized. The user does not have to add each point or each rectangle one by one inside a loop. Instead, he or she can draw all the related objects with one function call while at the same time specifying different positions and parameters for each. Here is a demonstration reconstructing a barplot using a single call to rect():
    x <- apply(USArrests[1:10,], 1, cumsum)
    plot.new()
    plot.window(xlim = c(0,11), ylim = c(0,500))
    rect(col(x)-0.5, rbind(0, x[-4,]), col(x)+0.5, x, col = hcl.colors(4,"Fall"))
    legend("top", rownames(x), fill = hcl.colors(4,"Fall"), horiz = TRUE, bty = 'n')
In this case, four coordinates had to be specified for each rectangle: x and y for the bottom left corner plus x and y for the top right corner. We did so with the help of the col(x) function, which returns the column number for each element in a matrix. In the end, even though this is a more complicated example, we still added all the rectangles using a single function call.
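The same idea applies to the simpler drawing functions. As a minimal sketch, a single call to points() can give every point its own color, symbol, and size; the values chosen here are arbitrary.

```r
n <- 10
plot.new()
plot.window(xlim = c(0, n + 1), ylim = c(0, n + 1))
box()
# one call draws n points, each with its own color, symbol, and size
points(1:n, 1:n,
       col = hcl.colors(n, "Fall"),
       pch = 1:n,
       cex = seq(1, 3, length.out = n))
```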
Constructing a Plot
One of the strengths of base R graphics is flexibility and customization. It really shines when a certain style from an existing plot or a template has to be followed. Here I will try to reproduce a plot shared on Reddit.
We start with some random data, using state names instead of countries.
    dat <- round(abs(matrix(rnorm(7*23), ncol = 7)), 3)
    dat <- data.frame(sample(state.name, nrow(dat)), dat)
    names(dat) <- c("state", "economy", "family", "health", "freedom",
                    "government", "generosity", "dystopia")
    head(dat)
    ##            state economy family health freedom government generosity dystopia
    ## 1         Kansas   0.662  0.279  0.894   0.170      1.029      1.906    0.158
    ## 2     Washington   1.719  0.709  0.613   0.826      1.101      1.001    1.437
    ## 3  Massachusetts   2.122  0.767  0.583   0.017      0.614      0.157    0.319
    ## 4       Illinois   1.497  1.443  0.006   0.492      0.346      0.879    1.234
    ## 5        Montana   0.036  0.845  1.865   0.355      0.086      0.017    0.758
    ## 6 North Carolina   1.232  0.399  1.830   0.113      0.649      0.545    1.335
The target plot is a barplot, so at first it might seem like the barplot() function is the best option. However, barplot() is a wrapper, and by using it we would lose a lot of flexibility. Therefore, in this case rect() will be used to recreate the barplot functionality in one line. Below is a simple implementation of the function. For clarity, each line has a comment explaining what it does.
    plothappiness <- function(df, col) {
      # add total happiness
      df <- data.frame(state=df$state, total=rowSums(df[,-1]), df[,-1])
      # order by decreasing happiness
      df <- df[order(-df$total),]
      # get cumulative sums for rectangle plotting
      s <- t(apply(df[,-c(1:2)], 1, cumsum))
      # add 0s, since rectangles will start at 0
      s <- cbind(0, s)
      # turn off margins on all sides and set background color
      par(mar=c(0,0,0,0), bg="#F5F5F6")
      # start a new plot
      plot.new()
      # add coordinates
      # x - from -3 to maximum happiness,
      # y - inverted, from the number of rows to -1 (for title)
      plot.window(xlim=c(-3, max(s)), ylim=c(nrow(s), -1))
      # add lines at every round number for grid
      abline(v=0:round(max(s)), col="lightgrey")
      # colors for every cell
      scol <- matrix(col, nrow=nrow(s), ncol=ncol(s)-1, byrow=TRUE)
      # draw rectangles
      # NOTE: this function is vectorized - I specify all rectangles in one go
      # NOTE: I use row() to get y-coordinates for positions
      rect(s[,-ncol(s)], row(s)-0.25, s[,-1], row(s)+0.25, border=NA, col=scol)
      # add the title (y = -1 will be on top because the y-axis is inverted)
      title <- paste(nrow(s), "HIGHEST HAPPINESS SCORES")
      text(-3.5, -1, title, font=2, cex=0.7, pos=4)
      # add the number labels to the grid
      text(0:round(max(s)), -1, 0:round(max(s)), col="lightgrey", cex=0.7, pos=4)
      # add the states
      states <- paste0(df$state, " (", round(df$total, 3), ")")
      text(-3, 1:nrow(df), states, pos=4, cex=0.7)
      # add numeric labels for each state
      text(-3.5, 1:nrow(df), paste0(1:nrow(df), "."), pos=4, cex=0.7)
    }
Without all the comments the function is very short and simple.
    plothappiness <- function(df, col) {
      df <- data.frame(state=df$state, total=rowSums(df[,-1]), df[,-1])
      df <- df[order(-df$total),]
      s <- t(apply(df[,-c(1:2)], 1, cumsum))
      s <- cbind(0, s)
      par(mar=c(0,0,0,0), bg="#F5F5F6")
      plot.new()
      plot.window(xlim=c(-3, max(s)), ylim=c(nrow(s), -1))
      abline(v=0:round(max(s)), col="lightgrey")
      scol <- matrix(col, nrow=nrow(s), ncol=ncol(s)-1, byrow=TRUE)
      rect(s[,-ncol(s)], row(s)-0.25, s[,-1], row(s)+0.25, border=NA, col=scol)
      title <- paste(nrow(s), "HIGHEST HAPPINESS SCORES")
      text(-3.5, -1, title, font=2, cex=0.7, pos=4)
      text(0:round(max(s)), -1, 0:round(max(s)), col="lightgrey", cex=0.7, pos=4)
      states <- paste0(df$state, " (", round(df$total, 3), ")")
      text(-3, 1:nrow(df), states, pos=4, cex=0.7)
      text(-3.5, 1:nrow(df), paste0(1:nrow(df), "."), pos=4, cex=0.7)
    }
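The step that may be least obvious is how the cumulative sums become the left and right edges of the stacked rectangles. Here is a small sketch of just that step on a made-up matrix m with two rows and three components; the names m, left, and right are used only in this illustration.

```r
# Hypothetical scores: two rows, three components each
m <- matrix(c(1, 2, 3,
              2, 2, 1), nrow = 2, byrow = TRUE)

s <- cbind(0, t(apply(m, 1, cumsum)))   # running totals per row, with a leading 0
left  <- s[, -ncol(s)]                  # left edge of every segment
right <- s[, -1]                        # right edge of every segment
```

Each segment starts where the previous one ended, so right - left recovers the original scores in m.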
And now we try to replicate the colors and call the function.
    colors <- c("#893086", "#E70A81", "#F0592F", "#D2DC1D", "#59C2CA", "#7880BD", "#B190C2")
    plothappiness(dat, colors)
Let’s modify it some more. Say that in addition to what we currently have we also want to specify a happiness cutoff and dim all the states with happiness level below this threshold.
First - modify the function signature so it accepts a new argument for the cutoff.
plothappiness <- function(df, cutoff, col) { ...
Second - display all the state names below the cutoff in grey.
... cutcol <- ifelse(df$total < cutoff, "grey", "black") ...
Third - add transparency to all rectangles for states below the cutoff level to make them dimmer.
.. scol[df$total < cutoff,] <- adjustcolor(scol[df$total < cutoff], 0.1) ..
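To see what adjustcolor() does on its own, here is a minimal sketch applied to two colors from the palette used earlier; the variable names cols and dimmed are introduced only for this example.

```r
cols <- c("#893086", "#E70A81")              # two colors from the palette
dimmed <- adjustcolor(cols, alpha.f = 0.1)   # same colors at 10% opacity
# dimmed holds hex strings of the form #RRGGBBAA, with the alpha channel appended
```

Like the plotting functions, adjustcolor() is vectorized, which is why a whole block of the color matrix can be dimmed in one assignment.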
With all that in place the final function looks like this.
    plothappiness <- function(df, cutoff, col) {
      df <- data.frame(state=df$state, total=rowSums(df[,-1]), df[,-1])
      df <- df[order(-df$total),]
      s <- t(apply(df[,-c(1:2)], 1, cumsum))
      s <- cbind(0, s)
      par(mar=c(0,0,0,0), bg="#F5F5F6")
      plot.new()
      plot.window(xlim=c(-3, max(s)), ylim=c(nrow(s), -1))
      abline(v=0:round(max(s)), col="lightgrey")
      scol <- matrix(col, nrow=nrow(s), ncol=ncol(s)-1, byrow=TRUE)
      scol[df$total < cutoff,] <- adjustcolor(scol[df$total < cutoff], 0.1)
      rect(s[,-ncol(s)], row(s)-0.25, s[,-1], row(s)+0.25, border=NA, col=scol)
      title <- paste(nrow(s), "HIGHEST HAPPINESS SCORES")
      text(-3.5, -1, title, font=2, cex=0.7, pos=4)
      text(0:round(max(s)), -1, 0:round(max(s)), col="lightgrey", cex=0.7, pos=4)
      state <- paste0(df$state, " (", round(df$total, 3), ")")
      cutcol <- ifelse(df$total < cutoff, "grey", "black")
      text(-3, 1:nrow(df), state, pos=4, cex=0.7, col=cutcol)
      text(-3.5, 1:nrow(df), paste0(1:nrow(df), "."), pos=4, cex=0.7, col=cutcol)
    }
And now call the function with an additional argument for the happiness cutoff.
    colors <- c("#893086", "#E70A81", "#F0592F", "#D2DC1D", "#59C2CA", "#7880BD", "#B190C2")
    plothappiness(dat, 5, colors)
Summary
Seems like most R users are never properly introduced to the real functions behind the base plotting paradigm. Instead they only familiarize themselves with various higher level wrappers that confuse and hide things. But when inspected properly base plotting can become friendly, simple, and intuitive.