R: base plotting without the wrappers

Category: IT Technology · Published: 4 years ago


R Base Plotting Without Wrappers

Base plotting is as old as R itself, yet for most users it remains mysterious. They might use plot() in their work, or even know the full list of its graphical parameters, such as cex or pch, but most never understand it fully. Many think of base plotting as an ad-hoc bag of tricks that has to be learned and remembered, and that is otherwise hard, inconsistent, and unintuitive. Even experts who write about base graphics or compare it with other systems share the same opinion. This article attempts to demystify base graphics by providing a friendly introduction for the uninitiated.

Deconstructing a Plot

Soon after learning R, users start producing various figures via a call to plot(), hist(), barplot(), or pie(). Then, when faced with a complicated figure, they start stacking those plots on top of one another using various hacks, like add=TRUE, ann=FALSE, cex=0. For most this seems to mark the end of their base plotting journey, and they quickly label it clunky and outdated. However, the functions they were using are only wrappers on top of smaller functions that do all the work. Many would be surprised to learn that, under the hood, base plotting follows the paradigm of having a set of small functions that each do one thing and work well with one another.

Under the Hood

Let’s start with the simplest example.

plot(1:10, xlab = "x-axis", ylab = "y-axis", main = "my plot")

The plot() function above is really just a wrapper that calls an array of lower level functions.

plot.new()
plot.window(xlim = c(1,10), ylim = c(1,10))
points(1:10)
axis(1)
axis(2)
box()
title(xlab = "x-axis")
title(ylab = "y-axis")
title(main = "my plot")

Written like this all the elements comprising the plot become clear. Every new function call draws a single object on top of the plot produced up until that point. It becomes easy to see which line should be modified in order to change something on the plot. Just as an example let’s modify the above plot by: 1) adding a grid, 2) removing the box around the plot, 3) removing the axis lines, 4) making the annotation labels red, and 5) shifting the title to the left.

plot.new()
plot.window(xlim = c(0,10), ylim = c(0,10))
grid()
points(1:10)
axis(1, lwd = 0, font = 2)
axis(2, lwd = 0, font = 2)
title(xlab = "x-axis", col.lab = "darkred")
title(ylab = "y-axis", col.lab = "darkred")
title(main = "my plot", col.main = "darkred", adj = 0)

In each case to achieve the wanted effect only a single line had to be modified. And the function names are very intuitive. Someone who does not know anything about R would have no trouble saying which element on the plot is added by which line.
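One detail the decomposed version makes explicit is that plot.window() has to be told the axis limits, whereas plot() works them out from the data. A minimal sketch of doing the same by hand with range(), using made-up vectors x and y that are not part of the original example:

```r
# Hypothetical data; any two numeric vectors of equal length would do.
x <- c(2.5, 4.1, 7.8, 9.3)
y <- c(10, 35, 20, 50)

plot.new()
# plot.window() needs explicit limits; range() derives them from the data
plot.window(xlim = range(x), ylim = range(y))
points(x, y)
axis(1)
axis(2)
box()
title(main = "limits taken from the data", xlab = "x", ylab = "y")
```

With the default axis style the plotting region ends up slightly wider than the requested limits, so points lying exactly on a limit are not drawn on the box itself.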

Functions

So, in order to construct a plot various functions are called one by one. But where do we get all the names for those functions? Do we need to remember hundreds of them? Turns out the set of all the things you might need to do on a plot is pretty limited.

par()          # specifies various plot parameters
plot.new()     # starts a new plot
plot.window()  # adds a coordinate system to the plot region

points()       # draws points
symbols()      # draws symbols
lines()        # draws lines connecting a sequence of points
abline()       # draws infinite lines throughout the plot
arrows()       # draws arrows
segments()     # draws line segments between pairs of points
rect()         # draws rectangles
polygon()      # draws complex polygons
contour()      # draws a contour

text()         # adds written text within the plot
mtext()        # adds text in the margins of a plot

title()        # adds plot and axis annotations
axis()         # adds axes
box()          # draws a box around a plot
grid()         # adds a grid over a coordinate system
legend()       # adds a legend

The above list covers the majority of the functionality needed to recreate almost any plot. For a demonstration, example() can be used to quickly see what each of those functions does, e.g. example(rect). R also has some other helpful functions, like rug() and jitter(), that make certain situations easier, but they are not crucial and can be implemented using the ones listed above.
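As a rough illustration of that last point, here is a sketch of how something like rug() and jitter() could be put together from the listed functions. The helpers my_rug() and my_jitter() are hypothetical names invented for this example, and they only approximate the real functions rather than reproduce how R implements them.

```r
# Hypothetical helper: a rough stand-in for rug(), drawing one short
# tick per observation along the bottom edge of the plot region.
my_rug <- function(x, ticksize = 0.03) {
  usr <- par("usr")                      # c(x1, x2, y1, y2) of the plot region
  len <- ticksize * (usr[4] - usr[3])    # tick length in y units
  segments(x, usr[3], x, usr[3] + len)   # vectorized: all ticks in one call
}

# Hypothetical helper: a rough stand-in for jitter(), adding uniform noise.
my_jitter <- function(x, amount = 0.1) {
  x + runif(length(x), -amount, amount)
}

set.seed(1)
x <- round(rnorm(50, mean = 5, sd = 2))  # rounded values produce many ties

plot.new()
plot.window(xlim = range(x), ylim = c(0, 1))
points(my_jitter(x), runif(length(x)))
my_rug(my_jitter(x))
axis(1)
box()
```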

Function Arguments

Function names are quite straightforward, but what about their arguments? Indeed, some argument names, like cex, can seem quite cryptic. But the argument name is always an abbreviation for a property of the plot. For example, col is shorthand for “color”, lwd stands for “line width”, and cex means “character expansion”. The good news is that, in general, the same arguments stand for the same properties across all base R plotting functions. And for a specific function, help() can always be used to get the list of all arguments and their descriptions.
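A small sketch of that consistency, reusing col, lwd, and cex across several of the functions listed earlier. The coordinates and colors are arbitrary choices made for this illustration.

```r
plot.new()
plot.window(xlim = c(0, 10), ylim = c(0, 10))

# col, lwd and cex keep their meaning from one function to the next
points(1:9, 1:9, col = "darkred", cex = 2)           # color and size of symbols
lines(1:9, 9:1, col = "darkred", lwd = 3)            # color and width of a line
segments(1, 5, 9, 5, col = "steelblue", lwd = 3)     # the same two arguments again
text(5, 9.5, "a label", col = "steelblue", cex = 2)  # color and size of text
box(col = "grey", lwd = 3)                           # and once more for the box

# print the argument list of a single function
args(segments)
```

The descriptions of the shared graphical parameters themselves are collected on the help page for par().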

To further illustrate the consistency between the arguments, let’s return to the first example. By now it should be pretty clear, with one exception: the axis(1) and axis(2) lines. Where do those numbers, 1 and 2, come from? The numbers specify positions around the plot: they start from 1, which refers to the bottom of the plot, and go clockwise up to 4, which refers to the right side. The example below demonstrates the relationship between the numbers and the four sides of the plot.

plot.new()
box()
mtext("1", side = 1)
mtext("2", side = 2)
mtext("3", side = 3)
mtext("4", side = 4)

The same position numbers are used throughout the different functions. Whenever a parameter of some function needs to specify a side, chances are it will do so using the described numeric notation. Below are a few examples.

par(mar = c(0,0,4,4))         # margins of a plot: c(bottom, left, top, right)
par(oma = c(1,1,1,1))         # outer margins of a plot
axis(3)                       # side where axis will be displayed
text(x, y, "text", pos = 3)   # pos selects the side the "text" is displayed at
mtext("text", side = 4)       # side specifies the margin "text" will appear in
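Put together, these calls might be used as in the sketch below, which moves the axes to the top and right sides and widens the matching margins. The labels, margin sizes, and coordinates are arbitrary values picked for illustration.

```r
# mar follows the same numbering: 1 = bottom, 2 = left, 3 = top, 4 = right
op <- par(mar = c(1, 1, 4, 4))

plot.new()
plot.window(xlim = c(0, 10), ylim = c(0, 10))
box()
axis(3)                                     # axis on the top side
axis(4)                                     # axis on the right side
mtext("top label", side = 3, line = 2.5)
mtext("right label", side = 4, line = 2.5)

# pos uses the same order: 1 = below, 2 = left of, 3 = above, 4 = right of
points(5, 5)
text(rep(5, 4), rep(5, 4), c("below", "left", "above", "right"), pos = 1:4)

par(op)                                     # restore the previous margins
```

Saving the value returned by par() and passing it back at the end keeps the margin change from leaking into later plots.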

Vectorization

Another important point to understand is vectorization. Almost all the arguments for base plotting functions are vectorized. The user does not have to add each point or each rectangle one by one inside a loop. Instead he or she can draw all the related objects with one function call while at the same time specifying different positions and parameters for each. Here is a demonstration reconstructing a barplot using a single call to rect() :

x <- apply(USArrests[1:10,], 1, cumsum)

plot.new()
plot.window(xlim = c(0,11), ylim=c(0,500))

rect(col(x)-0.5, rbind(0, x[-4,]), col(x)+0.5, x, col = hcl.colors(4,"Fall"))
legend("top", rownames(x), fill = hcl.colors(4,"Fall"), horiz = TRUE, bty = 'n')

In this case four coordinates had to be specified for each rectangle: x and y for the bottom-left corner plus x and y for the top-right corner. We did so with the help of the col(x) function, which returns the column number for each element of a matrix. In the end, even though this is a more complicated example, we still added all the rectangles using a single function call.
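The same idea applies to the simpler drawing functions. A minimal sketch, with arbitrary positions and styles, in which a single points() call draws ten differently styled points and a single segments() call draws ten lines:

```r
n <- 10
plot.new()
plot.window(xlim = c(0, n + 1), ylim = c(0, n + 1))

# one call draws ten points, each with its own symbol, size, and color
points(1:n, 1:n,
       pch = 1:n,                          # a different symbol per point
       cex = seq(1, 3, length.out = n),    # growing sizes
       col = hcl.colors(n, "Fall"))        # one color per point

# shorter vectors are recycled: two colors alternate across ten segments
segments(1:n, 0, 1:n, (1:n) - 0.5, col = c("grey", "black"))
box()
```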

Constructing a Plot

One of the strengths of base R graphics is flexibility and customization. It really shines when a certain style from an existing plot or a template has to be followed. Here I will try to reproduce a plot shared on reddit.

We start with some random data, using state names instead of countries.

dat <- round(abs(matrix(rnorm(7*23), ncol = 7)), 3)
dat <- data.frame(sample(state.name, nrow(dat)), dat)
names(dat) <- c("state", "economy", "family", "health", "freedom", "government",
                "generosity", "dystopia")


head(dat)

##            state economy family health freedom government generosity dystopia
## 1         Kansas   0.662  0.279  0.894   0.170      1.029      1.906    0.158
## 2     Washington   1.719  0.709  0.613   0.826      1.101      1.001    1.437
## 3  Massachusetts   2.122  0.767  0.583   0.017      0.614      0.157    0.319
## 4       Illinois   1.497  1.443  0.006   0.492      0.346      0.879    1.234
## 5        Montana   0.036  0.845  1.865   0.355      0.086      0.017    0.758
## 6 North Carolina   1.232  0.399  1.830   0.113      0.649      0.545    1.335

The target plot is a barplot, so at first it might seem like the barplot() function is the best option. However, barplot() is a wrapper, and by using it we would lose a lot of flexibility. Therefore in this case rect() will be used to recreate the barplot functionality in one line. Below is a simple implementation of the function. For clarity, each step has a comment explaining what it does.

plothappiness <- function(df, col) {
  # add total happiness
  df <- data.frame(state=df$state, total=rowSums(df[,-1]), df[,-1])

  # order by decreasing happiness
  df <- df[order(-df$total),]

  # get cumulative sums for rectangle plotting
  s <- t(apply(df[,-c(1:2)], 1, cumsum))
  # add 0s, since the rectangles will start at 0
  s <- cbind(0, s)

  # turn off margins on all sides and set background color
  par(mar=c(0,0,0,0), bg="#F5F5F6")

  # start a new plot
  plot.new()

  # add coordinates
  #   x - from -3 to maximum happiness,
  #   y - inverted, from the number of rows to -1 (for title)
  plot.window(xlim=c(-3, max(s)), ylim=c(nrow(s), -1))

  # add lines at every round number for grid
  abline(v=0:round(max(s)), col="lightgrey")

  # colors for every cell
  scol <- matrix(col, nrow=nrow(s), ncol=ncol(s)-1, byrow=TRUE)

  # draw rectangles
  # NOTE: this function is vectorized - I specify all rectangles in one go
  # NOTE: I use row() to get y-coordinates for positions; row(scol) has one
  #       entry per rectangle, so no rectangle is drawn twice by recycling
  rect(s[,-ncol(s)], row(scol)-0.25, s[,-1], row(scol)+0.25, border=NA, col=scol)

  # add the title (-1 y will be on top because the y-axis is inverted)
  title <- paste(nrow(s), "HIGHEST HAPPINESS SCORES")
  text(-3.5, -1, title, font=2, cex=0.7, pos=4)

  # add the number labels to the grid
  text(0:round(max(s)), -1, 0:round(max(s)), col="lightgrey", cex=0.7, pos=4)

  # add the states
  states <- paste0(df$state, " (", round(df$total, 3), ")")
  text(-3, 1:nrow(df), states, pos=4, cex=0.7)

  # add numeric labels for each state
  text(-3.5, 1:nrow(df), paste0(1:nrow(df), "."), pos=4, cex=0.7)
}

Without all the comments the function is very short and simple.

plothappiness <- function(df, col) {
  df <- data.frame(state=df$state, total=rowSums(df[,-1]), df[,-1])
  df <- df[order(-df$total),]

  s <- t(apply(df[,-c(1:2)], 1, cumsum))
  s <- cbind(0, s)

  par(mar=c(0,0,0,0), bg="#F5F5F6")
  plot.new()
  plot.window(xlim=c(-3, max(s)), ylim=c(nrow(s), -1))
  abline(v=0:round(max(s)), col="lightgrey")

  scol <- matrix(col, nrow=nrow(s), ncol=ncol(s)-1, byrow=TRUE)
  rect(s[,-ncol(s)], row(scol)-0.25, s[,-1], row(scol)+0.25, border=NA, col=scol)

  title <- paste(nrow(s), "HIGHEST HAPPINESS SCORES")
  text(-3.5, -1, title, font=2, cex=0.7, pos=4)
  text(0:round(max(s)), -1, 0:round(max(s)), col="lightgrey", cex=0.7, pos=4)

  states <- paste0(df$state, " (", round(df$total, 3), ")")
  text(-3, 1:nrow(df), states, pos=4, cex=0.7)
  text(-3.5, 1:nrow(df), paste0(1:nrow(df), "."), pos=4, cex=0.7)
}

And now we try to replicate the colors and call the function.

colors <- c("#893086", "#E70A81", "#F0592F", "#D2DC1D", "#59C2CA", "#7880BD",
            "#B190C2")
plothappiness(dat, colors)
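To see how the rectangle coordinates inside plothappiness() come about, it may help to run the cumsum() and cbind() steps on a tiny matrix. The two-row matrix m and the three colors below are made up for this sketch.

```r
# hypothetical scores: 2 rows (states) by 3 columns (components)
m <- rbind(c(1, 3, 2),
           c(2, 1, 4))

s <- t(apply(m, 1, cumsum))   # running totals along each row
s <- cbind(0, s)              # every bar starts at 0

xleft  <- s[, -ncol(s)]       # left edge of each rectangle
xright <- s[, -1]             # right edge of each rectangle
print(xright - xleft)         # the widths are the original scores

plot.new()
plot.window(xlim = c(0, max(s)), ylim = c(nrow(m) + 0.5, 0.5))  # inverted y-axis
rect(xleft, row(xleft) - 0.25, xright, row(xleft) + 0.25,
     col = matrix(c("tomato", "gold", "steelblue"), nrow(m), ncol(m), byrow = TRUE))
```

Each row of s holds the boundaries between consecutive segments of one bar, so dropping the last column gives the left edges and dropping the first gives the right edges.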

Let’s modify it some more. Say that in addition to what we currently have we also want to specify a happiness cutoff and dim all the states with happiness level below this threshold.

First - modify the function signature so it accepts a new argument for the cutoff.

plothappiness <- function(df, cutoff, col) {
...

Second - display all the state names below the cutoff in grey.

...
  cutcol <- ifelse(df$total < cutoff, "grey", "black")
  ...

Third - add transparency to all rectangles for states below the cutoff level to make them dimmer.

..
  scol[df$total < cutoff,] <- adjustcolor(scol[df$total < cutoff], 0.1)
  ..
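For readers who have not met adjustcolor() before, a brief sketch of what it returns may be useful. It takes two of the bar colors from the earlier call and applies the same alpha factor of 0.1.

```r
cols <- c("#893086", "#E70A81")               # two of the bar colors
dim_cols <- adjustcolor(cols, alpha.f = 0.1)  # scale the opacity to 10%

# the result is a character vector of colors with an alpha channel appended
print(dim_cols)

# the alpha row shows the reduced opacity; the red, green, blue rows are unchanged
print(col2rgb(dim_cols, alpha = TRUE))
```

Because the result has the same length as the input, it can be assigned straight back into the selected rows of the color matrix.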

With all that in place the final function looks like this.

plothappiness <- function(df, cutoff, col) {
  df <- data.frame(state=df$state, total=rowSums(df[,-1]), df[,-1])
  df <- df[order(-df$total),]

  s <- t(apply(df[,-c(1:2)], 1, cumsum))
  s <- cbind(0, s)

  par(mar=c(0,0,0,0), bg="#F5F5F6")
  plot.new()
  plot.window(xlim=c(-3, max(s)), ylim=c(nrow(s), -1))
  abline(v=0:round(max(s)), col="lightgrey")

  scol <- matrix(col, nrow=nrow(s), ncol=ncol(s)-1, byrow=TRUE)
  scol[df$total < cutoff,] <- adjustcolor(scol[df$total < cutoff], 0.1)
  rect(s[,-ncol(s)], row(scol)-0.25, s[,-1], row(scol)+0.25, border=NA, col=scol)

  title <- paste(nrow(s), "HIGHEST HAPPINESS SCORES")
  text(-3.5, -1, title, font=2, cex=0.7, pos=4)
  text(0:round(max(s)), -1, 0:round(max(s)), col="lightgrey", cex=0.7, pos=4)

  state  <- paste0(df$state, " (", round(df$total, 3), ")")
  cutcol <- ifelse(df$total < cutoff, "grey", "black")
  text(-3, 1:nrow(df), state, pos=4, cex=0.7, col=cutcol)
  text(-3.5, 1:nrow(df), paste0(1:nrow(df), "."), pos=4, cex=0.7, col=cutcol)
}

And now call the function with an additional argument for the happiness cutoff.

colors <- c("#893086", "#E70A81", "#F0592F", "#D2DC1D", "#59C2CA", "#7880BD",
            "#B190C2")
plothappiness(dat, 5, colors)

Summary

It seems that most R users are never properly introduced to the real functions behind the base plotting paradigm. Instead, they only familiarize themselves with various higher-level wrappers that confuse and hide things. But when inspected properly, base plotting can become friendly, simple, and intuitive.
