Using count data with subsample() function of the divDyn package.

Although the subsample() function was developed for estimating turnover rates and diversity changes over multiple time intervals, it was adapted to execute subsampling on single samples.

Let’s say that we have a sample of 20 species, where species have the following number of specimens:

# counts of specimens
counts <- c(35,19,13,9,6,4,2,2,2,2,2,2,2,1,1,1,1,1,1,1)

Preparing data

The current version of divDyn can only accept extended formats for ecological samples. This means that every specimen (identity) has to be present as a separate object. To make the count data above compatible with subsample(), we can use the rep() funciton from the base package to repeat placeholders (i.e. letters) that symbolize the specimens that belong to different species.

# one letter that represents each species
species <- letters[1:length(counts)]

# extended format of the sample
specimens <- rep(species, counts)

Running the table() function confirms that the structure we created is indeed matching the count data above.

## specimens
##  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t 
## 35 19 13  9  6  4  2  2  2  2  2  2  2  1  1  1  1  1  1  1

To make the vector of specimens compatible with the subsample() function, we have to make it a data.frame.

samp <- data.frame(specimens, stringsAsFactors=FALSE)

As factors are more difficult to work with (unless you intentionally want to), it is easier to write more reliable analyses if you stick with characters.

Richness function

By default, the subsample() function uses the divDyn() function as the applied procedure in the subsampling loops. We have to override this by another function that calculates diversity (e.g. richness) estimates from the long format of the sample. For richness, it is easiest to write your own function that just counts the number of unique elements in the vector.

richness <- function(x) length(unique(x))
## [1] 20


The important thing to note is, that the argument of this function has to be x. In simple cases where the data.frame which is fed to the subsample function has only one column, this represents a vector of specimens.

Now, all that remains is to feed our data to the subsampling. By default, subsample will execute classical rarefaction (Sanders, 1968).

# Richness subsampled to 10 individuals, number of iterations is 1000
subsample(samp, tax="specimens", q=10, FUN=richness, iter=1000, counter=F)
## [1] 5.931

As the target quota is quite low, you need to increase the number of iterations to make the estimate stabilize.

To run Shareholder Quorum Subsampling (Alroy, 2010), you have to specify the type and the q arguments appropriately.

# Richness subsampled to 10 individuals, number of iterations is 1000
subsample(samp, tax="specimens", q=0.5, type="sqs", FUN=richness, iter=1000, counter=F)
## [1] 3.493

You can contrast this with the analytical solution (coverage-based rarefaction, Chao and Jost, 2012) implemented in the iNEXT-package

# download with install.packages(iNEXT)

# estimate Hill numbers for 0.5 coverage
estimateD(counts, base="coverage", level=0.5)
##   m       method order    SC    qD qD.LCL qD.UCL
## 1 5 interpolated     0 0.498 3.745  3.487  4.004
## 2 5 interpolated     1 0.498 3.411  3.112  3.711
## 3 5 interpolated     2 0.498 3.048  2.699  3.397

Value we are interested in is qD at the order of 0, which is somewhat higher than the solution suggested by SQS, but the offset is really not high, concerning that we are talking about fractions of species. The true advantage of using subsample() is that you can supply any function as FUN and do not have to stick with Hill-numbers.


Thanks to Erin Saupe for asking about the example and indirectly to Seth Finnegan for his original course material.


Alroy, J. (2010). The Shifting Balance of Diversity Among Major Marine Animal Groups. Science, 329, 1191–1194. PBDB 0117.

Chao, A., & Jost, L. (2012). Coverage-based rarefaction and extrapolation: Standardizing samples by completeness rather than size. Ecology, 93(12), 2533–2547.

Sanders, H. L. (1968). Marine Benthic Diversity: A Comparative Study. The American Naturalist, 102(925), 243–282.

Ádám T. Kocsis

A postdoctoral research fellow, with interests in global scale biogeographical and ecological processes, programming, philosophy and music.