Alpha diversity measures (skbio.diversity.alpha)

This package provides implementations of various alpha diversity measures, including measures of richness, dominance, and evenness. Functions that generate confidence intervals (CIs) have the suffix _ci.

All alpha diversity measures accept a vector of counts within a single sample, where each count is, for example, the number of observations of a particular Operational Taxonomic Unit, or OTU. We use the term “OTU” here very loosely, as these could be counts of any type of feature/observation (e.g., bacterial species). We’ll refer to this vector as the counts vector or simply counts throughout the documentation.

The counts vector must be one-dimensional and contain integers representing the number of individuals seen (or counted) for a particular OTU. Negative values are not allowed; the counts vector may only contain integers greater than or equal to zero.

The counts vector is array_like: anything that can be converted into a 1-D numpy array is acceptable input. For example, you can provide a numpy array or a native Python list and the results should be identical.

If the input to an alpha diversity measure does not meet the above requirements, the function will raise either a ValueError or a TypeError, depending on the condition that is violated.
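The requirements above can be sketched as a small checker. This is an illustration only, not the library's own validation code: the helper name validate_counts is hypothetical, and the mapping of each violation to ValueError or TypeError is an assumption, since the text above does not say which condition raises which error.

```python
from numbers import Integral


def validate_counts(counts):
    """Hypothetical helper: check a counts vector against the stated rules."""
    counts = list(counts)  # accept any iterable, e.g. a list or a tuple
    for value in counts:
        # A nested sequence means the input is not one-dimensional.
        # Assumption: a dimensionality problem is reported as ValueError.
        if isinstance(value, (list, tuple)):
            raise ValueError("counts must be one-dimensional")
        # Assumption: a non-integer entry is reported as TypeError.
        if not isinstance(value, Integral):
            raise TypeError("counts must contain integers")
        # Assumption: a negative entry is reported as ValueError.
        if value < 0:
            raise ValueError("counts must not contain negative values")
    return counts


print(validate_counts([3, 1]))  # a valid counts vector passes through
```

Calling validate_counts([1, -1]) or validate_counts([0.5, 2]) in this sketch raises ValueError and TypeError respectively.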

Note

There are different ways that samples are represented in the ecological literature and in related software. The alpha diversity measures provided here always assume that the input contains abundance data: each count represents the number of individuals seen for a particular OTU in the sample. For example, if you have two OTUs, where 3 individuals were observed from one of the OTUs and only a single individual was observed from the other, you could represent this data in the following forms (among others):

As a vector of counts. This is the expected type of input for the alpha diversity measures in this module. There are 3 individuals from the OTU at index 0, and 1 individual from the OTU at index 1:

>>> counts = [3, 1]

As a vector of indices. The OTU at index 0 is observed 3 times, while the OTU at index 1 is observed 1 time:

>>> indices = [0, 0, 0, 1]

As a vector of frequencies. We have 1 OTU that is a singleton and 1 OTU that is a tripleton. We do not have any 0-tons or doubletons:

>>> frequencies = [0, 1, 0, 1]

Always use the first representation (a counts vector) with this module.
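If your data arrives in one of the other representations, it can be converted before calling the functions in this module. The following is a minimal sketch in plain Python; the helper names are hypothetical and are not part of this module. It also assumes that OTUs with a count of zero are tallied as 0-tons in the frequencies vector.

```python
from collections import Counter


def counts_to_indices(counts):
    """Repeat each OTU index once per individual observed."""
    return [otu for otu, n in enumerate(counts) for _ in range(n)]


def indices_to_counts(indices):
    """Tally how many times each OTU index appears."""
    tally = Counter(indices)
    num_otus = max(tally) + 1 if tally else 0
    return [tally[otu] for otu in range(num_otus)]


def counts_to_frequencies(counts):
    """Entry k is the number of OTUs observed exactly k times."""
    tally = Counter(counts)
    largest = max(tally) if tally else 0
    return [tally[k] for k in range(largest + 1)]


counts = [3, 1]
print(counts_to_indices(counts))          # [0, 0, 0, 1]
print(indices_to_counts([0, 0, 0, 1]))    # [3, 1]
print(counts_to_frequencies(counts))      # [0, 1, 0, 1]
```

Note that converting counts to indices and back drops any trailing OTUs with a count of zero, because an unobserved OTU leaves no trace in the indices vector.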

Functions

ace(counts[, rare_threshold]) Calculate the ACE metric (Abundance-based Coverage Estimator).
berger_parker_d(counts) Calculate Berger-Parker dominance.
brillouin_d(counts) Calculate Brillouin index of alpha diversity.
chao1(counts[, bias_corrected]) Calculate Chao1 richness estimator.
chao1_ci(counts[, bias_corrected, zscore]) Calculate Chao1 confidence interval.
dominance(counts) Calculate dominance.
doubles(counts) Calculate number of double occurrences (doubletons).
enspie(counts) Calculate ENS_pie alpha diversity measure.
equitability(counts[, base]) Calculate equitability (Shannon index corrected for number of OTUs).
esty_ci(counts) Calculate Esty’s CI.
fisher_alpha(counts) Calculate Fisher’s alpha.
gini_index(data[, method]) Calculate the Gini index.
goods_coverage(counts) Calculate Good’s coverage of counts.
heip_e(counts) Calculate Heip’s evenness measure.
kempton_taylor_q(counts[, lower_quantile, ...]) Calculate Kempton-Taylor Q index of alpha diversity.
lladser_ci(counts, r[, alpha, f, ci_type]) Calculate single CI of the conditional uncovered probability.
lladser_pe(counts[, r]) Calculate single point estimate of conditional uncovered probability.
margalef(counts) Calculate Margalef’s richness index.
mcintosh_d(counts) Calculate McIntosh dominance index D.
mcintosh_e(counts) Calculate McIntosh’s evenness measure E.
menhinick(counts) Calculate Menhinick’s richness index.
michaelis_menten_fit(counts[, num_repeats, ...]) Calculate Michaelis-Menten fit to rarefaction curve of observed OTUs.
observed_otus(counts) Calculate the number of distinct OTUs.
osd(counts) Calculate observed OTUs, singles, and doubles.
robbins(counts) Calculate Robbins’ estimator for the probability of unobserved outcomes.
shannon(counts[, base]) Calculate Shannon entropy of counts (H), default in bits.
simpson(counts) Calculate Simpson’s index.
simpson_e(counts) Calculate Simpson’s evenness measure E.
singles(counts) Calculate number of single occurrences (singletons).
strong(counts) Calculate Strong’s dominance index (Dw).
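To give a sense of what two of the simpler measures listed above compute, here is a sketch using the standard textbook definitions of Shannon entropy (in bits) and Berger-Parker dominance. The function names shannon_bits and berger_parker are hypothetical stand-ins, not the module's shannon and berger_parker_d, and the module's own implementations may differ in detail. The sketch assumes the sample contains at least one individual.

```python
import math


def shannon_bits(counts):
    """Shannon entropy H = -sum(p * log2(p)) over OTUs with nonzero counts."""
    total = sum(counts)
    proportions = [n / total for n in counts if n > 0]
    return -sum(p * math.log2(p) for p in proportions)


def berger_parker(counts):
    """Proportion of the sample that belongs to the most abundant OTU."""
    return max(counts) / sum(counts)


print(shannon_bits([1, 1]))    # two equally abundant OTUs: 1.0 bit
print(berger_parker([3, 1]))   # 3 of 4 individuals: 0.75
```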

Examples

>>> import numpy as np
>>> from skbio.diversity.alpha import observed_otus, singles, doubles

Assume we have the following abundance data for a sample, represented as a counts vector:

>>> counts = [1, 0, 0, 4, 1, 2, 3, 0]

We can count the number of OTUs:

>>> observed_otus(counts)
5

Note that OTUs with counts of zero are ignored.

In the previous example, we provided a Python list as input. We can also provide other types of input that are array_like:

>>> observed_otus((1, 0, 0, 4, 1, 2, 3, 0)) # tuple
5
>>> observed_otus(np.array([1, 0, 0, 4, 1, 2, 3, 0])) # numpy array
5

All of the alpha diversity measures work in this manner.

Other metrics include singles, which tells us how many OTUs are observed exactly one time (i.e., are singleton OTUs), and doubles, which tells us how many OTUs are observed exactly two times (i.e., are doubleton OTUs). Let’s see how many singletons and doubletons there are in the sample:

>>> singles(counts)
2
>>> doubles(counts)
1
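The three results above can be reproduced by hand, which may help clarify what each measure counts. This is a plain-Python sketch rather than the module's implementation, and the helper name count_otus_with is hypothetical.

```python
def count_otus_with(counts, n):
    """Number of OTUs observed exactly n times."""
    return sum(1 for c in counts if c == n)


counts = [1, 0, 0, 4, 1, 2, 3, 0]

# Observed OTUs: every entry with a nonzero count.
observed = sum(1 for c in counts if c > 0)

# Singletons and doubletons: entries equal to 1 and 2, respectively.
num_singles = count_otus_with(counts, 1)
num_doubles = count_otus_with(counts, 2)

print(observed, num_singles, num_doubles)  # 5 2 1
```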